ndreey / CONURA_WGS

Metagenomic analysis on whole genome sequencing data from Tephritis conura (IN PROGRESS)
Kraken2: Host-decontamination #28

Open ndreey opened 5 months ago

ndreey commented 5 months ago

Kraken2

Kraken2 uses a lowest common ancestor (LCA) strategy to assign a taxonomy to each read. The database genomes are split into k-mers, and each k-mer is stored together with the LCA of all genomes that contain it. The k-mers of a read are then looked up in the database, and the read is assigned to the lineage supported by the most matching k-mers.
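
To make the k-mer step concrete, here is a minimal sketch of how a read is cut into overlapping k-mers before any lookup happens. The function name `kmers` and the toy value k = 4 are my own for illustration, not anything Kraken2 exposes. The sketch covers only the splitting step, not the database lookup or the LCA assignment.

```shell
# Print every overlapping k-mer of a sequence, one per line.
# $1 = sequence, $2 = k (toy value here; Kraken2's real k-mers are much longer).
kmers() {
  printf '%s\n' "$1" | awk -v k="$2" '{
    # Slide a window of width k along the sequence, one base at a time.
    for (i = 1; i + k - 1 <= length($0); i++) print substr($0, i, k)
  }'
}

kmers ACGTAC 4
# ACGT
# CGTA
# GTAC
```

A read of length n yields n - k + 1 k-mers, so a read shorter than k yields none.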

Building the kraken2 database

We will use a human reference genome and the masked T. conura reference genome to build a kraken2 database. We will also remove contigs shorter than 50 kb.
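
The length filter could be sketched with plain awk, as below. This is not necessarily how it will be done in this project: the function name `filter_contigs` and the file names in the usage comment are hypothetical, and I am assuming that 50 kb means 50000 bp and that the filter is applied to the T. conura reference.

```shell
# Keep only FASTA records whose sequence is at least $1 bases long.
# Reads FASTA on stdin and writes the kept records to stdout.
# Note: each kept sequence is written on a single line (line wrapping is dropped).
filter_contigs() {
  awk -v min="$1" '
    # Print the record collected so far if it is long enough.
    function emit() {
      if (header != "" && length(seq) >= min) { print header; print seq }
    }
    /^>/ { emit(); header = $0; seq = ""; next }  # new record starts
    { seq = seq $0 }                              # join wrapped sequence lines
    END { emit() }                                # do not forget the last record
  '
}

# Hypothetical usage (file names are placeholders):
#   filter_contigs 50000 < tconura_masked.fasta > tconura_masked.min50kb.fasta
```

The threshold is passed as an argument so the function can be checked on a tiny FASTA file with a small cutoff before it is run on the real reference.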

To build it we will use:

Specify the database folder:
DBNAME="path/to/database/folder"

Download the NCBI taxonomy:
kraken2-build --download-taxonomy --db $DBNAME

Download the human reference:
kraken2-build --download-library human --db $DBNAME

Now, to generate the custom database using the T. conura reference, kraken2 requires:

The