Kraken2
Kraken2 classifies reads with a lowest common ancestor (LCA) strategy. When the database is built, each k-mer (strictly, each minimizer) from the reference genomes is assigned to the LCA of all taxa whose genomes contain it. To classify a read, Kraken2 looks up the read's k-mers in the database and assigns the read to the taxon at the end of the root-to-leaf path in the taxonomy that collects the most k-mer hits.
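As a toy illustration of the LCA step only (not Kraken2's implementation), the bash sketch below walks a small, made-up taxonomy stored in a hypothetical associative array `parent` and returns the first ancestor shared by two taxa. A k-mer found in both genomes would be assigned to that shared ancestor. It needs bash 4 or later for `declare -A`.

```bash
#!/usr/bin/env bash
# Toy illustration of the lowest common ancestor (LCA) idea.
# The taxonomy below is hypothetical and tiny; it is NOT Kraken2's code.
declare -A parent=(
  [root]=""
  [Eukaryota]=root
  [Metazoa]=Eukaryota
  [Homo_sapiens]=Metazoa
  [Tephritis_conura]=Metazoa
  [Bacteria]=root
)

# Print the lineage of a taxon, one name per line, from the taxon up to the root.
lineage() {
  local t=$1
  while [[ -n $t ]]; do
    echo "$t"
    t=${parent[$t]}
  done
}

# LCA of two taxa: the first ancestor of $1 that also lies on the lineage of $2.
lca() {
  local b_lineage t
  b_lineage=$(lineage "$2")
  while read -r t; do
    if grep -qxF "$t" <<< "$b_lineage"; then
      echo "$t"
      return
    fi
  done < <(lineage "$1")
}

lca Homo_sapiens Tephritis_conura   # prints Metazoa
lca Homo_sapiens Bacteria           # prints root
```

A k-mer present only in the human genome would instead stay at Homo_sapiens, so only k-mers shared between lineages get pushed up the tree.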
Building the kraken2 database
We will use a human reference genome and the masked T. conura reference genome to build a Kraken2 database. We will also remove contigs shorter than 50 kb.
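One way to apply the 50 kb cutoff, assuming it means sequences shorter than 50,000 bp and that the masked T. conura assembly is a plain, uncompressed FASTA file, is a small awk filter like the sketch below. `filter_contigs` and `MIN_LEN` are hypothetical names, and the output writes each sequence on a single line.

```bash
#!/usr/bin/env bash
# Keep only FASTA records whose sequence length is >= MIN_LEN (50,000 bp by default).
# filter_contigs and MIN_LEN are illustrative names, not part of Kraken2.
MIN_LEN=${MIN_LEN:-50000}

filter_contigs() {
  awk -v min="$MIN_LEN" '
    # Print the buffered record (header + unwrapped sequence) if it is long enough.
    function flush() {
      if (header != "" && length(seq) >= min) {
        print header
        print seq
      }
    }
    /^>/ { flush(); header = $0; seq = ""; next }
    { seq = seq $0 }
    END { flush() }
  ' "$1"
}
```

For example, `filter_contigs tconura_masked.fa > tconura_min50kb.fa` (hypothetical file names). SeqKit offers the same filter as `seqkit seq -m 50000`.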
To build it, first specify the database folder:
DBNAME="path/to/database/folder"
Download NCBI taxonomy
kraken2-build --download-taxonomy --db $DBNAME
Download human reference
kraken2-build --download-library human --db $DBNAME
Now, to add the T. conura reference to the custom database, Kraken2 requires the taxonomy ID to be embedded in each sequence ID in the form:
kraken:taxid|xxx
where xxx should be replaced with the taxid, for example:
>sequence16|kraken:taxid|32630  Adapter sequence
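A minimal sketch for tagging the masked T. conura assembly, assuming it is an uncompressed FASTA file: the hypothetical helper `add_taxid` inserts `|kraken:taxid|<taxid>` right after each sequence ID and leaves any description after the first space untouched, matching the example header format. No taxid for T. conura is assumed here; look it up in the NCBI Taxonomy database.

```bash
#!/usr/bin/env bash
# Append the Kraken2 taxid tag to the sequence ID of every FASTA header.
# add_taxid is a hypothetical helper; $1 = numeric NCBI taxid, $2 = input FASTA.
add_taxid() {
  local taxid=$1 fasta=$2
  # Refuse anything that is not a plain number.
  [[ $taxid =~ ^[0-9]+$ ]] || { echo "taxid must be numeric" >&2; return 1; }
  # ">id desc" becomes ">id|kraken:taxid|<taxid> desc"; sequence lines pass through.
  sed -E "s/^>([^[:space:]]+)/>\1|kraken:taxid|${taxid}/" "$fasta"
}
```

For example, `add_taxid <taxid> tconura_min50kb.fa > tconura_tagged.fa`; the tagged file can then be added to the library with `kraken2-build --add-to-library tconura_tagged.fa --db $DBNAME`.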
The