Open rx32940 opened 4 years ago
Kraken2 standard library: default k-mer length 35, minimizer length 31; NCBI taxonomy with the bacterial, archaeal, and viral domains, human, and a collection of known vectors (UniVec_Core).
Kraken2 custom library: bacterial, archaeal, viral, and rat references.
run kraken2/bracken at the phylum and genus levels.
dir for kraken2 output with standard db:
/scratch/rx32940/kraken/output/kraken_out
dir for kraken2 output with standard db after bracken estimation:
/scratch/rx32940/kraken/output/bracken_out
dir for kraken2 output with custom db:
/scratch/rx32940/kraken/output/custom_out
dir for kraken2 output with custom db after bracken estimation:
/scratch/rx32940/kraken/output/custom_bracken
code used to run kraken2 and bracken in phylum and genus levels
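A minimal sketch of how the two steps could be driven from Python. It only builds the command lines; nothing is executed. The database path, FASTQ file names, paired-end input, read length of 150, and threshold of 10 are all assumptions for illustration, not values taken from this project. The helper names `kraken2_cmd` and `bracken_cmd` are hypothetical.

```python
def kraken2_cmd(db, reads_1, reads_2, out_dir, sample, threads=8):
    """Build a kraken2 command for one sample (assumes paired-end reads)."""
    return [
        "kraken2",
        "--db", db,
        "--threads", str(threads),
        "--paired",
        "--report", f"{out_dir}/{sample}.kreport",   # per-taxon summary, input to Bracken
        "--output", f"{out_dir}/{sample}.kraken",    # per-read assignments
        reads_1, reads_2,
    ]


def bracken_cmd(db, kreport, out_file, level, read_len=150, threshold=10):
    """Build a bracken command; level is 'P' for phylum or 'G' for genus."""
    return [
        "bracken",
        "-d", db,
        "-i", kreport,
        "-o", out_file,
        "-r", str(read_len),   # assumed read length; must match the Bracken db build
        "-l", level,
        "-t", str(threshold),  # assumed minimum reads before re-estimation
    ]


if __name__ == "__main__":
    db = "/path/to/custom_db"  # hypothetical location
    out_dir = "/scratch/rx32940/kraken/output/custom_out"
    sample = "R22.K"
    print(" ".join(kraken2_cmd(db, f"{sample}_1.fastq", f"{sample}_2.fastq", out_dir, sample)))
    for level in ("P", "G"):
        print(" ".join(bracken_cmd(db, f"{out_dir}/{sample}.kreport",
                                   f"{sample}.{level}.bracken", level)))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the paths are adjusted to the real database and read files.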
output in Dropbox:
$HOME/Dropbox/5. Rachel's projects/Metagenomic_Analysis/Kraken2-standard/standard/phylum(genus)/results
output on sapelo2:
/scratch/rx32940/kraken/output/bracken_out/genus(phylum)
Sample ID | Classified (%) | Unclassified (%) |
---|---|---|
R22.K | 46.85 | 53.15 |
R22.L | 24.93 | 75.07 |
R22.S | 46.44 | 53.56 |
R26.K | 52.83 | 47.17 |
R26.L | 33.54 | 66.46 |
R26.S | 47.44 | 52.56 |
R27.K | 56.64 | 43.36 |
R27.L | 35.5 | 74.5 |
R27.S | 42.27 | 57.73 |
R28.K | 25.73 | 74.27 |
R28.L | 28.1 | 71.9 |
R28.S | 26.79 | 73.21 |
With the custom database, Rattus replaced Homo as the taxon with the highest read abundance. The percentage of total reads classified also increased substantially.
output in Dropbox:
$HOME/Dropbox/5. Rachel's projects/Metagenomic_Analysis/Kraken2-standard/custom/phylum(genus)/results
output on sapelo2:
/scratch/rx32940/kraken/output/custom_bracken/genus(phylum)
Sample ID | Classified (%) | Unclassified (%) |
---|---|---|
R22.K | 70.92 | 29.08 |
R22.L | 30.43 | 69.57 |
R22.S | 62.96 | 37.04 |
R26.K | 70.03 | 29.97 |
R26.L | 44.85 | 55.15 |
R26.S | 63.24 | 36.76 |
R27.K | 69.66 | 30.34 |
R27.L | 32.45 | 67.55 |
R27.S | 61.73 | 38.27 |
R28.K | 86.29 | 13.71 |
R28.L | 83.42 | 16.58 |
R28.S | 83.75 | 16.25 |
For comparison, the classification rates with the MiniKraken2 library:
Sample ID | Classified | Unclassified |
---|---|---|
R22.K | 14.72% | 85.28% |
R22.L | 6.03% | 93.97% |
R22.S | 13.46% | 86.54% |
R26.K | 14.45% | 85.55% |
R26.L | 7.55% | 92.45% |
R26.S | 10.83% | 89.17% |
R27.K | 13.85% | 86.15% |
R27.L | 6.62% | 93.38% |
R27.S | 10.89% | 89.11% |
R28.K | 8.58% | 91.42% |
R28.L | 7.45% | 92.55% |
R28.S | 6.52% | 93.48% |
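The gain from each database can be put into numbers with a short calculation. The sketch below uses only the kidney samples from the three tables above (a subset, to keep it short) and reports the change in classified reads in percentage points. The function name `gain` is hypothetical.

```python
# Classified reads (%) per database, copied from the tables above (kidney samples only).
classified = {
    "R22.K": {"minikraken2": 14.72, "standard": 46.85, "custom": 70.92},
    "R26.K": {"minikraken2": 14.45, "standard": 52.83, "custom": 70.03},
    "R27.K": {"minikraken2": 13.85, "standard": 56.64, "custom": 69.66},
    "R28.K": {"minikraken2": 8.58, "standard": 25.73, "custom": 86.29},
}


def gain(sample, before, after):
    """Percentage-point change in classified reads between two databases."""
    row = classified[sample]
    return round(row[after] - row[before], 2)


if __name__ == "__main__":
    for sample in classified:
        print(sample,
              "standard -> custom:", gain(sample, "standard", "custom"),
              "| minikraken2 -> custom:", gain(sample, "minikraken2", "custom"))
```

For R28.K this gives a gain of 60.56 percentage points from the standard to the custom database, the largest among the kidney samples.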
Relative Abundance of each genus identified in the metagenomic samples with custom Kraken2 library
Absolute abundance of each genus identified in the metagenomic samples with custom Kraken2 library
From the relative abundance plot, we can conclude that the lung samples have the highest composition of microbes.
From the absolute abundance plot, we can conclude that samples with better-quality sequences tend to have more reads identified by Kraken2.
[x] figure out why Shigella is missing from the relative abundance plot _solved: ordered the top 10 by read count instead of proportion, because the fraction is rounded_
[x] change relative to absolute in title _solved_
[x] check the quality of the R28 sequences. Why do the R28 lung and spleen samples have such low microbe composition, while the composition looks normal for kidney but is mainly driven by Leptospira? _solved: the effective reads for the R28 samples are three times higher than those of R22, which has the second-highest number of effective reads_
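The Shigella fix above can be illustrated with a small sketch. It assumes the cause was that rounded fractions tie with each other, so ranking by them can drop a genus that ranking by read count keeps. The rows are made-up numbers, not results from these samples; the column names `new_est_reads` and `fraction_total_reads` follow the Bracken output table, and `genus_A` and `genus_B` are placeholder names.

```python
# Hypothetical Bracken-style rows (illustrative values only).
rows = [
    {"name": "Leptospira", "new_est_reads": 9000, "fraction_total_reads": 0.9000},
    {"name": "Escherichia", "new_est_reads": 500, "fraction_total_reads": 0.0500},
    {"name": "genus_A", "new_est_reads": 46, "fraction_total_reads": 0.0046},
    {"name": "genus_B", "new_est_reads": 45, "fraction_total_reads": 0.0045},
    {"name": "Shigella", "new_est_reads": 49, "fraction_total_reads": 0.0049},
]


def top_by_rounded_fraction(rows, n, digits=2):
    """Rank by the rounded fraction; ties keep their input order."""
    ranked = sorted(rows, key=lambda r: round(r["fraction_total_reads"], digits),
                    reverse=True)
    return [r["name"] for r in ranked[:n]]


def top_by_reads(rows, n):
    """Rank by estimated read count, which is not affected by rounding."""
    ranked = sorted(rows, key=lambda r: r["new_est_reads"], reverse=True)
    return [r["name"] for r in ranked[:n]]


if __name__ == "__main__":
    print("by rounded fraction:", top_by_rounded_fraction(rows, 3))
    print("by read count:      ", top_by_reads(rows, 3))
```

With these rows, the three low-abundance genera all round to 0.00, so the rounded ranking picks whichever appears first in the file, while the read-count ranking selects Shigella. The same `n` parameter covers a top 20 list.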
Relative and absolute abundance
code for barplots in the last two issues
The classification system for viruses in both Kraken2 and CLARK is tricky.
Because viral and bacterial taxonomies differ in their levels, viruses cannot be expected to be classified under a target phylum, genus, or species the way bacteria are.
However, both tools require a classification level to be specified for analysis (Kraken2 itself does not, but Bracken requires one for abundance estimation and more accurate results).
Both tools classify viruses under a genus and, beneath it, a species.
However, while CLARK at the genus level assigned a large percentage of reads to Pandoravirus, none of the top 10 most abundant taxa classified by Kraken2 at the genus or species level were viruses.
Kraken2 also classified a large number of reads as Pandoravirus, but ranked it below many bacterial taxa.
[ ] maybe try to find top 20 taxa with clark results too, then do comprehensive comparison between identified taxa
/Users/rx32940/Dropbox/5.Rachel-projects/Metagenomic_Analysis/Kraken2-standard/custom/genus/top_20_genus_for_each_sample.csv