jenniferlu717 / Bracken

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
http://ccb.jhu.edu/software/bracken/index.shtml
GNU General Public License v3.0

High number of non-distributed reads #130

Open HugoDENISFR opened 3 years ago

HugoDENISFR commented 3 years ago

Hello, I am trying to use Bracken on Kraken2 results generated with the full SILVA 138.1 database. I understood from previous threads that species-level assignment is not easy to implement using SILVA, so I tried to perform estimations at the genus level.

It seems to work. However, in each of my samples, a large fraction of the reads is not distributed, even though I didn't supply any threshold:

` >> Checking for Valid Options...
Running Bracken
python src/est_abundance.py -i /travail/Temporaire01/2017IsisNZGL01916Q/Analyse_transcriptome/30rRNA/Kraken2_output/Report_Kraken2_LSU_rrna_sick30_3TMT1 -o /travail/Temporaire01/2017IsisNZGL01916Q/Analyse_transcriptome/30rRNA/Bracken_output/Bracken_LSU_3TMT1 -k LSU/database100mers.kmer_distrib -l G -t 0
PROGRAM START TIME: 11-25-2020 22:40:56
BRACKEN SUMMARY (Kraken report: /travail/Temporaire01/2017IsisNZGL01916Q/Analyse_transcriptome/30rRNA/Kraken2_output/Report_Kraken2_LSU_rrna_sick30_3TMT1)
Threshold: 0
Number of genuses in sample: 362
Number of genuses with reads > threshold: 362
Number of genuses with reads < threshold: 0
Total reads in sample: 334296
Total reads kept at genuses level (reads > threshold): 10117
Total reads discarded (genuses reads < threshold): 0
Reads distributed: 52306
Reads not distributed (eg. no genuses above threshold): 149396
Unclassified reads: 122477
BRACKEN OUTPUT PRODUCED: /travail/Temporaire01/2017IsisNZGL01916Q/Analyse_transcriptome/30rRNA/Bracken_output/Bracken_LSU_3TMT1
PROGRAM END TIME: 11-25-2020 22:40:56
Bracken complete.`
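To put a number on "a large fraction", the counts in the summary above can be tallied. This is only a quick sketch: the figures are copied from the log, while the dictionary keys and variable names are made up for illustration.

```python
# Counts copied from the Bracken summary in this comment.
# The key names are illustrative, not Bracken identifiers.
summary = {
    "kept_at_genus": 10117,
    "distributed": 52306,
    "not_distributed": 149396,
    "unclassified": 122477,
}
total_reads = 334296

# The four categories account for every read in the sample.
assert sum(summary.values()) == total_reads

# Share of each category, as a percentage of all reads.
shares = {name: round(100 * n / total_reads, 1) for name, n in summary.items()}

# Share of the classified reads that Bracken left undistributed.
classified = total_reads - summary["unclassified"]
undistributed_of_classified = round(100 * summary["not_distributed"] / classified, 1)

for name, pct in shares.items():
    print(f"{name}: {pct}%")
print(f"not distributed, among classified reads: {undistributed_of_classified}%")
```

The four categories sum exactly to the reported total, so the undistributed reads are a separate bucket rather than a subset of another one.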

Reading the Bracken paper, I thought every read, whatever its initial taxonomic rank from Kraken2, would be reassigned to the genus level based on the probability that it belongs to each genus below its original rank.

Why is this not the case for most of them? Can I do something about it?

Many thanks

jenniferlu717 commented 3 years ago

I would have to take a look at the report. What I suspect is that there is no genus in the report that Bracken can move reads to.

For example, if there are 100 reads assigned to a Family taxon level, but no genus listed for that Family, Bracken cannot move the reads.

To minimize false positives, we do not distribute reads to genera NOT found by Kraken.
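One way to check this suspicion is to scan the Kraken2 report for taxa above the genus level that hold directly assigned reads but have no genus listed beneath them. The following is a minimal sketch, not Bracken's own code: `stranded_taxa` is a hypothetical helper, it assumes the standard six-column tab-separated Kraken2 report (percentage, clade reads, direct reads, rank code, taxid, name indented two spaces per level), and it ignores the k-mer distribution file, so it only approximates what Bracken does.

```python
def stranded_taxa(report_lines, level="G"):
    """Return (name, direct_reads) for taxa above `level` that have directly
    assigned reads but no descendant with rank code `level` in the report.

    Hypothetical helper; assumes a six-column Kraken2-style report.
    """
    nodes = []
    stack = []  # ancestors of the current line, root first
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        direct_reads = int(fields[2])
        rank = fields[3].strip()
        raw_name = fields[5]
        if rank == "U":
            continue  # unclassified reads are reported separately
        # Depth in the taxonomy is encoded as two spaces of indentation.
        depth = (len(raw_name) - len(raw_name.lstrip(" "))) // 2
        del stack[depth:]
        node = {
            "name": raw_name.strip(),
            "direct": direct_reads,
            "rank": rank,
            # True when an ancestor is already at the target level.
            "below_level": any(
                a["rank"] == level or a["below_level"] for a in stack
            ),
            "has_level_descendant": False,
        }
        if rank == level:
            # Every ancestor now has somewhere to send its reads.
            for ancestor in stack:
                ancestor["has_level_descendant"] = True
        stack.append(node)
        nodes.append(node)
    return [
        (n["name"], n["direct"])
        for n in nodes
        if n["direct"] > 0
        and n["rank"] != level
        and not n["below_level"]
        and not n["has_level_descendant"]
    ]


# Tiny made-up report: FamilyB has reads but no genus underneath it.
example_report = [
    " 50.00\t50\t50\tU\t0\tunclassified",
    " 50.00\t50\t0\tR\t1\troot",
    " 50.00\t50\t5\tD\t2\t  Bacteria",
    " 30.00\t30\t20\tF\t10\t    FamilyA",
    " 10.00\t10\t10\tG\t11\t      GenusA1",
    " 15.00\t15\t15\tF\t20\t    FamilyB",
]
stranded = stranded_taxa(example_report)
print(stranded)
```

On a real report, passing the open file object in place of `example_report` should work the same way. Summing the read counts of the returned taxa gives a rough idea of how many reads have no genus to be moved to.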

HugoDENISFR commented 3 years ago

Thank you, it makes sense then. Any idea how to increase the classification of reads at the genus level with Kraken?