jenniferlu717 / Bracken

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
http://ccb.jhu.edu/software/bracken/index.shtml
GNU General Public License v3.0

Bracken re-estimation of contrived human and bacterial samples #175

Open Fazizzz opened 2 years ago

Fazizzz commented 2 years ago

I was working with a sample made of 50% universal human reference and 50% bacterial community reference. I used Bracken to get the community composition, and it removed millions of human reads and reallocated them to bacterial species. The sample should be around 50% human, and the plain Kraken2 report reflects this. I was expecting Bracken to provide a better estimate of sample composition, and I do see some improvement in bacterial representation based on the species in our reference, but the complete reassignment of Homo sapiens reads in my samples is alarming. The read length was set to 150 and the k-mer length was set to 35 when building the Bracken database. Kraken2 was run beforehand without any thresholds or quality score requirements.

Any thoughts on why the Bracken species estimation is dumping millions of reads that Kraken2 assigned to human?

jenniferlu717 commented 2 years ago

Bracken's estimates are based on prior estimates of where reads from each library species get assigned. It could be that, during classification, some sequences from bacteria are being assigned to human, so when Bracken tries to adjust for that, it moves those reads back.

What bacterial community reference genomes are you using?
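
To make the idea of a prior-driven reallocation concrete, here is a deliberately simplified toy sketch. It is not Bracken's code; the function name `redistribute`, the species labels, and all numbers are hypothetical. It assumes that reads sitting at a shared parent node are split among child species in proportion to P(parent | species) multiplied by each species' own read count, which stands in for the prior.

```python
def redistribute(parent_reads, species_reads, p_parent_given_species):
    """Toy Bayes step: split reads held at a parent node among its species.

    parent_reads: reads classified only at the shared parent node
    species_reads: dict of species -> reads classified directly to it
    p_parent_given_species: dict of species -> assumed probability that a
        read from that species is classified at the parent node
    """
    # Unnormalised posterior weight for each species.
    weights = {
        s: p_parent_given_species[s] * species_reads[s] for s in species_reads
    }
    total = sum(weights.values())
    if total == 0:
        # Nothing to base a split on; leave the counts unchanged.
        return dict(species_reads)
    return {
        s: species_reads[s] + parent_reads * w / total
        for s, w in weights.items()
    }


# Hypothetical numbers, chosen only to show the arithmetic.
estimate = redistribute(
    parent_reads=1000,
    species_reads={"A": 300, "B": 100},
    p_parent_given_species={"A": 0.25, "B": 0.75},
)
print(estimate)  # {'A': 800.0, 'B': 600.0}
```

In this toy case species B starts with fewer reads but receives as many parent-level reads as A, because its reads are assumed to be three times as likely to be classified at the parent. If the assumed probabilities are off for one genome, the final estimate can shift a long way from the raw counts.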

Fazizzz commented 2 years ago

This analysis was done using a Zymo community reference, which has a predefined ratio of microbial species: predominantly bacteria and one or two fungi. We made a contrived sample consisting of 50% community reference and 50% human sample. The breakdown for our samples should have been around a 50/50 ratio, but almost all of the human reads were lost after running Bracken. We have a custom database for Kraken2, and I used Bracken to generate a Bracken database following the instructions. Interestingly, we did see higher read allocation to the community members we would expect from the reference, but we stopped using Bracken because it dumped almost all of the human reads. Here are the Krona profiles for the same sample before and after Bracken re-estimation.

Bracken.pptx
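
A shift like the one described above can also be checked numerically rather than only in Krona. The sketch below reads a Bracken species-level output table and compares one taxon's share of reads before and after re-estimation. It relies on the `name`, `kraken_assigned_reads`, and `new_est_reads` columns of Bracken's tab-separated output. The helper name `summarize_bracken` is hypothetical, and the sample rows are invented to exercise the function; they are not data from this issue.

```python
import csv
import io


def summarize_bracken(tsv_text, taxon="Homo sapiens"):
    """Return one taxon's read share before and after re-estimation.

    Returns None if the taxon has no row in the table at all.
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    before_total = sum(int(r["kraken_assigned_reads"]) for r in rows)
    after_total = sum(int(r["new_est_reads"]) for r in rows)
    for r in rows:
        if r["name"] == taxon:
            before = int(r["kraken_assigned_reads"])
            after = int(r["new_est_reads"])
            return {
                "before_frac": before / before_total,
                "after_frac": after / after_total,
                "delta_reads": after - before,
            }
    return None


# Invented example table (tab-separated), for illustration only.
HEADER = ["name", "taxonomy_id", "taxonomy_lvl", "kraken_assigned_reads",
          "added_reads", "new_est_reads", "fraction_total_reads"]
ROWS = [
    ["Homo sapiens", "9606", "S", "500", "0", "500", "0.50"],
    ["Escherichia coli", "562", "S", "100", "300", "400", "0.40"],
    ["Bacillus subtilis", "1423", "S", "50", "50", "100", "0.10"],
]
SAMPLE = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS)

print(summarize_bracken(SAMPLE))
```

A `None` result means the taxon is missing from the table entirely, which is itself worth noting when comparing against the Kraken2 report.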