jenniferlu717 / Bracken

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
http://ccb.jhu.edu/software/bracken/index.shtml
GNU General Public License v3.0

Question about calculations and intersample comparison #259

Open alanxelena opened 2 months ago

alanxelena commented 2 months ago

Hey Jennifer!

First of all, thanks for the tool and the support you provide to users. I have a question about how things are calculated in Bracken. I understand that the abundance of taxa can only be assessed after Bracken has been applied to the Kraken output. If I understand your paper correctly (https://doi.org/10.7717/peerj-cs.104), the reads assigned to each taxon are recalculated based on the number of k-mers assigned and the genome length. My question is the following: does the column "new_est_reads" show the number of reads assigned to that specific taxon after the Bayesian recalculation, or is there also some sort of normalisation that considers the total number of reads in the sample? I ask because I would like to compare the abundance of certain taxa in two samples with different (although not very different) total numbers of reads, and I thought it would be sensible to divide the reads assigned by Bracken by the total number of reads in each sample. Thanks as usual for your support!

jenniferlu717 commented 2 months ago

new_est_reads is the number of reads assigned to the specific taxon after the Bayesian recalculation. No normalization is done based on the total number of reads.

For comparing two different samples, yes, I would first normalize based on the total number of reads in each sample. Please note, though, that if a threshold is included or if reads are not reassigned, the total number of reads in the Bracken output file may be less than the number of reads reported in the Kraken output file.
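
A minimal sketch of that normalization, for illustration only. It assumes the Bracken output is a tab-separated file with a header row containing `name` and `new_est_reads` columns (adjust the names if your header differs). The helper `normalized_abundance` and the example values are hypothetical. The denominator is passed in explicitly because, as noted above, the Bracken total and the Kraken total can differ, so the same choice should be used for every sample being compared.

```python
import csv
import io


def normalized_abundance(bracken_tsv, total_reads, column="new_est_reads"):
    """Return {taxon name: estimated reads / total_reads} for one sample.

    bracken_tsv : text of a tab-separated Bracken output (assumed header row)
    total_reads : denominator chosen by the user, e.g. the sample's total reads
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    reader = csv.DictReader(io.StringIO(bracken_tsv), delimiter="\t")
    return {row["name"]: int(row[column]) / total_reads for row in reader}


# Made-up numbers for two samples with different sequencing depths.
sample_a = "name\tnew_est_reads\nTaxon X\t500\nTaxon Y\t1500\n"
sample_b = "name\tnew_est_reads\nTaxon X\t900\nTaxon Y\t2100\n"

abund_a = normalized_abundance(sample_a, total_reads=10000)
abund_b = normalized_abundance(sample_b, total_reads=12000)

# Compare the same taxon across samples on the normalized scale.
for taxon in abund_a:
    print(taxon, abund_a[taxon], abund_b[taxon])
```

With a real file, the text can be read with `open(path).read()` and passed in the same way.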