Should the kmer length of bracken always have the same length of the used kraken2 DB ?

jenniferlu717 / Bracken

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

http://ccb.jhu.edu/software/bracken/index.shtml

GNU General Public License v3.0


Should the kmer length of bracken always have the same length of the used kraken2 DB ? #248

Open paulzierep opened 9 months ago

paulzierep commented 9 months ago

Should the kmer length given to Bracken always be the same as the kmer length of the Kraken2 DB that was used? When does it make sense to use a different length? We would like to give Bracken users in Galaxy better information. Maybe you can help, @jenniferlu717? https://github.com/galaxyproject/tools-iuc/issues/5745

jenniferlu717 commented 9 months ago

The Bracken length should be the read length, not the kmer length. When we originally wrote Bracken, we ran a few tests with just the kmer length but found that using the read length was more accurate.

I won't go into too much detail, but based on the Bayesian formula, the probability of kmers being classified at a taxon is not the same as the probability of reads being classified at that taxon.

When building, you should specify both the kmer length (of the built database) and the read length.

paulzierep commented 9 months ago

So to be sure, the kmer length should be identical to the kmer length of the corresponding Kraken DB?

jenniferlu717 commented 8 months ago

The kmer length specified when building the database is the kmer length of the Kraken DB (KrakenUniq default = 31, Kraken2 default = 35).
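
To make the distinction between the two lengths concrete, here is a minimal sketch (not from the thread) of how a wrapper such as a Galaxy tool might assemble the two command lines. The helper names, the `kraken2_db` path, the sample file names and the 150 bp read length are illustrative assumptions. The flags follow the usage shown in the Bracken README as far as I know it: in `bracken-build`, `-k` is the kmer length and `-l` is the read length, while in `bracken`, `-r` is the read length and `-l` is the taxonomic level. Please check them against your installed version.

```python
import shlex

# Default kmer length of a Kraken2 database, as stated in the thread.
KRAKEN2_DEFAULT_KMER = 35


def bracken_build_cmd(db_dir, read_len, kmer_len=KRAKEN2_DEFAULT_KMER, threads=1):
    """Hypothetical helper: argument list for building Bracken files.

    kmer_len must match the kmer length the Kraken database was built with;
    read_len is the length of the sequencing reads, which is a separate value.
    """
    return [
        "bracken-build",
        "-d", db_dir,
        "-t", str(threads),
        "-k", str(kmer_len),   # kmer length of the Kraken database
        "-l", str(read_len),   # read length of the samples
    ]


def bracken_cmd(db_dir, report, output, read_len, level="S"):
    """Hypothetical helper: argument list for abundance estimation.

    Only the read length is passed here (-r); the kmer length is not needed.
    """
    return [
        "bracken",
        "-d", db_dir,
        "-i", report,
        "-o", output,
        "-r", str(read_len),   # should match a read length used at build time
        "-l", level,           # taxonomic level, e.g. S for species
    ]


if __name__ == "__main__":
    # Illustrative values only: a default Kraken2 DB and 150 bp reads.
    print(shlex.join(bracken_build_cmd("kraken2_db", read_len=150)))
    print(shlex.join(bracken_cmd("kraken2_db", "sample.kreport", "sample.bracken", 150)))
```

In this sketch the kmer length appears only at build time and defaults to the Kraken2 value, whereas the read length appears in both steps.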