jenniferlu717 / Bracken

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
http://ccb.jhu.edu/software/bracken/index.shtml
GNU General Public License v3.0

Bracken read length for ITS1 amplicons #260

Open ARW-UBT opened 2 months ago

ARW-UBT commented 2 months ago

Hi, I have analysed ITS1 amplicons from fungal samples (leaf endophytes) using kraken2 and the PlusPFP-16 indices provided by Ben Langmead. These amplicons were trimmed for PCR primer sequences, but they do exhibit a considerable length polymorphism from under 200 bp up to about 280 bp (300 bp Illumina reads minus at least one 5' primer).

I then ran Bracken on the Kraken 2 reports without specifying a read length, so the default value (100) was used. Does this make sense, or would you suggest setting -r to a higher value (e.g. the length of the shortest FASTQ read in the data set)?

Best,

jenniferlu717 commented 2 months ago

I haven't tested read lengths higher than 150, but I know there is a big difference between using 50 and 100. I would suggest setting the read length to the highest value that is <= the shortest read in your dataset. Based on your numbers, I think 200 would be best.
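
The rule of thumb above (use the highest read length that is <= the shortest read) can be sketched in a few lines of Python. This is only an illustration, not part of Bracken: the function names are hypothetical, and the list of candidate read lengths is an assumed placeholder. Check which read lengths your own database actually has Bracken k-mer distribution files for, and pass the result to `-r`.

```python
from typing import Iterable, Optional

# Hypothetical list of read lengths for which Bracken k-mer distribution
# files exist in the database; replace with what your database provides.
AVAILABLE_READ_LENGTHS = [50, 75, 100, 150, 200, 250, 300]


def shortest_read_length(fastq_lines: Iterable[str]) -> Optional[int]:
    """Return the shortest sequence length in plain 4-line FASTQ records."""
    shortest = None
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            n = len(line.strip())
            if shortest is None or n < shortest:
                shortest = n
    return shortest


def pick_read_length(shortest: int, available: Iterable[int]) -> Optional[int]:
    """Return the highest available read length that is <= the shortest read."""
    candidates = [r for r in available if r <= shortest]
    return max(candidates) if candidates else None


# Toy records for illustration only (not real ITS1 data).
toy_fastq = [
    "@read1", "A" * 280, "+", "I" * 280,
    "@read2", "C" * 205, "+", "I" * 205,
]
shortest = shortest_read_length(toy_fastq)
print(shortest, pick_read_length(shortest, AVAILABLE_READ_LENGTHS))  # 205 200
```

Note that if the shortest read is actually below 200 bp, this rule would pick the next lower available value rather than 200.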