Abundance percentages question

Dear authors,

I have been using Bracken to experiment with strain-level classification, and I find the tool very useful. I have a question about the output. I am testing on reads from a gut microbiome standard (known input abundances) with a test Kraken2/Bracken database that contains only the specific strains/genomes in the sample (with full taxonomy, of course). For clarification, the standard contains several species, including 5 different E. coli strains. I noticed the following behaviour:

At species level the abundances are estimated the way I expected:

| name | taxonomy_id | taxonomy_lvl | kraken_assigned_reads | added_reads | new_est_reads | fraction_total_reads |
|---|---|---|---|---|---|---|
| Saccharomyces cerevisiae | 4932 | S | 7290 | 13 | 7303 | 0.01037 |
| Fusobacterium nucleatum | 851 | S | 17850 | 0 | 17850 | 0.02535 |
| Faecalibacterium prausnitzii | 853 | S | 98999 | 17 | 99016 | 0.14059 |
| Escherichia coli | 562 | S | 89813 | 265 | 90078 | 0.12790 |

Etc...

At S1 level, the following behaviour occurred:

| name | taxonomy_id | taxonomy_lvl | kraken_assigned_reads | added_reads | new_est_reads | fraction_total_reads |
|---|---|---|---|---|---|---|
| Escherichia_coli_B3008 | 9999994 | S1 | 6551 | 11507 | 18058 | 0.13867 |
| Escherichia_coli_JM109 | 9999991 | S1 | 2203 | 55610 | 57813 | 0.44396 |
| Escherichia_coli_B1109 | 9999993 | S1 | 3790 | 15849 | 19639 | 0.15082 |
| Escherichia coli W | 566546 | S1 | 11122 | 7614 | 18736 | 0.14388 |
| Escherichia_coli_b2207 | 9999992 | S1 | 4539 | 11436 | 15975 | 0.12268 |

What I noticed is that at S1 level the 5 E. coli abundances add up to 100%, while at species level E. coli makes up only 12.79%. I expected the S1 abundances to add up to this 12.79%. I think this happened because, in my database, I listed only the 5 E. coli strains as "strain" and listed the other genomes as "species".
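To show what I mean by "add up to 100%", here is a minimal Python sketch that uses only the S1 numbers pasted above. It assumes that `fraction_total_reads` is simply each row's `new_est_reads` divided by the sum of `new_est_reads` over all rows reported at that level; that is my reading of the output, not something I have confirmed in the Bracken code. The variable names are my own.

```python
# S1 rows copied from the Bracken output in this issue:
# (name, new_est_reads, fraction_total_reads)
s1_rows = [
    ("Escherichia_coli_B3008", 18058, 0.13867),
    ("Escherichia_coli_JM109", 57813, 0.44396),
    ("Escherichia_coli_B1109", 19639, 0.15082),
    ("Escherichia coli W", 18736, 0.14388),
    ("Escherichia_coli_b2207", 15975, 0.12268),
]

# Sum of the reported fractions at S1 level (close to 1.0, i.e. 100%).
fraction_sum = sum(frac for _, _, frac in s1_rows)

# Assumption: each fraction is new_est_reads / total new_est_reads at this level.
total_reads = sum(reads for _, reads, _ in s1_rows)
recomputed = {name: reads / total_reads for name, reads, _ in s1_rows}

print(f"sum of reported S1 fractions: {fraction_sum:.5f}")
for name, _, reported in s1_rows:
    print(f"{name}: reported={reported:.5f} recomputed={recomputed[name]:.5f}")
```

If the recomputed values match the reported ones, the S1 fractions are relative to the reads assigned at S1 level only, not to the whole sample.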

However, does this mean that Bracken's fractions always add up to 100% within a given taxonomic level (G, S, S1), even when the level above carries information about the abundance in the sample? As a follow-up question, can I combine the S1 and S abundances to calculate the "actual" abundance? Example: if E. coli makes up 12.79% of my sample and "E. coli strain A" is found at 50% at S1 level, then the abundance of "E. coli strain A" in the sample would be 0.1279 × 0.50 ≈ 6.4%.
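The calculation I have in mind can be sketched in Python as follows. This is only my proposed workaround, assuming it is valid to multiply the species-level fraction by each within-species S1 fraction; whether that is a sound way to combine the two Bracken outputs is exactly what I am asking. The function name `rescale_to_sample` is hypothetical and not part of Bracken.

```python
def rescale_to_sample(parent_fraction, child_fractions):
    """Scale within-parent fractions (summing to ~1) by the parent's
    fraction of the whole sample. Hypothetical helper, not a Bracken API."""
    return {name: parent_fraction * frac for name, frac in child_fractions.items()}


# Species-level fraction for Escherichia coli (taxonomy_id 562) from the S output.
e_coli_fraction = 0.12790

# S1-level fractions from the S1 output in this issue.
s1_fractions = {
    "Escherichia_coli_B3008": 0.13867,
    "Escherichia_coli_JM109": 0.44396,
    "Escherichia_coli_B1109": 0.15082,
    "Escherichia coli W": 0.14388,
    "Escherichia_coli_b2207": 0.12268,
}

sample_fractions = rescale_to_sample(e_coli_fraction, s1_fractions)
for name, frac in sample_fractions.items():
    print(f"{name}: {frac:.5f}")

# The rescaled strain fractions should add back up to the species fraction.
print(f"total: {sum(sample_fractions.values()):.5f}")
```

With these numbers the five rescaled strain fractions sum back to roughly 0.1279, which is the behaviour I originally expected from the S1 output itself.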

jenniferlu717 / Bracken

Abundance percentages question #122