Open maAr517 opened 4 months ago
Im not 100% understanding what you are confused by.
Bracken is adding more reads to the Kraken read counts but when combining the reports to compare them, we care most about the est_reads (estimated reads after combining) not just the added reads.
Thank you for replying.
I totally agree with you on the point that estimated reads after combining should be cared. And I'm so sorry, I found my concerning is not a common problem to other users. I don't know why, but the script in my PC was modified.
I concerned the definition of estreads. In my bracken output file (the product of "-o" option), columns are "name", "taxonomy_id", "taxonomy_lvl", "kraken_assigned_reads", "added_reads", "new_est_reads", and "fraction_total_reads" from the first column to the last column. But, in my "combine_bracken_outputs.py" (lines 105-122),
#Process line
[name, taxid, taxlvl, kreads, estreads, areads, frac] = line.strip().split("\t")
estreads = int(estreads)
#Error Checks
if name not in sample_counts:
sample_counts[name] = {}
sample_counts[name][taxid] = {}
elif taxid != list(sample_counts[name].keys())[0]:
sys.exit("Taxonomy IDs not matching for species %s: (%s\t%s)" % (name, taxid, sample_counts[name].keys()[0]))
if len(level) == 0:
level = taxlvl
elif level != taxlvl:
sys.exit("Taxonomy level not matching between samples")
#Save counts
total_reads[curr_name] += estreads
sample_counts[name][taxid][curr_name] = estreads
#Close file
i_file.close()
So I thought that the term "aread" (I assumed "a" meant "all") was appropriate, but in https://github.com/jenniferlu717/Bracken/blob/master/analysis_scripts/combine_bracken_outputs.py, there is no problem.
#Process line
[name, taxid, taxlvl, kreads, areads, estreads, frac] = line.strip().split("\t")
estreads = int(estreads)
I'm sorry again for consuming your time with such a insignificant issue and thank you again for developing such a useful tool!
I found output data of this python script exhibited the number of "added_reads" in bracken output file as a read number, probably because "estreads" is used as a total count in line 119 and 120, but is it correct? It seems that "areads" is appropriate.
I am a beginner on bioinformatics, so I appreciate it if anyone made any suggestion or instruction.