Open taltman opened 4 years ago
I think the code does give counts down to that level if you specify S1 as the taxonomy level? Are you not seeing this option?
Hi @jenniferlu717 . I don't see the S1 option in the bracken
script nor in est_abundance.py
in the git repository that I cloned. Can you point me in the right direction?
Also, can you provide more description about the G1 and K7 non-traditional taxonomy level options that are alluded to in the README? Thank you!
Have you tried running with "S1" as the option? Bracken should take that option, although the default is "S"
The G1 taxonomy level is the first level inbetween species and genera (so above species, below genera). Sometimes, the taxonomy has an inbetween level for things such as "Mycobacterium tuberculosis complex".
K7 was included because sometimes what happens is that someone might have a custom taxonomy where they did not specify the traditional taxonomy levels, so Kraken basically assigns levels based on distance from root (the number designates distance from root).
Thanks for that tip. So it seems that Bracken accepts any integer suffix to the documented taxonomy level characters? I could specify 'S11235' with no issues...
My next question is whether there is a way to get the Bracken report to use the reference sequence accessions instead of the free-text description of the organism. I know this data is in the Bracken reference data, so there should be a way to extract it. Any advice?
Yes integer suffixes work.
Bracken uses only the kmer_distrib file and the report for the classification. Not everyone has the original databaseXmers.kraken file so those accessions may not be available.
You could go back and map the taxids to the accession by parsing the databaseXmers.kraken file if you have it still.
Looking at the Bracken source code, I see that counts do get down to the level of individual genomes. But this level of count data is not reported to the user.
Seems like it wouldn't be that much additional work to get Bracken to report down to the individual genome level, as other tools do. Could this feature be added?
Thanks!