jfy133 closed this 1 year ago
All addressed (I hope)!
This was already done by specifying a 'profiler' rather than a 'classifier'.
My point is that Bracken is a classifier and not a profiler. It still reports sequence abundance, not taxonomic abundance.
Isn't that what it is doing, based on my reading of the section 'Classification versus abundance estimation' in https://peerj.com/articles/cs-104/?
Therefore, any assumption that Kraken’s raw read assignments can be directly translated into species- or strain-level abundance estimates (e.g., Schaeffer et al., 2015) is flawed, as ignoring reads at higher levels of the taxonomy will grossly underestimate some species, and creates the erroneous impression that Kraken’s assignments themselves were incorrect.
Nonetheless, metagenomics analysis often involves estimating the abundance of the species in a particular sample. Although we cannot unambiguously assign each read to a species, we would like to estimate how much of each species is present, specifically by estimating the number or percentage of reads in the sample.
<...>
Rather than re-engineer Kraken to address the ambiguous read classification issue and to provide abundance estimates directly, we decided to implement the new species-level abundance estimation method described here as a separate program
Unless I'm misunderstanding what they mean by 'species abundance' (as it's never really defined...)?
Or do they mean that Kraken2's sequence abundance is inaccurate, so they re-estimate species-level sequence abundance?
To compute species abundance, any genome-level (strain-level) reads are simply added together at the species level. In cases where only one genome from a given species is detected by Kraken in the dataset, we simply add the reads distributed downward from the genus level (and above) to the reads already assigned by Kraken to the species level. In cases where multiple genomes exist for a given species, the reads distributed to each genome are combined and added to the Kraken-assigned species level reads. The added reads give the final species-level abundance estimates.
Ugh terminology...
The added reads give the final species-level abundance estimates.
This is my understanding of Bracken: it is simply a redistribution of reads. If Kraken2 had (hypothetically) already assigned all reads at the species level, then Bracken would make no further changes.
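To make the 'redistribution' point concrete, here is a rough Python sketch. It is not Bracken's actual algorithm: `redistribute_to_species` and its `weights` argument are hypothetical, standing in for the per-genome probabilities Bracken roughly derives from its k-mer distribution file. It only illustrates three things: reads assigned at genus level are pushed down to the species below it, the total read count is unchanged (so the output is still sequence abundance, not taxon abundance), and nothing changes when no reads sit above species level.

```python
from typing import Dict


def redistribute_to_species(
    species_reads: Dict[str, int],
    genus_reads: int,
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Simplified, hypothetical redistribution of genus-level reads to species.

    `weights` is an assumed stand-in for Bracken's probabilities; here it is
    simply passed in rather than computed from a k-mer distribution.
    """
    if genus_reads == 0:
        # All reads are already at species level: output equals the input.
        return {s: float(n) for s, n in species_reads.items()}

    # Score each species by weight times observed reads, so a species with
    # no species-level reads receives none of the redistributed reads.
    scores = {s: weights[s] * n for s, n in species_reads.items()}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no species-level reads to redistribute onto")

    # Split the genus-level reads proportionally to the scores and add them
    # to the reads Kraken already assigned at species level.
    return {
        s: species_reads[s] + genus_reads * scores[s] / total
        for s in species_reads
    }
```

For example, with 60 and 20 reads at two species and 20 reads stuck at the genus node (equal weights), `redistribute_to_species({"a": 60, "b": 20}, 20, {"a": 1.0, "b": 1.0})` gives 75 and 25 reads: still 100 reads in total, just moved down the taxonomy.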
In my mind, the only tools I would call profilers are those using marker genes that give you a taxon abundance. All the other tools can only provide (protein) sequence abundance AFAIK and thus are classifiers.
Ok fair enough, I see the logic there. I will update.
I've removed the classifier/profiler column from the table, and further tweaked the phrasing to hopefully capture the distinction you made above: https://github.com/jfy133/taxprofiler-manuscript/commit/00f700f1dbf25c187140dbc183d42bfbe8e4c5ad
To be clear: I do not claim that my interpretation is the correct one. Hopefully, the changes that you made serve to avoid confusion. Cheers ☺️
Nope, it actually makes sense once I read more into exactly what it is doing, rather than just the general description :grimacing:
TODO I completely agree; I admit I was trying to use the 'scale of the problem' as a 'shallow' reason to segue into the next paragraph. I'll try to restructure this paragraph to emphasise the reasons for the diversity of tools. That said, I don't think the pipeline solves any of those problems (other than maybe giving people more options for comparison), so I don't want to go into too much detail.
✅ Rephrase to note that we want to infer real abundances, but that this is difficult.
✅ done
✅ done
✅ done
✅ I think this is worth a separate discussion in person or on Slack. We don't have to keep updating versions if no-one is interested in the tool; we have a fixed container, so that shouldn't be a problem. I also don't think the code around profilers will change much, so I don't see why we would need to remove stuff. I also find that most of the tools we've included are relatively 'static' and don't change much other than their default databases. That said, we don't explicitly say with this sentence that we wouldn't remove stuff ;) so I'll leave it as it is.
✅ This was already done by specifying a 'profiler' rather than a 'classifier'.
✅ It got lost; I moved it back to a relevant section and extended it.
✅ I don't feel it's necessary; the fact that we filtered down implies there are others out there.
✅ Changed to 'explicit' support: WSL is possible, but we don't test for it. UGENE explicitly says it supports Windows natively.
✅ 'User customisation' was maybe the wrong phrasing. Basically, it's implied that you could do it, but the documentation doesn't explicitly show it. Rephrased to make that clearer.
✅ rephrased
✅ Explicitly stated
✅ Started on Slack