Open Merritt-Brian opened 1 year ago
metaPhlan does not generate classified/unclassified fastq files as an optional output. Use either awk or bioconda (python script) to keep only the reads that align to the hits. You'll need to import the classifier outfile from metaPhlan to figure out which reads were classified and which weren't.
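A minimal sketch of the Python route, in case it helps. It assumes the classifier outfile is tab-separated with the read ID in the first column, and that the fastq uses plain 4-line records; both are assumptions, so check them against your actual files. The function names `load_classified_ids` and `split_fastq` are made up for illustration and are not part of metaPhlan.

```python
import io


def load_classified_ids(handle):
    """Collect read IDs from a tab-separated file (assumed: read ID in column 1)."""
    ids = set()
    for line in handle:
        line = line.strip()
        # Skip blank lines and comment/header lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        ids.add(line.split("\t")[0])
    return ids


def split_fastq(fastq, classified_ids, classified_out, unclassified_out):
    """Route each 4-line fastq record to one of two outputs by read ID."""
    counts = {"classified": 0, "unclassified": 0}
    while True:
        header = fastq.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        # Assumed layout: header, sequence, '+', qualities.
        record = header + fastq.readline() + fastq.readline() + fastq.readline()
        # Read ID is the header up to the first whitespace, without the '@'.
        read_id = header[1:].split()[0]
        if read_id in classified_ids:
            classified_out.write(record)
            counts["classified"] += 1
        else:
            unclassified_out.write(record)
            counts["unclassified"] += 1
    return counts


if __name__ == "__main__":
    # Tiny in-memory demo; real use would pass open file handles instead.
    hits = io.StringIO("read1\tmarkerA\n")
    reads = io.StringIO("@read1 desc\nACGT\n+\nIIII\n@read2\nTTTT\n+\nIIII\n")
    hit_fq, miss_fq = io.StringIO(), io.StringIO()
    print(split_fastq(reads, load_classified_ids(hits), hit_fq, miss_fq))
```

The IDs are held in a set so each lookup is constant time, and the fastq is streamed one record at a time, so memory use is driven by the number of classified reads rather than the size of the fastq. If the read IDs in the outfile carry pair suffixes such as `/1`, they would need to be normalized before the comparison.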
It would be great to have a classifier that's built for nanopore reads (or other long-read sequencing methods) like spumoni. Kraken2 can have issues with nanopore's higher error rates.
Hi Brian. I used to love Metaphlan2, but unfortunately don't find it too useful anymore. My recollection is that there was no reliable way to update the reference database and the developers did not have regular releases (hence SARS-CoV-2 may be missing). Metaphlan4 dropped viruses (but can still use the Metaphlan3 algorithm)... but overall it was just looking too hard to maintain (even though I love the strategy they use and the high specificity).
On this topic, I've found Diamond2 to be really helpful when dealing with novel viruses... though its species-level assignments tend to be very noisy. https://github.com/bbuchfink/diamond
I agree on the approach with Diamond as an alternative for more novel discoveries. It seems like most nucleotide classifiers (be they metagenomics-based or standard aligners) perform worse than aa-based alignment. Some preliminary testing with a novel sequence (~55% identity to a viral genome) showed a dramatic improvement in detection F1 scores. We will optimize and introduce it into the pipeline (likely as a mandatory step) next month @aretchless
Description of feature
Adding 2 classifier approaches
Centrifuge and metaPhlan2