Open splaisan opened 5 years ago
How well does MMseqs2 work on 50k long ONT reads?
If this is not a use case for MMseqs2, any other suggestions?
@colinbrislawn I have tested linclust with ONT reads. It should be possible to cluster them. However, we needed to tweak the parameters used for the banded alignment to account for the high error rate.
Awesome!
How do you want to use MMseqs2?
Existing Qiime 2 plugins offer several options for clustering and classifying short RNA sequences... but no plugins support clustering or classifying long, noisy sequences, or proteins of any kind.
I think an MMseqs2 plugin could bring a ton of functionality to Qiime 2. A method for taxonomic classification of ONT reads would help @splaisan and others.
We would be happy to assist members of the Qiime community with integrating MMseqs2. We felt it was a bit too much for us to tackle alone.
Sounds like a plan!
Building a plugin is a pretty big lift as it requires close integration with Qiime 2 semantic types. But at least the docs are good!
I don't think I'm the right person to lead development, but I would be happy to contribute methods to the plugin.
This is more of a feature request!
Would someone have the time and expertise to create a Python module similar to the one for vsearch (https://github.com/qiime2/q2-feature-classifier/tree/master/q2_feature_classifier) so that we can classify with multithreading in qiime2?
blast or vsearch runs typically take a day or more for 50k long ONT reads, which is very slow, and I am dreaming of the speedup seen in the mmseqs2 paper.
My current qiime2 execution looks like this in top, but I have little knowledge of what it should translate to with mmseqs2. If I had an equivalent, I might try to hack the vsearch wrapper code, but my Python skills are not that great.
qiime feature-classifier classify-consensus-vsearch --i-query rep-seqs.qza --i-reference-reads /data/biodata/MetONTIIME_DB/rrnDB_operons_sequence.qza --i-reference-taxonomy /data/biodata/MetONTIIME_DB/rrnDB_operons_taxonomy.qza --p-perc-identity 0.77 --p-query-cov 0.8 --p-maxaccepts 1 --p-strand both --p-min-consensus 0.51 --p-unassignable-label Unassigned --p-threads 24 --o-classification taxonomy.qza
Thanks for any help on this
PS: I do not dare to double-post on the qiime2 page, as developers often consider this rude.
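As a starting point for the translation asked about above, here is a minimal Python sketch that builds an `mmseqs easy-search` command line from the identity, coverage, and thread settings in the vsearch call. It is only a sketch under assumptions: the helper name `build_mmseqs_search` and the FASTA file names are hypothetical, the inputs are assumed to have been exported from the `.qza` artifacts to FASTA first, and the flag mapping (`--min-seq-id` for `--p-perc-identity`, `-c` with `--cov-mode 2` for `--p-query-cov`) should be checked against `mmseqs easy-search -h` for your version. It does not reproduce the strand handling or the consensus taxonomy step of `classify-consensus-vsearch`; it only produces a table of hits.

```python
import shlex


def build_mmseqs_search(query_fasta, reference_fasta, result_m8, tmp_dir,
                        perc_identity=0.77, query_cov=0.8, threads=24):
    """Return an argv list for `mmseqs easy-search`.

    Hypothetical helper: the mapping from the vsearch options to
    MMseqs2 flags is an assumption, not a tested equivalence.
    """
    return [
        "mmseqs", "easy-search",
        query_fasta, reference_fasta, result_m8, tmp_dir,
        "--search-type", "3",                # nucleotide vs. nucleotide search
        "--min-seq-id", str(perc_identity),  # assumed analogue of --p-perc-identity
        "-c", str(query_cov),                # coverage threshold
        "--cov-mode", "2",                   # measure coverage on the query
        "--threads", str(threads),
    ]


# File names below are placeholders for FASTA files exported from the .qza inputs.
cmd = build_mmseqs_search(
    "rep-seqs.fasta", "rrnDB_operons_sequence.fasta", "hits.m8", "tmp"
)
print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` once MMseqs2 is installed. Turning the resulting hits into a taxonomy artifact would still need a separate step.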