Open JosepFAbril opened 4 years ago
The most recent version (git master) of MMseqs2 can print out taxonomic information using the --format-output options taxid,taxname,taxlineage:
mmseqs search query target aln tmp
mmseqs convertalis query target aln aln.m8 --format-output query,target,taxid,taxname,taxlineage
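Once such a file exists, it can be summarized with standard tools. The following is only a sketch: it assumes the output is tab-separated and that the columns follow the order given to --format-output above, so taxname is column 4. The file names and sample rows are made up for illustration and are not real MMseqs2 output.

```shell
# Hypothetical rows in the order query,target,taxid,taxname,taxlineage
# (made-up hits; the lineage column is a placeholder, not real mmseqs output).
printf 'q1\tt5\t562\tEscherichia coli\tlineageA\n'    > sample_tax.m8
printf 'q2\tt7\t562\tEscherichia coli\tlineageA\n'   >> sample_tax.m8
printf 'q6\tt1\t1423\tBacillus subtilis\tlineageB\n' >> sample_tax.m8

# Count hits per taxname (column 4), most frequent first.
# -F'\t' matters because taxon names can contain spaces.
awk -F'\t' '{ n[$4]++ } END { for (t in n) printf "%d\t%s\n", n[t], t }' sample_tax.m8 \
  | sort -k1,1nr > taxon_counts.tsv
cat taxon_counts.tsv
```

With a real aln.m8, only the input file name changes, as long as the column order matches the one passed to --format-output.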
If you want a taxonomic report, you can call LCA on the aln result. However, you may want to filter the result first so that not all remote hits are considered in the lowest common ancestor computation.
mmseqs search query target aln tmp
# this call extracts all of the highest-scoring hits; there can be multiple hits per query
mmseqs filterdb aln aln_top --beats-first --filter-column 4 --comparison-operator le
# compute the lca of the best scoring hits
mmseqs lca target aln_top alnLca
Generally, you have to run convertalis, taxonomyreport, etc. separately on each result.
However, you can bundle several query sets into one run by passing multiple input FASTA/FASTQ files to createdb:
mmseqs createdb fasta1.fa fasta2.fa query
mmseqs search query target aln tmp
Now you can additionally pass the qset column to convertalis to resolve which input FASTA file each search result came from.
mmseqs convertalis query target aln aln.m8 --format-output qset,query,target,etc...
You will get an output similar to this:
fasta1.fa q1 t5 ...
fasta1.fa q2 t7 ...
fasta2.fa q6 t1 ...
...
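If per-input files are needed afterwards, the combined table can be split on the qset column. This is a minimal sketch, assuming tab-separated output with qset as the first column and qset values that are plain file names without directory components. The sample rows mirror the example above, and the by_qset directory name is hypothetical.

```shell
# Hypothetical rows mirroring the example output above (qset, query, target).
printf 'fasta1.fa\tq1\tt5\n'  > sample_qset.m8
printf 'fasta1.fa\tq2\tt7\n' >> sample_qset.m8
printf 'fasta2.fa\tq6\tt1\n' >> sample_qset.m8

mkdir -p by_qset
# Write every row to by_qset/<qset>.m8, keyed on the first column.
# awk truncates each output file when it first opens it and appends after that.
awk -F'\t' '{ print > ("by_qset/" $1 ".m8") }' sample_qset.m8
ls by_qset
```

Each resulting file holds only the hits of one input set, so downstream steps can treat the sets separately even though the search ran once.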
Btw, if you want a set of stickers (see https://twitter.com/thesteinegger/status/1201076220957315074), send me your address to milot at mirdita de.
I was looking for a command/option to merge the raw alignment or taxonomy files once they have been computed on different input sequence sets against the same database (and then use the convertalis or taxonomyreport commands on the merged output). Some of those alignments had already been calculated, and I wondered whether it was possible to avoid rerunning them as a merged input file through the search/taxonomy commands. I really appreciate your suggestions and those by Martin, and I will take them into account for future searches with MMseqs2.
Thanks again for your help... Josep F
I would appreciate it if you could help me with a couple of questions regarding MMseqs2. I've been running it using both approaches, for alignment against sequence databases (mmseqs search -> convertalis) and for taxonomic binning (mmseqs taxonomy -> taxonomyreport), with either a single sequence set or multiple sets obtained after demultiplexing barcodes from a sequencing run.
Thanks for your assistance on those questions... Josep F