soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
GNU General Public License v3.0

The ways to create a custom taxonomy report #491

Open julie-tooi opened 3 years ago

julie-tooi commented 3 years ago

Hello! (It's me again :D)

I want to create a report with taxonomy and alignment information. I need a bit more than Kraken/Krona provide, and I found the convertalis module, which has all the fields I am interested in. So, first I created a taxonomy alignment database:

mmseqs createdb EcyBK_4.fasta EcyBK_4_db

mmseqs taxonomy \
EcyBK_4_db \
/media/tertiary/database_nr/mmseqs_20210920/NR_tax_mmseqs \
EcyBK_4_mmseqs tmp -s 7.5 \
--exact-kmer-matching 1 --min-ungapped-score 30 --threads 6

As the next step, I planned to use the convertalis module:

mmseqs convertalis /media/tertiary/database_nr/mmseqs_20210920/NR_tax_mmseqs \
EcyBK_4_db \
EcyBK_4_mmseqs \
EcyBK_4_mmseqs_report.tab \
--format-output "query,target,pident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,taxid,taxname,taxlineage"

But I get this error:

MMseqs Version:         13.45111
Substitution matrix     nucl:nucleotide.out,aa:blosum62.out
Alignment format        0
Format alignment output query,target,pident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,taxid,taxname,taxlineage
Translation table       1
Gap open cost           nucl:5,aa:11
Gap extension cost      nucl:2,aa:1
Database output         false
Preload mode            0
Search type             0
Threads                 12
Compressed              0
Verbosity               3

Input database "EcyBK_4_mmseqs" has the wrong type (Taxonomy)
Allowed input:
- Alignment

So, if I understand the idea of the module correctly, it performs a new taxonomy annotation rather than building a report from a previous taxonomy annotation, right? Is there a way to extract this kind of custom report from a database that already has taxonomy assigned?

Thank you!

AmaliT commented 1 year ago

Hi, did you find a solution for this? I am after a similar output, as I am keen for more taxonomy information and want to filter the results based on a taxID. However, I am getting the same error. Any suggestions would be appreciated.

mmseqs convertalis T25_hifi_norm $NR_DB T25_hifi_norm_results test
convertalis T25_hifi_norm ncbi_nr T25_hifi_norm_results test

MMseqs Version:         13.45111
Substitution matrix     nucl:nucleotide.out,aa:blosum62.out
Alignment format        0
Format alignment output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Translation table       1
Gap open cost           nucl:5,aa:11
Gap extension cost      nucl:2,aa:1
Database output         false
Preload mode            0
Search type             0
Threads                 4
Compressed              0
Verbosity               3

Input database "T25_hifi_norm_results" has the wrong type (Taxonomy)
Allowed input:
- Alignment
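
For the taxID filtering mentioned above, a minimal sketch follows. It assumes a tab-separated report already exists with the column order from the --format-output string earlier in this thread, so that taxid is column 13. The helper name filter_by_taxid and the default column are assumptions for illustration, not part of MMseqs2, and the sketch does not address the wrong-type error itself.

```shell
# Hypothetical helper (not an MMseqs2 command): print the rows of a
# tab-separated report whose taxid column equals the requested taxID.
# Arguments: report file, taxID, optional 1-based column index.
# The default column 13 is an assumption taken from the field order
# "query,target,...,evalue,bits,taxid,taxname,taxlineage" used above.
filter_by_taxid() {
    local report="$1" taxid="$2" col="${3:-13}"
    # -F'\t' splits on tabs only, so taxon names with spaces stay in one field.
    awk -F'\t' -v id="$taxid" -v col="$col" '$col == id' "$report"
}
```

For example, `filter_by_taxid EcyBK_4_mmseqs_report.tab 562 > taxid_562.tab` would keep only rows assigned exactly to taxID 562. This is an exact match on the taxid field, so rows assigned to descendant taxa are not included.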