soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
GNU General Public License v3.0

convertalis "Can not open index file ...index" #305

Closed ekg closed 4 years ago

ekg commented 4 years ago

Expected Behavior

I expected to run an alignment with mmseqs search and then convert the alignment into a custom tabular format with mmseqs convertalis.

Current Behavior

Running mmseqs convertalis fails with "Can not open index file seqDB.index!"

Steps to Reproduce (for bugs)

Download refseq representative microbial genomes:

curl https://www.ncbi.nlm.nih.gov/projects/r_gencoll/ftp_service/nph-gc-ftp-service.cgi/\?HistoryId\=NCID_1_163497961_130.14.18.97_5555_1588507183_4102358427_0MetA0_S_HStore\&QueryKey\=1\&ReleaseType\=RefSeq\&FileType\=GENOME_FASTA\&Flat\=true >genomes.tar
tar xf genomes.tar

Generate the index:

mmseqs createdb ncbi-genomes-2020-05-03/*fna.gz seqDB
mmseqs createindex --threads 48 --search-type 2 seqDB tmp

Align anything, like this Holliday junction resolvase (saved as query.fa):

>1FZR_1|Chains A,B,C,D|ENDONUCLEASE I|Enterobacteria phage T7 (10760)
VGAFRSGLEDKVSKQLESKGIKFEYEEWKVPYVIPASNHTYTPDFLLPNGIFVKTKGLWESDDRKKHLLIREQHPELDIRIVFSSSRTKLYKGSPTSYGEFCEKHGIKFADKLIPAEWIKEPKKEVPFDRLKRKGGKK
mmseqs createdb query.fa queryDB
mmseqs search -s 7.5 --max-seqs 2147483647 -a 1 queryDB seqDB alignDB tmp

Now, we try to extract the alignment information:

mmseqs convertalis queryDB seqDB alignDB aln.tab
... # error
Can not open index file seqDB.index!

Context

Why am I missing the .index file? This is a rather large target sequence set, so I see that things are broken up into many smaller index files, but there is no single main seqDB.index.
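A quick way to confirm this state before calling convertalis is to check that the files for the database prefix are actually on disk. The following is a minimal sketch, not part of MMseqs2: the helper name `missing_db_files` is hypothetical, and the suffix list covers only the data file and the `.index` file named in the error message, so it may need extending for other database files.

```shell
# Hypothetical helper (not an MMseqs2 command): print every expected file
# for a database prefix that does not exist on disk.
# Assumption: we only check the data file itself and its ".index" companion,
# because that is the file the error message complains about.
missing_db_files() {
    prefix=$1
    for suffix in "" ".index"; do
        if [ ! -e "${prefix}${suffix}" ]; then
            printf '%s\n' "${prefix}${suffix}"
        fi
    done
}

# Example use (commented out so the snippet has no side effects):
# missing=$(missing_db_files seqDB)
# if [ -z "$missing" ]; then
#     mmseqs convertalis queryDB seqDB alignDB aln.tab
# else
#     echo "missing database files: $missing" >&2
# fi
```

If the helper prints seqDB.index, the database was left incomplete and rerunning mmseqs createdb is the likely fix.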

Your Environment

This is a recent Debian system (not sure which version), and I'm running a build of mmseqs that I set up in Guix: https://github.com/ekg/guix-genomics/blob/master/mmseqs2.scm

ekg commented 4 years ago

I regenerated the database with mmseqs createdb ncbi-genomes-2020-05-03/*fna.gz seqDB and it now seems to work fine. I'm somewhat perplexed as to how it ended up in that state.

milot-mirdita commented 4 years ago

Glad it's working. That was super weird.

Regarding your Guix definition: you might also want to add dependencies on bzlib and wget. (Theoretically also awk, but awk seems to be required by the POSIX standard to always be present.)
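To see whether the command-line tools mentioned here are visible inside a given environment, something like the following could be run. It is only a sketch: `missing_tools` is a hypothetical helper name, it relies on the POSIX `command -v` builtin, and it cannot detect bzlib, which is a library rather than a program on PATH.

```shell
# Hypothetical helper: print each named tool that is not found on PATH.
# Libraries such as bzlib are out of scope; this only finds executables.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example use (commented out): list which of wget and awk are absent.
# missing_tools wget awk
```

Empty output means every named tool was found.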

milot-mirdita commented 4 years ago

You could take the homebrew recipe as a reference: https://github.com/Homebrew/homebrew-core/blob/master/Formula/mmseqs2.rb