steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0
780 stars, 99 forks

Lookup file missing #162

Closed: 2mal16 closed this issue 1 year ago

2mal16 commented 1 year ago

Expected Behavior

The command foldseek easy-search data/${PDB_ID}.pdb dbs/alphafold_proteome result.m8 tmp_search should search the pre-downloaded Alphafold/Proteome database.

Current Behavior

afp_ca.lookup is missing and the createdb command therefore fails.

easy-search data/test_out/5MA6_structure_A.pdb dbs/alphafold_proteome pdb_result.m8 tmp_search --format-output query,target,alntmscore,u,t 

MMseqs Version:                 f0de872f3ab84bbd5c173424a6633f0384f3adbd
Seq. id. threshold              0
Coverage threshold              0
Coverage mode                   0
Max reject                      2147483647
Max accept                      2147483647
Add backtrace                   false
TMscore threshold               0
TMalign hit order               0
TMalign fast                    1
Preload mode                    0
Threads                         20
Verbosity                       3
LDDT threshold                  0
Sort by structure bit score     1
Alignment type                  2
Substitution matrix             aa:3di.out,nucl:3di.out
Alignment mode                  3
Alignment mode                  0
E-value threshold               10
Min alignment length            0
Seq. id. mode                   0
Alternative alignments          0
Max sequence length             65535
Compositional bias              1
Compositional bias              1
Gap open cost                   aa:10,nucl:10
Gap extension cost              aa:1,nucl:1
Compressed                      0
Seed substitution matrix        aa:3di.out,nucl:3di.out
Sensitivity                     9.5
k-mer length                    6
k-score                         seq:2147483647,prof:2147483647
Max results per query           1000
Split database                  0
Split mode                      2
Split memory limit              0
Diagonal scoring                true
Exact k-mer matching            0
Mask residues                   0
Mask residues probability       0.99995
Mask lower case residues        1
Minimum diagonal score          30
Selected taxa                   
Spaced k-mers                   1
Spaced k-mer pattern            
Local temporary path            
Exhaustive search mode          false
Prefilter mode                  0
Search iterations               1
Remove temporary files          true
MPI runner                      
Force restart with latest tmp   false
Cluster search                  0
Chain name mode                 0
Write mapping file              0
Mask b-factor threshold         0
Coord store mode                2
Write lookup file               1
File Inclusion Regex            .*
File Exclusion Regex            ^$
Alignment format                0
Format alignment output         query,target,alntmscore,u,t
Database output                 false
Greedy best hits                false

Alignment backtraces will be computed, since they were requested by output format.
createdb data/test_out/5MA6_structure_A.pdb tmp_search/4291256671650737152/query --chain-name-mode 0 --write-mapping 0 --mask-bfactor-threshold 0 --coord-store-mode 2 --write-lookup 1 --file-include '.*' --file-exclude '^$' --threads 20 -v 3 

Output file: tmp_search/4291256671650737152/query
[=================================================================] 100.00% 1 eta -
Time for merging to query_ss: 0h 0m 0s 0ms
Time for merging to query_h: 0h 0m 0s 0ms
Time for merging to query_ca: 0h 0m 0s 0ms
Time for merging to query: 0h 0m 0s 0ms
Ignore 0 out of 1.
Too short: 0, incorrect: 0, not proteins: 0.
Time for processing: 0h 0m 0s 7ms
createdb dbs/alphafold_proteome tmp_search/4291256671650737152/target --chain-name-mode 0 --write-mapping 0 --mask-bfactor-threshold 0 --coord-store-mode 2 --write-lookup 1 --file-include '.*' --file-exclude '^$' --threads 20 -v 3 

Output file: tmp_search/4291256671650737152/target
[=================================================================] 100.00% 564.45K 0s 152ms    
Cannot open lookup file dbs/alphafold_proteome/afp_ca.lookup!
Error: target createdb died

Steps to Reproduce (for bugs)

  1. Download the database:

    foldseek databases Alphafold/Proteome dbs/alphafold_proteome/afp tmp_download

  2. Execute the command in Current Behavior.

Foldseek Output (for bugs)

See Current Behavior.

Context

I want to run a fast similarity search of existing PDB structure files against the pre-built databases.

Your Environment

Downloaded foldseek-linux-avx2.tar.gz, the statically compiled Linux build, today. Unpacked the archive and added the foldseek binary to PATH.

Checking the system's CPU capabilities returns:

64bit: Yes
AVX2: Yes
SSE4.1: Yes
SSE2: Yes

Operating system and version: Ubuntu 23.04 with the 6.2.0-24-generic kernel.

milot-mirdita commented 1 year ago

You need to pass easy-search the same path that you gave the databases call as its output, i.e. the database prefix dbs/alphafold_proteome/afp rather than the directory that contains it:

foldseek easy-search data/${PDB_ID}.pdb dbs/alphafold_proteome/afp result.m8 tmp_search
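
The log in the report suggests why the directory path failed: given dbs/alphafold_proteome, easy-search ran createdb on it as if it were a folder of structure files. A minimal sketch of a guard against that mix-up follows. The helper names is_db_prefix and check_target are hypothetical, and the check assumes that a downloaded database has a .dbtype file next to its prefix (e.g. afp.dbtype); this is not part of foldseek itself.

```shell
# Hypothetical helper: succeeds only if the argument looks like a database
# prefix, i.e. a file named PREFIX.dbtype exists (assumed on-disk layout).
is_db_prefix() {
    [ -f "$1.dbtype" ]
}

# Hypothetical wrapper: report whether the target is usable as a database
# prefix, and fail with a message if it is a bare directory or missing.
check_target() {
    target="$1"
    if is_db_prefix "$target"; then
        echo "ok: $target"
    elif [ -d "$target" ]; then
        echo "error: $target is a directory, not a database prefix" >&2
        return 1
    else
        echo "error: $target not found" >&2
        return 1
    fi
}

# Usage sketch (not executed here):
#   check_target dbs/alphafold_proteome/afp && \
#     foldseek easy-search "data/${PDB_ID}.pdb" dbs/alphafold_proteome/afp result.m8 tmp_search
```

With the paths from this issue, such a check would be expected to accept dbs/alphafold_proteome/afp and reject dbs/alphafold_proteome before any search starts.
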

2mal16 commented 1 year ago

@milot-mirdita Thanks for catching that, it worked!