soedinglab / MMseqs2-App

MMseqs2 app to run on your workstation or servers
https://search.foldseek.com
GNU General Public License v3.0

mmseqs-server: errors when called from colabfold_batch #54

Closed by pannoniac 1 year ago

pannoniac commented 2 years ago

Hi, I tried to set up an MMseqs2 server following the steps in https://github.com/sokrypton/ColabFold/tree/main/MsaServer. The server starts up as expected.

My colabfold command line: "colabfold_batch --host-url=http://server:port fastadir outputdir" (fastadir contains one FASTA file). When I run the command, there is some communication with the server, but it exits with an error after a few seconds. The same command works against the "public" server, i.e. without the --host-url parameter.

I am not sure what I'm missing or what's wrong. The databases seem to work, as colabfold_search can process the same fasta files when pointed to the databases on the file system, i.e. "colabfold_search fastadir /path/to/databases outputdir2" works.

These are the commands I used to index the databases (according to colabfold's instructions). Please see the attachment for the contents of my databases directory.

  mmseqs tsv2exprofiledb "uniref30_2103" "uniref30_2103_db"
  mmseqs createindex "uniref30_2103_db" tmp1 --remove-tmp-files 1
  mmseqs tsv2exprofiledb "colabfold_envdb_202108" "colabfold_envdb_202108_db"
  mmseqs createindex "colabfold_envdb_202108_db" tmp2 --remove-tmp-files 1

Could you please give me a hint what's wrong in my setup? It seems that some required files do not exist. Is this a known issue/general error? Many thanks.

mmseqs-server's output:

$ ./msa-server -local -config config.ono
2022/09/01 18:48:21 MMseqs2 worker
2022/09/01 18:48:21 MMseqs2 Webserver
2022/09/01 18:48:21 MMseqs2 worker
2022/09/01 18:48:21 MMseqs2 worker
2022/09/01 18:48:21 MMseqs2 worker
2022/09/01 18:48:21 MMseqs2 worker
2022/09/01 18:48:21 MMseqs2 worker
2022/09/01 18:48:21 MMseqs2 worker
2022/09/01 18:48:21 MMseqs2 worker
2022/09/01 18:48:21 MMseqs2 worker
2022/09/01 18:48:21 MMseqs2 worker
2022/09/01 18:48:21 MMseqs2 worker
2022/09/01 18:48:21 MMseqs2 worker
2022/09/01 18:48:21 MMseqs2 worker
2022/09/01 18:48:21 MMseqs2 worker
2022/09/01 18:48:21 MMseqs2 worker
2022/09/01 18:48:21 MMseqs2 worker
<internal_IP> - - [01/Sep/2022:18:49:18 +0200] "POST /ticket/msa HTTP/1.1" 200 67
createdb <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/job.fasta <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/qdb

Converting sequences
[
Time for merging to qdb_h: 0h 0m 0s 388ms
Time for merging to qdb: 0h 0m 0s 215ms
Database type: Aminoacid
Time for processing: 0h 0m 0s 838ms
search <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/qdb <internal_path>/ColabFold/MsaServer/databases/uniref30_2103 <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/tmp --num-iterations 3 --db-load-mode 2 -a --k-score 'seq:96,prof:80' -e 0.1 --max-seqs 10000

Input <internal_path>/ColabFold/MsaServer/databases/uniref30_2103 does not exist
expandaln <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/qdb <internal_path>/ColabFold/MsaServer/databases/uniref30_2103.idx <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res <internal_path>/ColabFold/MsaServer/databases/uniref30_2103.idx <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_exp --db-load-mode 2 --expansion-mode 0 -e inf --expand-filter-clusters 1 --max-seq-id 0.95

Input <internal_path>/ColabFold/MsaServer/databases/uniref30_2103.idx does not exist
mvdb <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/tmp/latest/profile_1 <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/prof_res

Time for processing: 0h 0m 0s 0ms
lndb <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/qdb_h <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/prof_res_h

Time for processing: 0h 0m 0s 6ms
align <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/prof_res <internal_path>/ColabFold/MsaServer/databases/uniref30_2103.idx <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_exp <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_exp_realign --db-load-mode 2 -e 10 --max-accept 100000 --alt-ali 10 -a

Input <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/prof_res does not exist
filterresult <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/qdb <internal_path>/ColabFold/MsaServer/databases/uniref30_2103.idx <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_exp_realign <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_exp_realign_filter --db-load-mode 2 --qid 0 --qsc 0.8 --diff 0 --max-seq-id 1.0 --filter-min-enable 100

Input <internal_path>/ColabFold/MsaServer/databases/uniref30_2103.idx does not exist
result2msa <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/qdb <internal_path>/ColabFold/MsaServer/databases/uniref30_2103.idx <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_exp_realign_filter <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/uniref.a3m --msa-format-mode 6 --db-load-mode 2 --filter-msa 1 --filter-min-enable 1000 --diff 3000 --qid 0.0,0.2,0.4,0.6,0.8,1.0 --qsc 0 --max-seq-id 0.95

Input <internal_path>/ColabFold/MsaServer/databases/uniref30_2103.idx does not exist
rmdb <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_exp_realign

Time for processing: 0h 0m 0s 2ms
rmdb <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_exp

Time for processing: 0h 0m 0s 1ms
rmdb <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res

Time for processing: 0h 0m 0s 1ms
rmdb <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_exp_realign_filter

Time for processing: 0h 0m 0s 1ms
search <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/prof_res <internal_path>/ColabFold/MsaServer/databases/pdb70 <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_pdb <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/tmp --db-load-mode 2 -s 7.5 -a -e 0.1

Input <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/prof_res does not exist
convertalis <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/prof_res <internal_path>/ColabFold/MsaServer/databases/pdb70.idx <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_pdb <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/pdb70.m8 --format-output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,cigar --db-load-mode 2

Input <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/prof_res does not exist
rmdb <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_pdb

Time for processing: 0h 0m 0s 1ms
search <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/prof_res <internal_path>/ColabFold/MsaServer/databases/colabfold_envdb_202108 <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_env <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/tmp --num-iterations 3 --db-load-mode 2 -a --k-score 'seq:96,prof:80' -e 0.1 --max-seqs 10000

Input <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/prof_res does not exist
expandaln <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/prof_res <internal_path>/ColabFold/MsaServer/databases/colabfold_envdb_202108.idx <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_env <internal_path>/ColabFold/MsaServer/databases/colabfold_envdb_202108.idx <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_env_exp -e inf --expansion-mode 0 --db-load-mode 2

Input <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/prof_res does not exist
align <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/tmp/latest/profile_1 <internal_path>/ColabFold/MsaServer/databases/colabfold_envdb_202108.idx <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_env_exp <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_env_exp_realign --db-load-mode 2 -e 10 --max-accept 100000 --alt-ali 10 -a

Input <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/tmp/latest/profile_1 does not exist
filterresult <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/qdb <internal_path>/ColabFold/MsaServer/databases/colabfold_envdb_202108.idx <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_env_exp_realign <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_env_exp_realign_filter --db-load-mode 2 --qid 0 --qsc 0.8 --diff 0 --max-seq-id 1.0 --filter-min-enable 100

Input <internal_path>/ColabFold/MsaServer/databases/colabfold_envdb_202108.idx does not exist
result2msa <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/qdb <internal_path>/ColabFold/MsaServer/databases/colabfold_envdb_202108.idx <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_env_exp_realign_filter <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/bfd.mgnify30.metaeuk30.smag30.a3m --msa-format-mode 6 --db-load-mode 2 --filter-msa 1 --filter-min-enable 1000 --diff 3000 --qid 0.0,0.2,0.4,0.6,0.8,1.0 --qsc 0 --max-seq-id 0.95

Input <internal_path>/ColabFold/MsaServer/databases/colabfold_envdb_202108.idx does not exist
rmdb <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_env_exp_realign_filter

Time for processing: 0h 0m 0s 3ms
rmdb <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_env_exp_realign

Time for processing: 0h 0m 0s 1ms
rmdb <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_env_exp

Time for processing: 0h 0m 0s 1ms
rmdb <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res_env

Time for processing: 0h 0m 0s 1ms
rmdb <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/qdb

Time for processing: 0h 0m 0s 5ms
rmdb <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/qdb_h

Time for processing: 0h 0m 0s 2ms
rmdb <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/res

Time for processing: 0h 0m 0s 0ms
2022/09/01 18:49:20 Execution Error: open <internal_path>/ColabFold/MsaServer/jobs/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA/uniref.a3m: no such file or directory
<internal_IP> - - [01/Sep/2022:18:49:23 +0200] "GET /ticket/X5WEFLF5vsH55VHTkuHOsdMZAmy79DZ-68NARA HTTP/1.1" 200 65

colabfold_batch's output:

2022-09-01 18:49:07,506 Running colabfold 1.3.0 (81e3fd0308de79566553cd6407fe8c67bdf68fbd)
<internal_path>/python/anaconda3/envs/colabfold/lib/python3.8/site-packages/haiku/_src/data_structures.py:37: FutureWarning: jax.tree_structure is deprecated, and will be removed in a future release. Use jax.tree_util.tree_structure instead.
  PyTreeDef = type(jax.tree_structure(None))
2022-09-01 18:49:14,929 Found 5 citations for tools or databases
2022-09-01 18:49:19,427 Query 1/1: 3gbn_B (length 173)
2022-09-01 18:49:19,452 Sleeping for 5s. Reason: PENDING                        
ERROR:   0%|                               | 0/150 [elapsed: 00:05 remaining: ?]
2022-09-01 18:49:24,462 Could not get MSA/templates for 3gbn_B: MMseqs2 API is giving errors. Please confirm your input is a valid protein sequence. If error persists, please try again an hour later.
Traceback (most recent call last):
  File "<internal_path>/python/anaconda3/envs/colabfold/lib/python3.8/site-packages/colabfold/batch.py", line 1332, in run
    ) = get_msa_and_templates(
  File "<internal_path>/python/anaconda3/envs/colabfold/lib/python3.8/site-packages/colabfold/batch.py", line 820, in get_msa_and_templates
    a3m_lines = run_mmseqs2(
  File "<internal_path>/python/anaconda3/envs/colabfold/lib/python3.8/site-packages/colabfold/colabfold.py", line 178, in run_mmseqs2
    raise Exception(f'MMseqs2 API is giving errors. Please confirm your input is a valid protein sequence. If error persists, please try again an hour later.')

colabfold-databases.txt
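The server log above repeats the same "does not exist" message many times, and most repeats are knock-on failures of the first missing database. A minimal Python sketch for condensing such a log follows. It assumes only the message format visible in the log ("Input <path> does not exist") and that per-job intermediates live under a "/jobs/" directory, as they do here. The function names are hypothetical helpers, not part of MMseqs2 or ColabFold.

```python
import re

# Matches the message format seen in the server log above,
# e.g. "Input /path/to/db does not exist". Paths are assumed to contain no spaces.
MISSING_INPUT = re.compile(r"^Input (?P<path>\S+) does not exist$")


def missing_inputs(log_text):
    """Return each missing input path once, in order of first appearance."""
    seen = []
    for line in log_text.splitlines():
        match = MISSING_INPUT.match(line.strip())
        if match and match.group("path") not in seen:
            seen.append(match.group("path"))
    return seen


def split_by_location(paths, jobs_marker="/jobs/"):
    """Separate per-job intermediates from all other paths (typically databases).

    jobs_marker is an assumption based on the directory layout in this log.
    """
    other = [p for p in paths if jobs_marker not in p]
    job_files = [p for p in paths if jobs_marker in p]
    return other, job_files
```

Applied to the log above, the first list would contain the database paths (uniref30_2103, uniref30_2103.idx, colabfold_envdb_202108.idx), which are the entries to check first. The per-job files such as prof_res are missing only because the earlier search step never ran.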

milot-mirdita commented 1 year ago

It seems the paths to the two databases are wrong: both the uniref30_2103 and colabfold_envdb_202108 paths are missing the _db suffix. I've updated the scripts in the MsaServer subfolder to fix the issue.
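A quick way to catch this kind of mismatch before starting the server is to check that each configured database prefix exists on disk. The sketch below is a hypothetical Python helper, not part of the MsaServer scripts. It assumes an MMseqs2 database named <prefix> is stored at least as the files <prefix>, <prefix>.index and <prefix>.dbtype, and it tries the _db suffix mentioned above as a fallback.

```python
from pathlib import Path

# Assumption: a minimal MMseqs2 database <prefix> consists of these files.
CORE_SUFFIXES = ("", ".index", ".dbtype")


def missing_db_files(prefix):
    """List the core files of the database at `prefix` that are not on disk."""
    prefix = str(prefix)
    return [prefix + s for s in CORE_SUFFIXES if not Path(prefix + s).exists()]


def resolve_db_prefix(db_dir, name, fallback_suffix="_db"):
    """Return the first complete prefix among `name` and `name + fallback_suffix`.

    Returns None if neither candidate has all core files present.
    """
    for candidate in (name, name + fallback_suffix):
        prefix = Path(db_dir) / candidate
        if not missing_db_files(prefix):
            return str(prefix)
    return None
```

For the setup in this issue, resolve_db_prefix("databases", "uniref30_2103") would be expected to return the uniref30_2103_db prefix, since the tsv2exprofiledb commands above wrote their output under that name.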

pannoniac commented 1 year ago

I figured out the missing suffix by myself in the meantime, but thanks for fixing and for adding pdb and pdb70 databases as well and for the other changes that made colabfold communicate and work with msa-server! From my perspective this issue can be closed.