sokrypton / ColabFold

Making Protein folding accessible to all!
MIT License

colabfold_search failed on custom sequence database #609

Open qiyifei1 opened 6 months ago

qiyifei1 commented 6 months ago

Expected Behavior

colabfold_search should produce an a3m file for a query sequence searched against a custom sequence database.

Current Behavior

colabfold_search fails with the following error message:

Invalid database read for database data file=CustomSeqDB/CustomSeq.idx, database index=CustomSeqDB/CustomSeq.idx.index
getData: local id (4294967295) >= db size (22)
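
A note on the number in that message: 4294967295 is 2^32 - 1, the largest unsigned 32-bit value, which is commonly used as a "not found" sentinel for unsigned ids. Assuming that is what it means here (this thread does not confirm it from the MMseqs2 source), a lookup failed before the read, rather than the index holding an out-of-range entry. The arithmetic can be checked with a short sketch:

```python
import struct

local_id = 4294967295  # value reported by getData in the error message
db_size = 22           # database size reported in the same message

# Largest value representable as an unsigned 32-bit integer
UINT32_MAX = 2**32 - 1
is_sentinel = local_id == UINT32_MAX

# The same bit pattern reinterpreted as a signed 32-bit integer
as_signed = struct.unpack("<i", struct.pack("<I", local_id))[0]

print(is_sentinel, as_signed, local_id >= db_size)
```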

Steps to Reproduce (for bugs)

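#!/bin/sh
# Directory containing the MMseqs2 binary (release 15-6f452)
mmseqpath=~/local/source/mmseqs-15-6f452/bin
# Build the custom sequence database and its index
mkdir CustomSeqDB && cd CustomSeqDB
"$mmseqpath/mmseqs" createdb ../custom.repseq.fasta CustomSeq
"$mmseqpath/mmseqs" createindex CustomSeq tmp3 --remove-tmp-files 1
cd ../
# Search the query against the custom database only (no environmental DB, no templates)
colabfold_search --db1 CustomSeq --threads 1 --mmseqs "$mmseqpath/mmseqs" --use-env 0 --use-templates 0 query.fasta CustomSeqDB output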

mmseqs-15-6f452 is the latest MMseqs2 release from https://github.com/soedinglab/MMseqs2. I also compiled MMseqs2 at commit 71dd32ec43e3ac4dabf111bbc4b124f1c66a85f1, as suggested by ColabFold, but got the same error.

ColabFold Output (for bugs)

expandaln output/qdb CustomSeqDB/CustomSeq.idx output/res CustomSeqDB/CustomSeq.idx output/res_exp --db-load-mode 0 --threads 1 --expansion-mode 0 -e inf --expand-filter-clusters 1 --max-seq-id 0.95 

MMseqs Version:                 6f45232ac8daca14e354ae320a4359056ec524c2
Expansion mode                  0
Substitution matrix             aa:blosum62.out,nucl:nucleotide.out
Gap open cost                   aa:11,nucl:5
Gap extension cost              aa:1,nucl:2
Max sequence length             65535
Score bias                      0
Compositional bias              1
Compositional bias              1
E-value threshold               inf
Seq. id. threshold              0
Coverage threshold              0
Coverage mode                   0
Pseudo count mode               0
Pseudo count a                  substitution:1.100,context:1.400
Pseudo count b                  substitution:4.100,context:5.800
Expand filter clusters          1
Use filter only at N seqs       0
Maximum seq. id. threshold      0.95
Minimum seq. id.                0.0
Minimum score per column        -20
Minimum coverage                0
Select N most diverse seqs      1000
Preload mode                    0
Compressed                      0
Threads                         1
Verbosity                       3

Index version: 16
Generated by:  6f45232ac8daca14e354ae320a4359056ec524c2
ScoreMatrix:  VTML80.out
Index version: 16
Generated by:  6f45232ac8daca14e354ae320a4359056ec524c2
ScoreMatrix:  VTML80.out
Invalid database read for database data file=CustomSeqDB/CustomSeq.idx, database index=CustomSeqDB/CustomSeq.idx.index
getData: local id (4294967295) >= db size (22)
Traceback (most recent call last):
  File "/data/anaconda3/envs/alphafold/bin/colabfold_search", line 8, in <module>
    sys.exit(main())
  File "/data/anaconda3/envs/alphafold/lib/python3.8/site-packages/colabfold/mmseqs/search.py", line 318, in main
    mmseqs_search_monomer(
  File "/data/anaconda3/envs/alphafold/lib/python3.8/site-packages/colabfold/mmseqs/search.py", line 94, in mmseqs_search_monomer
    run_mmseqs(mmseqs, ["expandaln", base.joinpath("qdb"), dbbase.joinpath(f"{uniref_db}{dbSuffix1}"), base.joinpath("res"), dbbase.joinpath(f"{uniref_db}{dbSuffix2}"), base.joinpath("res_exp"), "--db-load-mode", str(db_load_mode), "--threads", str(threads)] + expand_param)
  File "/data/anaconda3/envs/alphafold/lib/python3.8/site-packages/colabfold/mmseqs/search.py", line 25, in run_mmseqs
    subprocess.check_call([mmseqs] + params)
  File "/data/anaconda3/envs/alphafold/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '[PosixPath('/data/local/source/mmseqs-15-6f452/bin/mmseqs'), 'expandaln', PosixPath('output/qdb'), PosixPath('CustomSeqDB/CustomSeq.idx'), PosixPath('output/res'), PosixPath('CustomSeqDB/CustomSeq.idx'), PosixPath('output/res_exp'), '--db-load-mode', '0', '--threads', '1', '--expansion-mode', '0', '-e', 'inf', '--expand-filter-clusters', '1', '--max-seq-id', '0.95']' returned non-zero exit status 1.
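
In the failing command above, the second and fourth expandaln arguments are both CustomSeqDB/CustomSeq.idx; the traceback shows they are built from `dbSuffix1` and `dbSuffix2` in search.py. One way to narrow this down is to list which files the custom database actually consists of and compare them with a database that works. The sketch below does only that. The helper names `list_db_suffixes` and `missing_suffixes` are hypothetical, and any suffix list passed as `expected` is the caller's assumption, not something confirmed in this thread.

```python
from pathlib import Path


def list_db_suffixes(db_dir, db_name):
    """Return sorted suffixes of all files in db_dir whose names start with db_name."""
    return sorted(
        p.name[len(db_name):]
        for p in Path(db_dir).iterdir()
        if p.is_file() and p.name.startswith(db_name)
    )


def missing_suffixes(db_dir, db_name, expected):
    """Return the entries of `expected` that have no matching file in db_dir."""
    found = set(list_db_suffixes(db_dir, db_name))
    return [suffix for suffix in expected if suffix not in found]


# Paths taken from the reproduction script; nothing is printed if the directory is absent
db_dir = Path("CustomSeqDB")
if db_dir.is_dir():
    print(list_db_suffixes(db_dir, "CustomSeq"))
```

Running the same listing on a prebuilt ColabFold database directory and comparing the two outputs shows which companion files a plain createdb plus createindex run does not produce.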

Context

Providing context helps us come up with a solution and improve our documentation for the future.

Your Environment

JinyuanSun commented 3 months ago

Have you solved this issue? I encountered the same issue.

yank666 commented 2 months ago

I'm hitting the same problem and need help too.