steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

(core dumped) "$MMSEQS" createcomplexreport "${QUERY}" "${TARGET}" "${SCORECOMPLEX_RESULT}" "${REPORT}" ${REPORT_PAR} #270

Open dabianzhixing opened 1 month ago

dabianzhixing commented 1 month ago

Hello, I'm using easy-complexsearch to find structurally similar oligomers in a DB. Some searches succeed, but others fail. For the failed searches, the error is

"tmpFolder/12745254066866134041/easycomplexsearch.sh: line 52: 28200 Segmentation fault (core dumped) "$MMSEQS" createcomplexreport "${QUERY}" "${TARGET}" "${SCORECOMPLEX_RESULT}" "${REPORT}" ${REPORT_PAR}"

My command is:

foldseek easy-complexsearch --alignment-type 1 7yw0_1.pdb chainDB 7yw0_1.txt tmpFolder --format-output "query,target"

I don't know what caused the error. Did MMseqs2 fail to filter the results?

martin-steinegger commented 1 month ago

@dabianzhixing Which commit are you using? What's in the chainDB?

milot-mirdita commented 1 month ago

Please try the latest release, 9. There have been a lot of changes in preparation for the preprint, so your issue might have been fixed already.

dabianzhixing commented 1 month ago

@martin-steinegger The Foldseek version is 427df8a6b5d0ef78bee0f98cd3e6faaca18f172d. I downloaded it from GitHub last Friday. chainDB is a precomputed database. It was used for monomer search with a previous version and worked well.

milot-mirdita commented 1 month ago

Could you please check again that you are actually running the binary for https://github.com/steineggerlab/foldseek/commit/427df8a6b5d0ef78bee0f98cd3e6faaca18f172d?

From your error message, it looks like you are using an older binary. easycomplexsearch.sh was renamed to easymultimersearch.sh. So the former string shouldn't appear in error messages anymore.

Also please upload the full terminal output of Foldseek.

Depending on when you created your chainDB, I would also recommend recreating it. If it was created before the Foldseek-MM work, its .lookup file wouldn't have the correct format for FS-MM to work.

dabianzhixing commented 1 month ago

@milot-mirdita Could you tell me how to get release 9? I'm trying to rebuild my DB.

milot-mirdita commented 1 month ago

You can download it here: https://github.com/steineggerlab/foldseek/releases/tag/9-427df8a

or from bioconda.

dabianzhixing commented 1 month ago

@milot-mirdita @martin-steinegger I think I'm using the right version.

foldseek Version: 427df8a6b5d0ef78bee0f98cd3e6faaca18f172d

But now all the output files are empty. There are no results.

The following is the terminal output:

foldseek easy-multimersearch /hdd_data/lvqy/rec/7yao_1.pdb /home/lvqy/foldseek/chainDB /hdd_data/lvqy/oligomer/foldseek_result/7yao_1.txt tmpFolder --format-output "query,target"
/hdd_data/lvqy/oligomer/foldseek_result/7yao_1.txt exists and will be overwritten
easy-multimersearch /hdd_data/lvqy/rec/7yao_1.pdb /home/lvqy/foldseek/chainDB /hdd_data/lvqy/oligomer/foldseek_result/7yao_1.txt tmpFolder --format-output query,target

MMseqs Version: GITDIR-NOTFOUND
Chain name mode 0
Write mapping file 0
Mask b-factor threshold 0
Coord store mode 2
Write lookup file 1
Input format 0
File Inclusion Regex .*
File Exclusion Regex ^$
Threads 192
Verbosity 3
Seq. id. threshold 0
Coverage threshold 0
Coverage mode 0
Max reject 2147483647
Max accept 2147483647
Add backtrace true
TMscore threshold 0
TMalign hit order 0
TMalign fast 1
Preload mode 0
LDDT threshold 0
Sort by structure bit score 1
Alignment type 2
Exact TMscore 0
Substitution matrix aa:3di.out,nucl:3di.out
Alignment mode 0
Alignment mode 0
E-value threshold 10
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Gap open cost aa:10,nucl:10
Gap extension cost aa:1,nucl:1
Compressed 0
Seed substitution matrix aa:3di.out,nucl:3di.out
Sensitivity 4
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 1
Minimum diagonal score 30
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Exhaustive search mode false
Prefilter mode 0
Search iterations 1
Remove temporary files false
MPI runner
Force restart with latest tmp false
Cluster search 0
Minimum assigned chains percentage Threshold 0
Multimer E-value 10000
Complex report mode 1
Alignment format 0
Format alignment output query,target
Database output false

/hdd_data/lvqy/oligomer/foldseek_result/7yao_1.txt exists and will be overwritten
convertalis tmpFolder/13859234540439683774/query /home/lvqy/foldseek/chainDB tmpFolder/13859234540439683774/multimer_result /hdd_data/lvqy/oligomer/foldseek_result/7yao_1.txt --sub-mat 'aa:3di.out,nucl:3di.out' --format-mode 0 --format-output query,target --translation-table 1 --gap-open aa:10,nucl:10 --gap-extend aa:1,nucl:1 --db-output 0 --db-load-mode 0 --search-type 0 --threads 192 --compressed 0 -v 3 --exact-tmscore 0

[=================================================================] 2 0s 0ms
Time for merging to 7yao_1.txt: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 108ms
/hdd_data/lvqy/oligomer/foldseek_result/7yao_1.txt_report exists and will be overwritten
createmultimerreport tmpFolder/13859234540439683774/query /home/lvqy/foldseek/chainDB tmpFolder/13859234540439683774/multimer_result /hdd_data/lvqy/oligomer/foldseek_result/7yao_1.txt_report --db-output 0 --threads 192 -v 3

[=================================================================] 1 0s 0ms
Time for merging to 7yao_1.txt_report: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 60ms

milot-mirdita commented 1 month ago

Please completely delete the tmpFolder and run again. The output you posted is incomplete since it's reusing results from the previous run.

dabianzhixing commented 1 month ago

@milot-mirdita I have tried several times, deleting all the files including the tmpFolder, but it doesn't work. I ran 500 queries and none of them returned any results. I have also rebuilt the DB. I don't know what I should do.

milot-mirdita commented 1 month ago

Sorry, I meant for you to please rerun it with an empty temp folder so we would have an easier time to diagnose the issue. This was not meant to fix the issue.

Please rerun and post the terminal output here.

dabianzhixing commented 1 month ago

Create directory tmpFolder
easy-multimersearch /hdd_data/lvqy/rec/7yao_1.pdb /home/lvqy/foldseek/chainDB 7yao_1.txt tmpFolder --format-output query,target

MMseqs Version: 427df8a6b5d0ef78bee0f98cd3e6faaca18f172d
Chain name mode 0
Write mapping file 0
Mask b-factor threshold 0
Coord store mode 2
Write lookup file 1
Input format 0
File Inclusion Regex .*
File Exclusion Regex ^$
Threads 192
Verbosity 3
Seq. id. threshold 0
Coverage threshold 0
Coverage mode 0
Max reject 2147483647
Max accept 2147483647
Add backtrace true
TMscore threshold 0
TMalign hit order 0
TMalign fast 1
Preload mode 0
LDDT threshold 0
Sort by structure bit score 1
Alignment type 2
Exact TMscore 0
Substitution matrix aa:3di.out,nucl:3di.out
Alignment mode 0
Alignment mode 0
E-value threshold 10
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Gap open cost aa:10,nucl:10
Gap extension cost aa:1,nucl:1
Compressed 0
Seed substitution matrix aa:3di.out,nucl:3di.out
Sensitivity 4
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 1
Minimum diagonal score 30
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Exhaustive search mode false
Prefilter mode 0
Search iterations 1
Remove temporary files false
MPI runner
Force restart with latest tmp false
Cluster search 0
Minimum assigned chains percentage Threshold 0
Multimer E-value 10000
Complex report mode 1
Alignment format 0
Format alignment output query,target
Database output false

createdb /hdd_data/lvqy/rec/7yao_1.pdb tmpFolder/7613150203902551404/query --chain-name-mode 0 --write-mapping 0 --mask-bfactor-threshold 0 --coord-store-mode 2 --write-lookup 1 --input-format 0 --file-include '.*' --file-exclude '^$' --threads 192 -v 3

Output file: tmpFolder/7613150203902551404/query
[=================================================================] 100.00% 1 eta -
Time for merging to query_ss: 0h 0m 0s 6ms
Time for merging to query_h: 0h 0m 0s 6ms
Time for merging to query_ca: 0h 0m 0s 5ms
Time for merging to query: 0h 0m 0s 5ms
Ignore 2 out of 4. Too short: 2, incorrect: 0, not proteins: 0.
Time for processing: 0h 0m 0s 115ms
Create directory tmpFolder/7613150203902551404/multimersearch_tmp
multimersearch tmpFolder/7613150203902551404/query /home/lvqy/foldseek/chainDB tmpFolder/7613150203902551404/multimer_result tmpFolder/7613150203902551404/multimersearch_tmp -a 1

Create directory tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/search_tmp
search tmpFolder/7613150203902551404/query /home/lvqy/foldseek/chainDB tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/result tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/search_tmp -a 0

prefilter tmpFolder/7613150203902551404/query_ss /home/lvqy/foldseek/chainDB_ss tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/search_tmp/4495916864830139729/pref --sub-mat 'aa:3di.out,nucl:3di.out' --seed-sub-mat 'aa:3di.out,nucl:3di.out' -s 9.5 -k 0 --target-search-mode 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 65535 --max-seqs 1000 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 0.15 --diag-score 1 --exact-kmer-matching 0 --mask 0 --mask-prob 0.99995 --mask-lower-case 1 --min-ungapped-score 30 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 192 --compressed 0 -v 3

Query database size: 2 type: Aminoacid
Estimated memory consumption: 8G
Target database size: 730735 type: Aminoacid
Index table k-mer threshold: 78 at k-mer size 6
Index table: counting k-mers
[=================================================================] 100.00% 730.73K 0s 637ms
Index table: Masked residues: 1640
Index table: fill
[=================================================================] 100.00% 730.73K 0s 787ms
Index statistics
Entries: 177066430
DB size: 1501 MB
Avg k-mer size: 2.766663
Top 10 k-mers
LVLVVV 190029
VVLVVV 178847
SVSVVV 162380
VVSVVV 155087
SVVVVV 131915
VVNVVV 73457
DPVVVV 69750
CVVVVV 62709
LVSVVV 57314
VLVVVV 53947
Time for index table init: 0h 0m 3s 41ms
Process prefiltering step 1 of 1

k-mer similarity threshold: 78
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 2
Target db start 1 to 730735
[=================================================================] 100.00% 2 0s 3ms

15866.797600 k-mers per position
50433469 DB matches per sequence
2 overflows
1000 sequences passed prefiltering per query sequence
1000 median result list length
0 sequences with 0 size result lists
Time for merging to pref: 0h 0m 0s 0ms
Time for processing: 0h 0m 5s 443ms

structurealign tmpFolder/7613150203902551404/query /home/lvqy/foldseek/chainDB tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/search_tmp/4495916864830139729/pref tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/search_tmp/4495916864830139729/strualn --tmscore-threshold 0 --lddt-threshold 0 --sort-by-structure-bits 1 --alignment-type 2 --exact-tmscore 0 --sub-mat 'aa:3di.out,nucl:3di.out' -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 10 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 0.5 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:10,nucl:10 --gap-extend aa:1,nucl:1 --zdrop 40 --threads 192 --compressed 0 -v 3

[=================================================================] 100.00% 2 8s 653ms
Time for merging to strualn: 0h 0m 0s 9ms
Time for processing: 0h 0m 14s 327ms

mvdb tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/search_tmp/4495916864830139729/strualn tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/search_tmp/4495916864830139729/aln

Time for processing: 0h 0m 0s 7ms

mvdb tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/search_tmp/4495916864830139729/aln tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/result -v 3

Time for processing: 0h 0m 0s 5ms
Removing temporary files

rmdb tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/search_tmp/4495916864830139729/pref -v 3

Time for processing: 0h 0m 0s 0ms

expandmultimer tmpFolder/7613150203902551404/query /home/lvqy/foldseek/chainDB tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/result tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/result_expand_pref --threads 192 -v 3

[=================================================================] 100.00% 1 eta -
Time for merging to result_expand_pref: 0h 0m 0s 81ms
Time for processing: 0h 0m 1s 309ms

structurealign tmpFolder/7613150203902551404/query /home/lvqy/foldseek/chainDB tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/result_expand_pref tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/result_expand_aligned --tmscore-threshold 0 --lddt-threshold 0 --sort-by-structure-bits 1 --alignment-type 2 --exact-tmscore 0 --sub-mat 'aa:3di.out,nucl:3di.out' -a 1 --alignment-mode 0 --alignment-output-mode 0 --wrapped-scoring 0 -e 10000 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:10,nucl:10 --gap-extend aa:1,nucl:1 --zdrop 40 --threads 192 --compressed 0 -v 3

[=================================================================] 100.00% 2 12s 909ms
Time for merging to result_expand_aligned: 0h 0m 0s 5ms
Time for processing: 0h 0m 17s 431ms

scoremultimer tmpFolder/7613150203902551404/query /home/lvqy/foldseek/chainDB tmpFolder/7613150203902551404/multimersearch_tmp/17679201808099428192/result_expand_aligned tmpFolder/7613150203902551404/multimer_result --min-assigned-chains-ratio 0 --threads 192 -v 3

[=================================================================] 100.00% 1 eta -
Time for merging to multimer_result: 0h 0m 0s 55ms
Time for processing: 0h 0m 2s 270ms

convertalis tmpFolder/7613150203902551404/query /home/lvqy/foldseek/chainDB tmpFolder/7613150203902551404/multimer_result 7yao_1.txt --sub-mat 'aa:3di.out,nucl:3di.out' --format-mode 0 --format-output query,target --translation-table 1 --gap-open aa:10,nucl:10 --gap-extend aa:1,nucl:1 --db-output 0 --db-load-mode 0 --search-type 0 --threads 192 --compressed 0 -v 3 --exact-tmscore 0

[=================================================================] 100.00% 2 0s 0ms
Time for merging to 7yao_1.txt: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 116ms

createmultimerreport tmpFolder/7613150203902551404/query /home/lvqy/foldseek/chainDB tmpFolder/7613150203902551404/multimer_result 7yao_1.txt_report --db-output 0 --threads 192 -v 3

[=================================================================] 100.00% 1 eta -
Time for merging to 7yao_1.txt_report: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 56ms

I can see some results in the tmpFolder, but again the output file is empty.

dabianzhixing commented 1 month ago

@milot-mirdita The new output is shown above. The merging times look strange:

Time for merging to 7yao_1.txt: 0h 0m 0s 0ms
Time for merging to 7yao_1.txt_report: 0h 0m 0s 0ms

I have also tried a monomer search with easy-search, and those results are correct.

milot-mirdita commented 1 month ago

Yeah, something is weird.

We will have to take a look. Does the same also happen with our prebuilt databases?

Is this the same 7yao cif file as stored in the PDB?

dabianzhixing commented 1 month ago

I used my own prebuilt DB in this experiment. The 7yao file is the same as the one in the PDB. In fact, all my queries return empty results.


martin-steinegger commented 1 month ago

@Woosub-Kim could you have a look please?

milot-mirdita commented 1 month ago

Maybe one more thing for us to investigate:

Please post an excerpt of the chainDB.lookup:

head -n 50 chainDB.lookup

dabianzhixing commented 1 month ago

@milot-mirdita

head -n 50 chainDB.lookup
0 101m_A 0
1 102l_A 1
2 102m_A 2
3 103l_A 3
4 103m_A 4
5 104l_A 5
6 104l_B 6
7 104m_A 7
8 105m_A 8
9 106m_A 9
10 107l_A 10
11 107m_A 11
12 108l_A 12
13 108m_A 13
14 109l_A 14
15 109m_A 15
16 10gs_A 16
17 10gs_B 17
18 10mh_C 18
19 110l_A 19
20 110m_A 20
21 111l_A 21
22 111m_A 22
23 112l_A 23
24 112m_A 24
25 113l_A 25
26 114l_A 26
27 115l_A 27
28 117e_A 28
29 117e_B 29
30 118l_A 30
31 119l_A 31
32 11as_A 32
33 11as_B 33
34 11ba_A 34
35 11ba_B 35
36 11bg_A 36
37 11bg_B 37
38 11gs_A 38
39 11gs_B 39
40 120l_A 40
41 121p_A 41
42 122l_A 42
43 123l_A 43
44 125l_A 44
45 126l_A 45
46 127l_A 46
47 128l_A 47
48 129l_A 48
49 12as_A 49

I use this prebuilt DB for monomer retrieval and it works well.
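One way to read a .lookup excerpt like the one in this comment is to group entries by their third column. The sketch below is a minimal check, assuming each line is "key name set-id" and that the third column is the index multimer search uses to group chains into one complex; that interpretation is an assumption, not something confirmed in this thread. The function names are hypothetical, and the second sample list is an invented layout for contrast, not real Foldseek output.

```python
from collections import defaultdict


def chains_per_set(lines):
    """Group chain names by the third column of .lookup lines.

    Assumes whitespace-separated lines of the form: key name set_id.
    Lines with fewer than three fields are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        groups[int(fields[2])].append(fields[1])
    return dict(groups)


def looks_monomer_only(lines):
    """Return True if every set id holds exactly one chain."""
    groups = chains_per_set(lines)
    return bool(groups) and all(len(names) == 1 for names in groups.values())


# Three lines copied from the lookup excerpt in this thread.
from_thread = ["5 104l_A 5", "6 104l_B 6", "7 104m_A 7"]
# Hypothetical layout in which both 104l chains share one set id.
hypothetical_grouped = ["5 104l_A 5", "6 104l_B 5", "7 104m_A 6"]

print(looks_monomer_only(from_thread))
print(looks_monomer_only(hypothetical_grouped))
```

To run the same check on a real database, the lines of chainDB.lookup can be passed in place of the sample lists, for example by opening the file and handing the file object to looks_monomer_only.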

dabianzhixing commented 1 month ago

Problem solved. My prebuilt DB (chainDB) was originally built from monomers, so it cannot be used for oligomer retrieval. I built a new oligomer DB, and now the results are correct. Thank you very much! @milot-mirdita @martin-steinegger
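For readers who land here with the same symptom, the fix described above amounts to two steps: build a database from multi-chain structure files, then search against it. The sketch below only assembles and prints the two command lines; it does not run Foldseek. The directory name oligomer_pdbs and the database name oligomerDB are hypothetical placeholders, and the assumption that the input directory holds one multi-chain file per complex is not stated in the thread. The subcommands createdb and easy-multimersearch and the --format-output value are the ones that appear in the logs above.

```python
import shlex


def build_commands(structure_dir, db_name, query, result, tmp_dir):
    """Return argv lists for rebuilding the DB and rerunning the search.

    structure_dir and db_name are placeholders chosen by the caller;
    nothing here is executed.
    """
    create_db = ["foldseek", "createdb", structure_dir, db_name]
    search = [
        "foldseek", "easy-multimersearch",
        query, db_name, result, tmp_dir,
        "--format-output", "query,target",
    ]
    return [create_db, search]


# Hypothetical paths; replace them with your own before running anything.
commands = build_commands(
    "oligomer_pdbs", "oligomerDB", "7yao_1.pdb", "7yao_1.txt", "tmpFolder"
)
for argv in commands:
    # shlex.join quotes arguments so each line can be pasted into a shell.
    print(shlex.join(argv))
```

Keeping the commands as argument lists means they can later be passed to subprocess.run unchanged if you prefer to launch them from Python.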