steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

Error: Convert Alignments died #329

Open pisle0 opened 4 weeks ago

pisle0 commented 4 weeks ago

I am trying to run a foldseek easy-search job against the UniProt database built with foldseek databases Alphafold/UniProt. In two attempts on different machines (with up to 256GB of memory), the job completes prefilter and structurealign, but immediately after structurealign it reports the following error while trying to allocate 415722228266 bytes of memory, then quits with Error: Convert Alignments died:

Can not touch 415722228266 into main memory
[=================================================================] 2.34K 2m 12s 776ms
Time for merging to strualn: 0h 0m 0s 9ms

I want to ask whether this is purely a memory availability issue, and whether there is a way to apply something like --split-memory-limit here, as in the prefilter step. Thank you.
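For scale, the byte count in the message can be compared with the memory of the largest machine. This is only a small arithmetic sketch: it assumes "256GB" means 256 GiB, and it says nothing about whether the message itself is fatal.

```python
# Size reported in the log line "Can not touch 415722228266 into main memory".
requested_bytes = 415_722_228_266

# Memory of the largest machine in the report.
# Assumption: "256GB" is read as 256 GiB (2**30 bytes each).
available_bytes = 256 * 1024**3


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / 1024**3


requested_gib = to_gib(requested_bytes)
available_gib = to_gib(available_bytes)
ratio = requested_bytes / available_bytes

print(f"requested: {requested_gib:.1f} GiB")
print(f"available: {available_gib:.1f} GiB")
print(f"requested / available: {ratio:.2f}")
```

Under that assumption, the reported size is roughly one and a half times the installed memory, so it cannot fit in RAM on either machine.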

milot-mirdita commented 4 weeks ago

Could you please post the whole log?

pisle0 commented 4 weeks ago

Sure, here is the whole log:

Create directory ./results/foldseek_0/tmp
easy-search ./db/db_0 ../UniProt ./results/foldseek_0/foldseek_0.m8 ./results/foldseek_0/tmp --format-mode 4 --format-output query,target,evalue,gapopen,pident,fident,nident,qstart,qend,qlen,tstart,tend,tlen,alnlen,bits,mismatch,qcov,tcov,qset,qsetid,tset,tsetid,lddt,qtmscore,ttmscore,alntmscore,prob -e 1e-5 --threads 95 

MMseqs Version:                 928984bfa3c7c3c98ca58b557d965965038f7e0b
Seq. id. threshold              0
Coverage threshold              0
Coverage mode                   0
Max reject                      2147483647
Max accept                      2147483647
Add backtrace                   false
TMscore threshold               0
TMalign hit order               0
TMalign fast                    1
Preload mode                    0
Threads                         95
Verbosity                       3
LDDT threshold                  0
Sort by structure bit score     1
Alignment type                  2
Exact TMscore                   0
Substitution matrix             aa:3di.out,nucl:3di.out
Alignment mode                  3
Alignment mode                  0
E-value threshold               1e-05
Min alignment length            0
Seq. id. mode                   0
Alternative alignments          0
Max sequence length             65535
Compositional bias              1
Compositional bias              1
Gap open cost                   aa:10,nucl:10
Gap extension cost              aa:1,nucl:1
Compressed                      0
Seed substitution matrix        aa:3di.out,nucl:3di.out
Sensitivity                     9.5
k-mer length                    6
Target search mode              0
k-score                         seq:2147483647,prof:2147483647
Max results per query           1000
Split database                  0
Split mode                      2
Split memory limit              0
Diagonal scoring                true
Exact k-mer matching            0
Mask residues                   0
Mask residues probability       0.99995
Mask lower case residues        1
Minimum diagonal score          30
Selected taxa                   
Spaced k-mers                   1
Spaced k-mer pattern            
Local temporary path            
Exhaustive search mode          false
Prefilter mode                  0
Search iterations               1
Remove temporary files          true
MPI runner                      
Force restart with latest tmp   false
Cluster search                  0
Path to ProstT5                 
Chain name mode                 0
Write mapping file              0
Mask b-factor threshold         0
Coord store mode                2
Write lookup file               1
Input format                    0
File Inclusion Regex            .*
File Exclusion Regex            ^$
Alignment format                4
Format alignment output         query,target,evalue,gapopen,pident,fident,nident,qstart,qend,qlen,tstart,tend,tlen,alnlen,bits,mismatch,qcov,tcov,qset,qsetid,tset,tsetid,lddt,qtmscore,ttmscore,alntmscore,prob
Database output                 false
Greedy best hits                false

Alignment backtraces will be computed, since they were requested by output format.
Create directory ./results/foldseek_0/tmp/6098992401104940622/search_tmp
search ./db/db_0 ../UniProt ./results/foldseek_0/tmp/6098992401104940622/result ./results/foldseek_0/tmp/6098992401104940622/search_tmp -a 1 --threads 95 --alignment-mode 3 -e 1e-05 --comp-bias-corr 1 --gap-open aa:10,nucl:10 --gap-extend aa:1,nucl:1 -s 9.5 -k 6 --mask 0 --mask-prob 0.99995 --remove-tmp-files 1 

prefilter ./db/db_0_ss ../UniProt_ss ./results/foldseek_0/tmp/6098992401104940622/search_tmp/9988097761208953728/pref --sub-mat 'aa:3di.out,nucl:3di.out' --seed-sub-mat 'aa:3di.out,nucl:3di.out' -s 9.5 -k 6 --target-search-mode 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 65535 --max-seqs 1000 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 0.15 --diag-score 1 --exact-kmer-matching 0 --mask 0 --mask-prob 0.99995 --mask-lower-case 1 --min-ungapped-score 30 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 95 --compressed 0 -v 3 

Query database size: 2340 type: Aminoacid
Target split mode. Searching through 11 splits
Estimated memory consumption: 139G
Target database size: 214683829 type: Aminoacid
Process prefiltering step 1 of 11

Index table k-mer threshold: 78 at k-mer size 6 
Index table: counting k-mers
[=================================================================] 19.56M 10s 743ms
Index table: Masked residues: 0
Index table: fill
[=================================================================] 19.56M 17s 198ms
Index statistics
Entries:          5380832714
DB size:          31277 MB
Avg k-mer size:   84.075511
Top 10 k-mers
    LVLVVV  7419030
    SVSVVV  6859272
    VVSVVV  5411307
    SVVVVV  5262829
    VVVSVS  3880468
    VSVVVV  3812658
    DDVVVV  3438600
    VLVLLV  3170092
    VVVVLV  3121737
    VSVSVV  2987097
Time for index table init: 0h 1m 3s 536ms
k-mer similarity threshold: 78
Starting prefiltering scores calculation (step 1 of 11)
Query db start 1 to 2340
Target db start 1 to 19555737
[=================================================================] 2.34K 30m 3s 933ms

3322.179038 k-mers per position
2210082916 DB matches per sequence
2304 overflows
127 sequences passed prefiltering per query sequence
128 median result list length
4 sequences with 0 size result lists
Time for merging to pref_tmp_0: 0h 0m 0s 4ms
Time for merging to pref_tmp_0_tmp: 0h 0m 0s 7ms
Process prefiltering step 2 of 11

Index table k-mer threshold: 78 at k-mer size 6 
Index table: counting k-mers
[=================================================================] 19.40M 9s 429ms
Index table: Masked residues: 0
Index table: fill
[=================================================================] 19.40M 16s 168ms
Index statistics
Entries:          5347628169
DB size:          31087 MB
Avg k-mer size:   83.556690
Top 10 k-mers
    LVLVVV  7362357
    SVSVVV  6816351
    VVSVVV  5405207
    SVVVVV  5255422
    LVVVVV  4885042
    VVVSVS  3864839
    VSVVVV  3826851
    DDVVVV  3480728
    VLVLLV  3157731
    VVVVLV  3134996
Time for index table init: 0h 1m 1s 398ms
k-mer similarity threshold: 78
Starting prefiltering scores calculation (step 2 of 11)
Query db start 1 to 2340
Target db start 19555738 to 38959624
[=================================================================] 2.34K 29m 58s 797ms

3322.179038 k-mers per position
2211639591 DB matches per sequence
2305 overflows
127 sequences passed prefiltering per query sequence
128 median result list length
4 sequences with 0 size result lists
Time for merging to pref_tmp_1: 0h 0m 0s 2ms
Time for merging to pref_tmp_1_tmp: 0h 0m 0s 7ms
Process prefiltering step 3 of 11

Index table k-mer threshold: 78 at k-mer size 6 
Index table: counting k-mers
[=================================================================] 19.42M 9s 340ms
Index table: Masked residues: 0
Index table: fill
[=================================================================] 19.42M 16s 703ms
Index statistics
Entries:          5345430013
DB size:          31075 MB
Avg k-mer size:   83.522344
Top 10 k-mers
    LVLVVV  7339400
    SVSVVV  6796668
    VVSVVV  5392562
    SVVVVV  5251138
    LVVVVV  4872815
    VVVSVS  3857049
    VSVVVV  3827082
    DDVVVV  3487809
    VLVLLV  3147702
    VVVVLV  3136874
Time for index table init: 0h 1m 1s 621ms
k-mer similarity threshold: 78
Starting prefiltering scores calculation (step 3 of 11)
Query db start 1 to 2340
Target db start 38959625 to 58376168
[=================================================================] 2.34K 30m 9s 735ms

3322.179038 k-mers per position
2219785938 DB matches per sequence
2305 overflows
127 sequences passed prefiltering per query sequence
128 median result list length
4 sequences with 0 size result lists
Time for merging to pref_tmp_2: 0h 0m 0s 2ms
Time for merging to pref_tmp_2_tmp: 0h 0m 0s 7ms
Process prefiltering step 4 of 11

Index table k-mer threshold: 78 at k-mer size 6 
Index table: counting k-mers
[=================================================================] 19.50M 9s 488ms
Index table: Masked residues: 0
Index table: fill
[=================================================================] 19.50M 15s 734ms
Index statistics
Entries:          5366001521
DB size:          31192 MB
Avg k-mer size:   83.843774
Top 10 k-mers
    LVLVVV  7387244
    SVSVVV  6833776
    VVSVVV  5404097
    SVVVVV  5247653
    VVVSVS  3877507
    VSVVVV  3809981
    DDVVVV  3453950
    VLVLLV  3162578
    VVVVLV  3123304
    VSVSVV  2977791
Time for index table init: 0h 1m 1s 74ms
k-mer similarity threshold: 78
Starting prefiltering scores calculation (step 4 of 11)
Query db start 1 to 2340
Target db start 58376169 to 77880679
[=================================================================] 2.34K 30m 3s 901ms

3322.179038 k-mers per position
2213016830 DB matches per sequence
2304 overflows
127 sequences passed prefiltering per query sequence
128 median result list length
4 sequences with 0 size result lists
Time for merging to pref_tmp_3: 0h 0m 0s 4ms
Time for merging to pref_tmp_3_tmp: 0h 0m 0s 7ms
Process prefiltering step 5 of 11

Index table k-mer threshold: 78 at k-mer size 6 
Index table: counting k-mers
[=================================================================] 19.60M 9s 389ms
Index table: Masked residues: 0
Index table: fill
[=================================================================] 19.60M 16s 352ms
Index statistics
Entries:          5367642730
DB size:          31202 MB
Avg k-mer size:   83.869418
Top 10 k-mers
    LVLVVV  7408638
    SVSVVV  6842541
    VVSVVV  5410304
    SVVVVV  5267791
    VVVSVS  3879765
    VSVVVV  3831500
    NVSVVV  3791072
    DDVVVV  3462511
    VLVLLV  3174025
    VVVVLV  3144011
Time for index table init: 0h 1m 1s 441ms
k-mer similarity threshold: 78
Starting prefiltering scores calculation (step 5 of 11)
Query db start 1 to 2340
Target db start 77880680 to 97476863
[=================================================================] 2.34K 30m 10s 983ms

3322.179038 k-mers per position
2217079267 DB matches per sequence
2304 overflows
127 sequences passed prefiltering per query sequence
128 median result list length
4 sequences with 0 size result lists
Time for merging to pref_tmp_4: 0h 0m 0s 3ms
Time for merging to pref_tmp_4_tmp: 0h 0m 0s 7ms
Process prefiltering step 6 of 11

Index table k-mer threshold: 78 at k-mer size 6 
Index table: counting k-mers
[=================================================================] 19.51M 9s 451ms
Index table: Masked residues: 0
Index table: fill
[=================================================================] 19.51M 15s 659ms
Index statistics
Entries:          5355201023
DB size:          31130 MB
Avg k-mer size:   83.675016
Top 10 k-mers
    LVLVVV  7372030
    SVSVVV  6811966
    VVSVVV  5399688
    SVVVVV  5256717
    VVVSVS  3866126
    VSVVVV  3826900
    VVVVVS  3499301
    DDVVVV  3498709
    VLVLLV  3164470
    VVVVLV  3139129
Time for index table init: 0h 1m 0s 656ms
k-mer similarity threshold: 78
Starting prefiltering scores calculation (step 6 of 11)
Query db start 1 to 2340
Target db start 97476864 to 116986050
[=================================================================] 2.34K 30m 15s 483ms

3322.179038 k-mers per position
2214779072 DB matches per sequence
2305 overflows
127 sequences passed prefiltering per query sequence
128 median result list length
4 sequences with 0 size result lists
Time for merging to pref_tmp_5: 0h 0m 0s 3ms
Time for merging to pref_tmp_5_tmp: 0h 0m 0s 7ms
Process prefiltering step 7 of 11

Index table k-mer threshold: 78 at k-mer size 6 
Index table: counting k-mers
[=================================================================] 19.59M 9s 397ms
Index table: Masked residues: 0
Index table: fill
[=================================================================] 19.59M 16s 434ms
Index statistics
Entries:          5366327648
DB size:          31194 MB
Avg k-mer size:   83.848870
Top 10 k-mers
    LVLVVV  7403264
    SVSVVV  6846618
    VVSVVV  5419233
    SVVVVV  5269344
    VVVSVS  3886329
    VSVVVV  3829408
    NVSVVV  3788360
    DDVVVV  3469383
    VLVLLV  3166475
    VVVVLV  3147738
Time for index table init: 0h 1m 1s 675ms
k-mer similarity threshold: 78
Starting prefiltering scores calculation (step 7 of 11)
Query db start 1 to 2340
Target db start 116986051 to 136573459
[=================================================================] 2.34K 30m 3s 196ms

3322.179038 k-mers per position
2214895418 DB matches per sequence
2304 overflows
127 sequences passed prefiltering per query sequence
128 median result list length
4 sequences with 0 size result lists
Time for merging to pref_tmp_6: 0h 0m 0s 1ms
Time for merging to pref_tmp_6_tmp: 0h 0m 0s 6ms
Process prefiltering step 8 of 11

Index table k-mer threshold: 78 at k-mer size 6 
Index table: counting k-mers
[=================================================================] 19.41M 9s 406ms
Index table: Masked residues: 0
Index table: fill
[=================================================================] 19.41M 15s 677ms
Index statistics
Entries:          5345802812
DB size:          31077 MB
Avg k-mer size:   83.528169
Top 10 k-mers
    LVLVVV  7333385
    SVSVVV  6795473
    VVSVVV  5378647
    SVVVVV  5240783
    LVVVVV  4866087
    VVVVPP  3861707
    VVVSVS  3841153
    VSVVVV  3821507
    DDVVVV  3491345
    VVVVLV  3128082
Time for index table init: 0h 1m 0s 588ms
k-mer similarity threshold: 78
Starting prefiltering scores calculation (step 8 of 11)
Query db start 1 to 2340
Target db start 136573460 to 155981499
[=================================================================] 2.34K 30m 5s 448ms

3322.179038 k-mers per position
2220573568 DB matches per sequence
2305 overflows
127 sequences passed prefiltering per query sequence
128 median result list length
4 sequences with 0 size result lists
Time for merging to pref_tmp_7: 0h 0m 0s 2ms
Time for merging to pref_tmp_7_tmp: 0h 0m 0s 7ms
Process prefiltering step 9 of 11

Index table k-mer threshold: 78 at k-mer size 6 
Index table: counting k-mers
[=================================================================] 19.48M 9s 670ms
Index table: Masked residues: 0
Index table: fill
[=================================================================] 19.48M 15s 581ms
Index statistics
Entries:          5367214664
DB size:          31199 MB
Avg k-mer size:   83.862729
Top 10 k-mers
    LVLVVV  7389205
    SVSVVV  6824085
    VVSVVV  5383258
    SVVVVV  5250820
    LVVVVV  4881895
    VVVSVS  3865754
    VSVVVV  3814827
    DDVVVV  3461080
    VLVLLV  3163401
    VVVVLV  3132693
Time for index table init: 0h 1m 0s 568ms
k-mer similarity threshold: 78
Starting prefiltering scores calculation (step 9 of 11)
Query db start 1 to 2340
Target db start 155981500 to 175456597
[=================================================================] 2.34K 30m 17s 545ms

3322.179038 k-mers per position
2220547956 DB matches per sequence
2305 overflows
127 sequences passed prefiltering per query sequence
128 median result list length
4 sequences with 0 size result lists
Time for merging to pref_tmp_8: 0h 0m 0s 3ms
Time for merging to pref_tmp_8_tmp: 0h 0m 0s 7ms
Process prefiltering step 10 of 11

Index table k-mer threshold: 78 at k-mer size 6 
Index table: counting k-mers
[=================================================================] 19.56M 9s 379ms
Index table: Masked residues: 0
Index table: fill
[=================================================================] 19.56M 16s 252ms
Index statistics
Entries:          5387033877
DB size:          31313 MB
Avg k-mer size:   84.172404
Top 10 k-mers
    LVLVVV  7420945
    SVSVVV  6850362
    VVSVVV  5409998
    SVVVVV  5242817
    VVVSVS  3880147
    VSVVVV  3802596
    DDVVVV  3422754
    VLVLLV  3161209
    VVVVLV  3120638
    VSVSVV  2978610
Time for index table init: 0h 1m 1s 181ms
k-mer similarity threshold: 78
Starting prefiltering scores calculation (step 10 of 11)
Query db start 1 to 2340
Target db start 175456598 to 195012177
[=================================================================] 2.34K 30m 9s 562ms

3322.179038 k-mers per position
2212344872 DB matches per sequence
2304 overflows
127 sequences passed prefiltering per query sequence
128 median result list length
4 sequences with 0 size result lists
Time for merging to pref_tmp_9: 0h 0m 0s 4ms
Time for merging to pref_tmp_9_tmp: 0h 0m 0s 7ms
Process prefiltering step 11 of 11

Index table k-mer threshold: 78 at k-mer size 6 
Index table: counting k-mers
[=================================================================] 19.67M 9s 495ms
Index table: Masked residues: 0
Index table: fill
[=================================================================] 19.67M 16s 393ms
Index statistics
Entries:          5368817694
DB size:          31208 MB
Avg k-mer size:   83.887776
Top 10 k-mers
    LVLVVV  7413874
    SVSVVV  6844630
    VVSVVV  5418602
    SVVVVV  5273216
    VVVSVS  3879932
    VSVVVV  3831910
    VVVVVS  3505573
    DDVVVV  3486660
    VLVLLV  3167080
    VVVVLV  3152161
Time for index table init: 0h 1m 1s 753ms
k-mer similarity threshold: 78
Starting prefiltering scores calculation (step 11 of 11)
Query db start 1 to 2340
Target db start 195012178 to 214683829
[=================================================================] 2.34K 30m 7s 536ms

3322.179038 k-mers per position
2214850539 DB matches per sequence
2304 overflows
127 sequences passed prefiltering per query sequence
128 median result list length
4 sequences with 0 size result lists
Time for merging to pref_tmp_10: 0h 0m 0s 3ms
Time for merging to pref_tmp_10_tmp: 0h 0m 0s 6ms
Merging 11 target splits to pref
Preparing offsets for merging: 0h 0m 0s 1ms
[=================================================================] 2.34K 0s 69ms
Time for merging to pref: 0h 0m 0s 2ms
Time for merging target splits: 0h 0m 0s 125ms
Time for merging to pref_tmp: 0h 0m 0s 33ms
Time for processing: 6h 11m 7s 951ms
structurealign ./db/db_0 ../UniProt ./results/foldseek_0/tmp/6098992401104940622/search_tmp/9988097761208953728/pref ./results/foldseek_0/tmp/6098992401104940622/search_tmp/9988097761208953728/strualn --tmscore-threshold 0 --lddt-threshold 0 --sort-by-structure-bits 1 --alignment-type 2 --exact-tmscore 0 --sub-mat 'aa:3di.out,nucl:3di.out' -a 1 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 1e-05 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 0.5 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:10,nucl:10 --gap-extend aa:1,nucl:1 --zdrop 40 --threads 95 --compressed 0 -v 3 

Can not touch 415722228266 into main memory
[=================================================================] 2.34K 2m 12s 776ms
Time for merging to strualn: 0h 0m 0s 9ms
Time for processing: 0h 7m 4s 177ms
mvdb ./results/foldseek_0/tmp/6098992401104940622/search_tmp/9988097761208953728/strualn ./results/foldseek_0/tmp/6098992401104940622/search_tmp/9988097761208953728/aln 

Time for processing: 0h 0m 0s 1ms
mvdb ./results/foldseek_0/tmp/6098992401104940622/search_tmp/9988097761208953728/aln ./results/foldseek_0/tmp/6098992401104940622/result -v 3 

Time for processing: 0h 0m 0s 3ms
Removing temporary files
rmdb ./results/foldseek_0/tmp/6098992401104940622/search_tmp/9988097761208953728/pref -v 3 

Time for processing: 0h 0m 0s 4ms
Error: Convert Alignments died

milot-mirdita commented 4 weeks ago

Does it crash without --format-mode 4?

pisle0 commented 4 weeks ago

Hi, I tried the following:

  1. Without --format-mode 4 but with --format-output kept: same error, Error: Convert Alignments died.
  2. Without both --format-mode 4 and --format-output: convertalis runs, but the Can not touch 415722228266 into main memory message still appears.
  3. With --format-mode 4 but without --format-output: same as 2.

It looks like one of the extra columns requested in the output must be causing the crash. Do you have a rough idea of which one(s) might be responsible?

pisle0 commented 3 weeks ago

Hi, I want to follow up to see if you have any insight into this. Could this be an incompatibility with a db built using a previous version of foldseek? Thank you in advance.

milot-mirdita commented 3 weeks ago

I currently have very limited time to look into this. My best guess right now is that one of these fields is causing the crash for some reason. I suspect it is one of the set fields, since these are less well tested:

qset,qsetid,tset,tsetid,lddt,qtmscore,ttmscore,alntmscore,prob

The databases should remain compatible between versions, so I don't think this is the issue.
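One way to narrow this down might be to rerun only the convertalis step, adding one suspect field at a time to the columns that are not under suspicion. The sketch below only builds and prints the commands; it does not run foldseek. It assumes the alignment result database was kept (for example from a separate foldseek search run, since easy-search removed its temporary files in the log above). The result database path and the output file prefix are hypothetical placeholders.

```python
import shlex

# Columns from the original --format-output that are not under suspicion.
BASE_FIELDS = [
    "query", "target", "evalue", "gapopen", "pident", "fident", "nident",
    "qstart", "qend", "qlen", "tstart", "tend", "tlen", "alnlen", "bits",
    "mismatch", "qcov", "tcov",
]

# Columns named as possible culprits in this thread.
SUSPECT_FIELDS = [
    "qset", "qsetid", "tset", "tsetid", "lddt",
    "qtmscore", "ttmscore", "alntmscore", "prob",
]


def convertalis_command(query_db, target_db, result_db, out_file, fields):
    """Build one `foldseek convertalis` call for the given output columns."""
    return [
        "foldseek", "convertalis", query_db, target_db, result_db, out_file,
        "--format-mode", "4",
        "--format-output", ",".join(fields),
    ]


def one_field_at_a_time(query_db, target_db, result_db, out_prefix):
    """Yield (label, command): a baseline first, then one suspect field each."""
    yield "baseline", convertalis_command(
        query_db, target_db, result_db, f"{out_prefix}.baseline.m8", BASE_FIELDS
    )
    for field in SUSPECT_FIELDS:
        yield field, convertalis_command(
            query_db, target_db, result_db,
            f"{out_prefix}.{field}.m8", BASE_FIELDS + [field],
        )


if __name__ == "__main__":
    # Query and target paths come from the log; the last two are hypothetical.
    commands = one_field_at_a_time(
        "./db/db_0", "../UniProt", "./kept_result/result", "./bisect/out"
    )
    for label, cmd in commands:
        print(f"# {label}")
        print(shlex.join(cmd))
```

If the baseline command succeeds and exactly one of the single-field runs fails, that field is the likely trigger. If the baseline itself fails, the cause is probably not among the suspect columns.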