sokrypton / ColabFold

Making Protein folding accessible to all!
MIT License

Indexdb died error message when creating colabfold_envdb_202108_db with MMseqs #121

Open gundalav opened 2 years ago

gundalav commented 2 years ago

Hi,

I was trying to set up the database, but it breaks when I run this command:

mmseqs createindex colabfold_envdb_202108_db tmp2 --remove-tmp-files 1

The error message I get is this:

MMseqs Version:             edb8223d1ea07385ffe63d4f103af0eb12b2058e
Seed substitution matrix    aa:VTML80.out,nucl:nucleotide.out
k-mer length                0
Alphabet size               aa:21,nucl:5
Compositional bias          1
Max sequence length         65535
Max results per query       300
Mask residues               1
Mask lower case residues    0
Spaced k-mers               1
Spaced k-mer pattern
Sensitivity                 7.5
k-score                     seq:0,prof:0
Check compatible            0
Search type                 0
Split database              0
Split memory limit          0
Verbosity                   3
Threads                     8
Min codons in orf           30
Max codons in length        32734
Max orf gaps                2147483647
Contig start mode           2
Contig end mode             2
Orf start mode              1
Forward frames              1,2,3
Reverse frames              1,2,3
Translation table           1
Translate orf               0
Use all table starts        false
Offset of numeric ids       0
Create lookup               0
Compressed                  0
Add orf stop                false
Overlap between sequences   0
Sequence split mode         1
Header split mode           0
Strand selection            1
Remove temporary files      true

indexdb colabfold_envdb_202108_db colabfold_envdb_202108_db --seed-sub-mat aa:VTML80.out,nucl:nucleotide.out -k 0 --alph-size aa:21,nucl:5 --comp-bias-corr 1 --max-seq-len 65535 --max-seqs 300 --mask 1 --mask-lower-case 0 --spaced-kmer-mode 1 -s 7.5 --k-score seq:0,prof:0 --check-compatible 0 --search-type 0 --split 0 --split-memory-limit 0 -v 3 --threads 8

Target split mode. Searching through 34 splits
Estimated memory consumption: 29G
Write VERSION (0)
Write META (1)
Write SCOREMATRIX3MER (4)
Write SCOREMATRIX2MER (3)
Write SCOREMATRIXNAME (2)
Write SPACEDPATTERN (23)
Write GENERATOR (22)
Write DBR1INDEX (5)
Write DBR1DATA (6)
Write DBR2INDEX (7)
Killed
Error: indexdb died

It works fine with the uniref30_2103.tar.gz file, though.

How can I resolve the problem?

G.V.

martin-steinegger commented 2 years ago

I assume your computer does not have enough RAM. How much RAM does your server have?

gundalav commented 2 years ago

I am using an AWS p3.2xlarge instance. It has around 61 GB of RAM.

martin-steinegger commented 2 years ago

Online searches: our ColabFold server has ~760 GB of RAM and keeps the full database and index in memory.

Batch searches: a batch search requires less memory, but it is still approximately 1 byte per residue, so I would assume you need at least 90 GB. We still need to figure out what the lower bound is for this database.
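
The rule of thumb above (roughly 1 byte of RAM per residue for batch searches) can be turned into a quick pre-flight check. The Python sketch below is only an illustration and is not part of ColabFold or MMseqs. It assumes the usual MMseqs2 database layout, in which `<db>.index` holds tab-separated key, offset and length columns and each sequence entry's length also counts a trailing newline and null byte. It reads total RAM from `/proc/meminfo`, so it only works on Linux. All function names are hypothetical.

```python
import sys
from pathlib import Path

# Rule of thumb from the comment above: ~1 byte of RAM per residue.
BYTES_PER_RESIDUE = 1.0
# Assumption: each MMseqs2 sequence entry ends with "\n" plus a "\0" terminator,
# and both bytes are counted in the length column of the .index file.
ENTRY_OVERHEAD = 2


def count_residues(index_path):
    """Sum sequence lengths from an MMseqs2 .index file (key, offset, length)."""
    total = 0
    with open(index_path) as handle:
        for line in handle:
            fields = line.split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            total += max(int(fields[2]) - ENTRY_OVERHEAD, 0)
    return total


def estimate_ram_gb(residues, bytes_per_residue=BYTES_PER_RESIDUE):
    """Rough RAM estimate in GB (1 GB = 1e9 bytes) for a batch search."""
    return residues * bytes_per_residue / 1e9


def total_ram_gb(meminfo_path="/proc/meminfo"):
    """Read MemTotal (reported in kB, i.e. 1024-byte units) from /proc/meminfo."""
    for line in Path(meminfo_path).read_text().splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024 / 1e9
    raise ValueError("MemTotal not found in " + meminfo_path)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_ram.py colabfold_envdb_202108_db.index
    residues = count_residues(sys.argv[1])
    needed = estimate_ram_gb(residues)
    available = total_ram_gb()
    print(f"{residues:,} residues -> ~{needed:.0f} GB needed, {available:.0f} GB installed")
    if needed > available:
        print("Probably not enough RAM for a batch search against this database.")
```

The estimate is deliberately crude: it ignores the index overhead and any splitting, so treat a result close to the installed RAM as a warning rather than a green light.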

maxshen29 commented 2 years ago

I have 128 GB of RAM, but I get the same error:

Estimated memory consumption: 560G
Process needs more than 38G main memory. Increase the size of --split or set it to 0 to automatically optimize target database split.
Write VERSION (0)
Write META (1)
Write SCOREMATRIX3MER (4)
Write SCOREMATRIX2MER (3)
Write SCOREMATRIXNAME (2)
Write SPACEDPATTERN (23)
Write GENERATOR (22)
Write DBR1INDEX (5)
Write DBR1DATA (6)
Write DBR2INDEX (7)
Write DBR2DATA (8)
Write HDR1INDEX (18)
Write HDR1DATA (19)
Write ALNINDEX (24)
Write ALNDATA (25)
Index table: counting k-mers
[=================================================================] 100.00% 209.34M 7m 34s 698ms
Index table: Masked residues: 1117805658
Can not allocate entries memory in IndexTable::initMemory
Error: indexdb died
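
When comparing failing runs like the two in this thread, it can help to pull the memory figures MMseqs prints out of the log. The Python sketch below is hypothetical and only matches the two message formats quoted here (`Estimated memory consumption: 29G` and `Process needs more than 38G main memory`); other MMseqs versions may word these messages differently, and treating the size suffixes as binary units is an assumption. The first report also shows that the estimate is no guarantee: 29G was estimated, yet the process was killed on a machine with about 61 GB of RAM.

```python
import re

# Patterns for the two memory messages quoted in this thread (assumed formats).
ESTIMATE_RE = re.compile(r"Estimated memory consumption:\s*(\d+(?:\.\d+)?)([KMGT])")
NEEDS_RE = re.compile(r"Process needs more than\s*(\d+(?:\.\d+)?)([KMGT]) main memory")

# Convert a size suffix to G, treating the suffixes as binary units (assumption).
TO_GB = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}


def memory_report(log_text):
    """Return the memory figures (in G) found in an MMseqs log; None if absent."""
    report = {}
    for name, pattern in (("estimated", ESTIMATE_RE), ("needs_more_than", NEEDS_RE)):
        match = pattern.search(log_text)
        report[name] = float(match.group(1)) * TO_GB[match.group(2)] if match else None
    # Flag the failure line both reports in this thread ended with.
    report["failed"] = "indexdb died" in log_text
    return report


if __name__ == "__main__":
    log = ("Estimated memory consumption: 560G\n"
           "Process needs more than 38G main memory.\n"
           "Can not allocate entries memory in IndexTable::initMemory\n"
           "Error: indexdb died\n")
    print(memory_report(log))
```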