sokrypton / ColabFold


unpackdb stalls #266

dominik-handler commented 2 years ago

Expected Behavior

Splitting of MSAs works when run multi-threaded. Might be related to #189.

Current Behavior

On larger final.a3m files, unpackdb stalls and does not continue when executed multi-threaded via colabfold_search. Manual single-threaded execution works nicely.

Steps to Reproduce (for bugs)

Does not work on my final.a3m:

unpackdb /scratch-cbe/users/handler/alphafold_batch/2022-07-10_UAP+eIF-bait_vs_UAP+eIF-targets/MSAs.memoryTest.200/final.a3m /scratch-cbe/users/handler/alphafold_batch/2022-07-10_UAP+eIF-bait_vs_UAP+eIF-targets/MSAs.memoryTest.200 --unpack-name-mode 0 --unpack-suffix .a3m
MMseqs Version:         678c82ac44f1178bf9a3d49bfab9d7eed3f17fbc
Unpack name mode        0
Unpack suffix           .a3m
Threads                 176
Verbosity               3

[=%

Works on the very same final.a3m:

unpackdb /scratch-cbe/users/handler/alphafold_batch/2022-07-10_UAP+eIF-bait_vs_UAP+eIF-targets/MSAs.memoryTest.200/final.a3m /scratch-cbe/users/handler/alphafold_batch/2022-07-10_UAP+eIF-bait_vs_UAP+eIF-targets/MSAs.memoryTest.200 --unpack-name-mode 0 --unpack-suffix .a3m

MMseqs Version:         678c82ac44f1178bf9a3d49bfab9d7eed3f17fbc
Unpack name mode        0
Unpack suffix           .a3m
Threads                 1
Verbosity               3

[=================================================================] 100.00% 400 57s 152ms
Time for processing: 0h 0m 58s 195ms

ColabFold Output (for bugs)

There is no specific error message; unpackdb just stalls.

Context

When I execute colabfold_search on smaller input files (20), everything works nicely. On a larger set (200), I ran into this issue. The script produces some of the individual a3m files but stalls after a while. Also, this step does not use the number of threads I specified when starting colabfold_search, but instead uses all available threads on the node. As I did not request all of them for my SLURM job, it cannot actually utilize that many. Do you think this might be the problem?
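If the thread count is part of the problem, one possible workaround is to cap the run at the SLURM allocation instead of letting the tools autodetect every core on the node. A minimal sketch, assuming colabfold_search's --threads option is forwarded to the MMseqs2 steps and that SLURM_CPUS_PER_TASK is set inside the job; the input and output paths are placeholders:

# Cap OpenMP (used by MMseqs2) and colabfold_search at the CPUs SLURM
# actually granted, falling back to 1 if the variable is unset.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
colabfold_search --threads "${SLURM_CPUS_PER_TASK:-1}" \
    queries.fasta /path/to/database /path/to/msa_output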

Your Environment

MMseqs2 version 678c82ac44f1178bf9a3d49bfab9d7eed3f17fbc, executed in a Singularity container running Ubuntu 20.04.

SLURM job with 60 CPUs (120 threads) and 1000 GB of memory.

Thank you, Dominik

suyufeng commented 1 year ago

Hi Dominik,

I also encountered this issue and found that https://github.com/soedinglab/MMseqs2/issues/466 might be useful.

Best, Yufeng

dominik-handler commented 6 months ago

We found a fix for the problem. In our case, multi-threaded file writes caused problems on the BeeGFS storage. Limiting unpackdb to 1 thread fixed the issue.
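For anyone hitting the same stall, a minimal sketch of the workaround, assuming the standard MMseqs2 --threads option (the paths are placeholders for the ones in the report above):

# Re-run only the unpack step single-threaded; in our case the
# multi-threaded file writes were what stalled on BeeGFS storage.
mmseqs unpackdb /path/to/final.a3m /path/to/msa_output \
    --unpack-name-mode 0 --unpack-suffix .a3m --threads 1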