steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

"Prediction failed" when createdb --gpu 1 #288

Open sevengo8378 opened 2 weeks ago

sevengo8378 commented 2 weeks ago

I want to create a database based on ProstT5. As a first test I used SwissProt, and I encountered the error "Prediction failed". The command I ran is: ./foldseek createdb ./uniprot_sprot.fasta sp/db --prostt5-model foldseek_db/weights --gpu 1

Expected Behavior

The database is successfully generated in the sp/ directory.

Current Behavior

image

Steps to Reproduce (for bugs)

conda create -n foldseek-gpu -c conda-forge cmake cuda-nvcc libcurand-dev libcublas-dev cuda-nvrtc-dev cuda-version=12.4
conda activate foldseek-gpu
cmake -DCMAKE_BUILD_TYPE=Debug -DENABLE_CUDA=1 -DCUDAToolkit_ROOT=$(dirname $(which nvcc))/../targets/x86_64-linux ..
cd $work_dir
ln -sf ./foldseek_github/build/src/foldseek ./foldseek
./foldseek createdb ./uniprot_sprot.fasta sp/db --prostt5-model foldseek_db/weights --gpu 1

Additional info 1: Following https://github.com/steineggerlab/foldseek/issues/285, I also compiled a debug build and ran it under gdb as follows:

gdb --args ./foldseek_debug createdb ./uniprot_sprot.fasta sp/db --prostt5-model foldseek_db/weights --gpu 1 
# wait for a prompt to appear
r
# wait for the crash
bt

Here is the result: error_log.txt

Additional info 2: If I use only the first 10,000 sequences of SwissProt (saved as sp_1w.fasta) as input, createdb completes successfully.

Your Environment

milot-mirdita commented 2 weeks ago

Can you please pull the latest changes? I added something to print more informative error messages. That should help diagnose the issue.

What GPU does your system have?

sevengo8378 commented 2 weeks ago

Sure. My GPU is NVIDIA GeForce RTX 4090, Driver Version: 545.29.06 CUDA Version: 12.3, NVIDIA-SMI 545.29.06.

sevengo8378 commented 2 weeks ago
image

@milot-mirdita here's the updated log. It seems to be caused by running out of GPU memory. Do you need the whole log file?

milot-mirdita commented 2 weeks ago

SwissProt contains some very long sequences. I just tested what the longest protein sequence is that you can predict with 24 GB of GPU RAM. It seems to be around 9100 AA. I guess there is at least one sequence longer than that, which causes the whole run to fail.

We should deal with this failure case better.
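Until that failure case is handled, one workaround is to filter the input by length before running createdb. Below is a minimal Python sketch of such a filter. The function names and the `max_len` value in the usage comment are illustrative assumptions, not part of Foldseek. The right cutoff depends on the GPU's memory; around 9100 AA is only the figure observed above for 24 GB.

```python
from typing import Iterable, Iterator, TextIO, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines, one record at a time."""
    header, parts = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line, []
        elif header is not None:
            parts.append(line.strip())
    if header is not None:
        yield header, "".join(parts)


def filter_by_length(src: Iterable[str], kept: TextIO, skipped: TextIO,
                     max_len: int) -> Tuple[int, int]:
    """Stream records from src; write those with len <= max_len to `kept`
    and the rest to `skipped`. Returns (n_kept, n_skipped)."""
    n_kept = n_skipped = 0
    for header, seq in read_fasta(src):
        if len(seq) <= max_len:
            kept.write(f"{header}\n{seq}\n")
            n_kept += 1
        else:
            skipped.write(f"{header}\n{seq}\n")
            n_skipped += 1
    return n_kept, n_skipped


# Example usage (file names are placeholders; max_len is an assumed margin
# below the ~9100 AA limit reported for a 24 GB GPU):
#
# with open("uniprot_sprot.fasta") as src, \
#      open("sp_short.fasta", "w") as kept, \
#      open("sp_long.fasta", "w") as skipped:
#     print(filter_by_length(src, kept, skipped, max_len=9000))
```

Writing the over-length records to a separate file rather than discarding them keeps track of what was excluded, so they can be handled separately later.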

sevengo8378 commented 2 weeks ago

Thanks! I think 9100 AA is enough; I can filter by sequence length first. In addition, I estimate that it would take about 100 days to build a database for more than 200M sequences using a single NVIDIA 4090 card. Do you have any suggestions to increase the speed? For example, can multiple cards be used in parallel?
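For reference, the estimate can be turned into a rough throughput figure. The sketch below only restates the numbers from this comment (200M sequences, 100 days, one card). The multi-GPU projection is an assumption: it presumes independent jobs on separate cards that scale linearly, which has not been measured here.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def implied_rate(n_sequences: float, days: float) -> float:
    """Sequences per second implied by finishing n_sequences in `days`."""
    return n_sequences / (days * SECONDS_PER_DAY)


def days_needed(n_sequences: float, rate_per_gpu: float, n_gpus: int) -> float:
    """Projected wall-clock days, ASSUMING perfectly linear scaling over GPUs."""
    return n_sequences / (rate_per_gpu * n_gpus * SECONDS_PER_DAY)


rate = implied_rate(200_000_000, 100)
print(f"implied single-GPU rate: {rate:.1f} sequences/s")
for gpus in (1, 2, 4, 8):
    print(f"{gpus} GPU(s): {days_needed(200_000_000, rate, gpus):.1f} days")
```

Under these assumptions the single-card rate works out to roughly 23 sequences per second, so even eight cards would still need on the order of two weeks.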

sevengo8378 commented 2 weeks ago

Another idea: can I split the input into multiple chunks, run createdb on each one separately, and then merge the results? That way, when a chunk contains a particularly long sequence, the sequence can be removed and only that chunk re-run. Is there a workflow I can refer to? Also, since the number of sequences exceeds 200M, is there a risk of running out of memory when merging?
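The splitting half of this idea can be sketched in Python as follows. The function names, the chunk file naming scheme, and the chunk size are illustrative assumptions. The sketch only produces per-chunk FASTA files. Running createdb on each chunk and merging the resulting databases are not covered, since the thread does not establish a workflow for those steps.

```python
import os
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines, one record at a time."""
    header, parts = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line, []
        elif header is not None:
            parts.append(line.strip())
    if header is not None:
        yield header, "".join(parts)


def chunked(records: Iterable[Tuple[str, str]],
            chunk_size: int) -> Iterator[List[Tuple[str, str]]]:
    """Group a record stream into lists of at most chunk_size records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def write_chunks(fasta_path: str, out_dir: str, chunk_size: int,
                 prefix: str = "chunk") -> List[str]:
    """Split fasta_path into numbered FASTA files; return their paths in order."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    with open(fasta_path) as src:
        for i, chunk in enumerate(chunked(read_fasta(src), chunk_size)):
            path = os.path.join(out_dir, f"{prefix}_{i:05d}.fasta")
            with open(path, "w") as out:
                for header, seq in chunk:
                    out.write(f"{header}\n{seq}\n")
            paths.append(path)
    return paths


# Example usage (paths and chunk size are placeholders):
# paths = write_chunks("all_sequences.fasta", "chunks", chunk_size=1_000_000)
```

Only one chunk is held in memory at a time, so the split itself should not be memory-bound even for inputs with hundreds of millions of sequences. A failed chunk can then be inspected and re-run on its own without repeating the others.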