Open sevengo8378 opened 5 months ago
Can you please pull the latest changes? I added something to print more informative error messages. That should help diagnose the issue.
What GPU does your system have?
Sure. My GPU is NVIDIA GeForce RTX 4090, Driver Version: 545.29.06 CUDA Version: 12.3, NVIDIA-SMI 545.29.06.
@milot-mirdita here's the updated log. It seems to be caused by running out of GPU memory. Do you need the whole log file?
Swiss-Prot contains some very long sequences. I just tested the longest protein sequence you can predict with 24 GB of GPU RAM, and it seems to be around 9100 AA. I guess there is at least one sequence longer than that, which causes the whole run to fail.
We should deal with this failure case better.
Thanks! I think 9100 AA is enough, so I can filter by sequence length first. In addition, I estimate that building a database for more than 200M sequences would take about 100 days on a single NVIDIA 4090 card. Do you have any suggestions for speeding this up? For example, can I use multiple cards in parallel?
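A minimal sketch of the length filter mentioned above, in plain Python with no third-party dependencies. The function names (`read_fasta`, `filter_by_length`, `write_fasta`) and the 9000-residue cutoff are assumptions for illustration; the cutoff is simply chosen a little below the ~9100 AA limit reported for a 24 GB card.

```python
from typing import Iterable, Iterator, TextIO, Tuple

# Assumed cutoff, slightly below the ~9100 AA limit reported in this thread.
MAX_LEN = 9000


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, max_len: int = MAX_LEN):
    """Keep only records whose sequence has at most max_len residues."""
    for header, seq in records:
        if len(seq) <= max_len:
            yield header, seq


def write_fasta(records, out: TextIO, width: int = 60) -> None:
    """Write records as FASTA, wrapping sequences at `width` characters."""
    for header, seq in records:
        out.write(header + "\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


# Example usage (the output file name is hypothetical):
# with open("uniprot_sprot.fasta") as src, open("sp_filtered.fasta", "w") as dst:
#     write_fasta(filter_by_length(read_fasta(src)), dst)
```

The filtered file can then be passed to `createdb` in place of the original FASTA. The filter streams record by record, so memory use should stay small even for very large inputs.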
Another idea: can I split the database into multiple chunks, run createdb on each one separately, and then merge the results? That way, if a chunk contains a particularly long sequence, the sequence can be removed and the chunk re-run. Is there a workflow I can refer to? Also, since there are more than 200M sequences, is there a risk of running out of memory when merging?
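The splitting half of that idea could look roughly like the sketch below. It only cuts the input FASTA into fixed-size chunks; `iter_records`, `split_fasta`, the chunk file naming, and the default chunk size are all hypothetical choices, and merging the resulting databases is not covered here.

```python
import os


def iter_records(handle):
    """Yield each FASTA record as a list of its raw lines (header first)."""
    record = []
    for line in handle:
        if line.startswith(">") and record:
            yield record
            record = []
        # Skip any stray lines that appear before the first header.
        if line.startswith(">") or record:
            record.append(line)
    if record:
        yield record


def split_fasta(in_path, out_dir, records_per_chunk=1_000_000):
    """Split a FASTA file into consecutive chunks; return the chunk paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    with open(in_path) as handle:
        for i, record in enumerate(iter_records(handle)):
            if i % records_per_chunk == 0:
                # Start a new chunk file every records_per_chunk records.
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, f"chunk_{len(paths):05d}.fasta")
                out = open(path, "w")
                paths.append(path)
            out.writelines(record)
    if out is not None:
        out.close()
    return paths
```

Each chunk could then be run through the same `createdb` invocation used in this issue, with its own output prefix, so that a failing chunk can be fixed and re-run without repeating the others.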
I want to create a database based on ProstT5. I first tested it with Swiss-Prot and encountered the error "Prediction failed". The command I ran is:
./foldseek createdb ./uniprot_sprot.fasta sp/db --prostt5-model foldseek_db/weights --gpu 1
Expected Behavior
The database is successfully generated in the sp/ directory.
Current Behavior
Steps to Reproduce (for bugs)
conda create -n foldseek-gpu -c conda-forge cmake cuda-nvcc libcurand-dev libcublas-dev cuda-nvrtc-dev cuda-version=12.4
conda activate foldseek-gpu
cmake -DCMAKE_BUILD_TYPE=Debug -DENABLE_CUDA=1 -DCUDAToolkit_ROOT=$(dirname $(which nvcc))/../targets/x86_64-linux ..
cd $work_dir
ln -sf ./foldseek_github/build/src/foldseek ./foldseek
./foldseek createdb ./uniprot_sprot.fasta sp/db --prostt5-model foldseek_db/weights --gpu 1
Additional info 1: According to https://github.com/steineggerlab/foldseek/issues/285, I also compiled the debug version and debugged it as follows
Here is the result: error_log.txt
Additional info 2: If I take only the first 10,000 sequences of Swiss-Prot (saved as sp_1w.fasta) as input, createdb completes successfully.
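For anyone reproducing this, a subset like sp_1w.fasta can be produced with a few lines of Python. This is only a sketch of one way to do it; `head_fasta` is a hypothetical helper name, not part of foldseek.

```python
def head_fasta(lines, n):
    """Yield the raw lines belonging to the first n FASTA records."""
    seen = 0
    for line in lines:
        if line.startswith(">"):
            seen += 1
            if seen > n:
                break
        # Lines before the first header are skipped.
        if seen:
            yield line


# Example usage:
# with open("uniprot_sprot.fasta") as src, open("sp_1w.fasta", "w") as dst:
#     dst.writelines(head_fasta(src, 10_000))
```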
Your Environment