Open xuxiaochen1209 opened 2 years ago
Hi @xuxiaochen1209,
sorry for the late response. What you report sounds like a very long sequence. Could you check the input, especially the sequence length, and possibly provide it to us so that we can take a look?
Please also have a look at this table. AlphaFold might take more than a day to predict the structure for a sequence of 3500 AAs.
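If it helps, a quick way to check the per-chain lengths of a FASTA input is a small script along these lines (just a sketch; "input.fasta" below is a placeholder for your actual input file):

```python
# Sketch: print the length of every record in a FASTA file.
# "input.fasta" is a placeholder for the actual AlphaFold input file.
def fasta_lengths(path):
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0] if len(line) > 1 else "unnamed"
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths

if __name__ == "__main__":
    lengths = fasta_lengths("input.fasta")
    for name, length in lengths.items():
        print(f"{name}: {length} residues")
    print("total:", sum(lengths.values()), "residues")
```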
Best, Roman
Hello,
I had a problem running AlphaFold. The first two hours were very smooth, and I think the MSA part finished within those two hours. However, it then showed:
I0905 13:06:56.466166 140453353674560 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (691, 691, 64)}, 'experimentally_resolved': {'logits': (691, 37)}, 'masked_msa': {'logits': (252, 691, 22)}, 'predicted_aligned_error': (691, 691), 'predicted_lddt': {'logits': (691, 50)}, 'structure_module': {'final_atom_mask': (691, 37), 'final_atom_positions': (691, 37, 3)}, 'plddt': (691,), 'aligned_confidence_probs': (691, 691, 64), 'max_predicted_aligned_error': (), 'ptm': (), 'iptm': (), 'ranking_confidence': ()}
I0905 13:06:56.467109 140453353674560 run_alphafold.py:202] Total JAX model model_1_multimer_v2_pred_0 on VHVL predict time (includes compilation time, see --benchmark): 246.2s
This step takes forever. I checked the CPU usage, memory usage, and the GPU usage and they are:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
35488 dell      20   0   69.9g   4.8g 594148 R 100.0  3.8   1591:11 python /h+
              total        used        free      shared  buff/cache   available
Mem:         128357        6557        1730         106      120069      121081
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:3B:00.0 Off |                  N/A |
| 30%   33C    P2   101W / 320W |   5886MiB / 10240MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:5E:00.0 Off |                  N/A |
| 30%   25C    P0    88W / 320W |      0MiB / 10240MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:B1:00.0 Off |                  N/A |
| 30%   25C    P0    89W / 320W |      0MiB / 10240MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:D9:00.0 Off |                  N/A |
| 30%   25C    P0    94W / 320W |      0MiB / 10240MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     35488      C   python                           1020MiB |
+-----------------------------------------------------------------------------+
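A small polling script along the following lines (just an illustrative sketch, not how the snapshot above was taken) can confirm that the GPU stays idle the whole time rather than being busy in bursts:

```python
# Illustration only: poll nvidia-smi every few seconds and print GPU
# utilization and memory use, to see whether the card ever becomes busy.
import subprocess
import time

QUERY = "--query-gpu=index,utilization.gpu,memory.used,memory.total"

def poll(interval_s=10):
    while True:
        out = subprocess.run(
            ["nvidia-smi", QUERY, "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        print(time.strftime("%H:%M:%S"), out.replace("\n", " | "))
        time.sleep(interval_s)

if __name__ == "__main__":
    poll()
```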
The GPU memory usage is not very high; I have seen some people's A100s use over 20000 MiB. What's more, the GPU-Util stays at only 0-1%. I'm not sure whether this is because the graphics driver/CUDA/cuDNN/JAX versions are mismatched (driver version: 515.43.04, CUDA version: 11.7, cuDNN version: 8.4.1.50, jaxlib version: 0.3.15+cuda11.cudnn82, Python version: 3.8). I didn't see any error in the logs, but it just didn't move on for over 30 hours. I also ran 'conda activate alphafold' and tested in python3:
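A check along the following lines (a sketch, not the exact snippet from my session) shows whether jaxlib can see the GPUs and run a computation on one of them:

```python
# Minimal sketch: verify that jaxlib picks up the CUDA backend and can
# actually execute a small computation on a GPU device.
import jax
import jax.numpy as jnp

print(jax.default_backend())   # expected to print "gpu" if CUDA/cuDNN are found
print(jax.devices())           # expected to list the visible CUDA devices

# Small matrix multiplication as a smoke test on the default device.
x = jnp.ones((1000, 1000))
y = jnp.dot(x, x).block_until_ready()
print(y.shape, float(y[0, 0]))
```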
It seems that CUDA and cuDNN work. So I'm confused: has anyone had this problem before, and could you please advise me on how to solve it? Thanks a lot for your kind guidance.