Status: Closed (KunFang93 closed 4 months ago)
Hi, I think this is related to the CUDA installation. What GPU do you have and what is the result of 'nvidia-smi'?
Best,
Patrick
Hi,
Thanks for your reply! We have two A100 GPUs; here is the nvidia-smi output:
(speed_ppi) [kfang@compgeno ~]$ nvidia-smi --query-gpu=name --format=csv,noheader
NVIDIA A100 80GB PCIe
NVIDIA A100 80GB PCIe
(speed_ppi) [kfang@compgeno ~]$ nvidia-smi
Sat Feb 24 09:50:48 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100 80G... On | 00000000:22:00.0 Off | 0 |
| N/A 41C P0 47W / 300W | 0MiB / 81920MiB | 0% E. Process |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A100 80G... On | 00000000:26:00.0 Off | 0 |
| N/A 39C P0 45W / 300W | 0MiB / 81920MiB | 0% E. Process |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
However, nvcc -V reports:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0
It looks like nvcc reports CUDA 12.3 while nvidia-smi reports CUDA 12.0. I am new to working with GPUs, so I wonder whether this mismatch could be the problem and, if so, what I should do to fix it.
Thank you so much for helping!
Best, Kun
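Editorial note on the two version numbers quoted above: the "CUDA Version" in the nvidia-smi header is the newest CUDA runtime that the installed driver supports, whereas nvcc -V reports the version of the installed CUDA toolkit, so the two can legitimately differ. The thread does not establish whether this mismatch caused the error. The following is a minimal sketch, not part of SpeedPPI, that extracts both numbers from pasted output and compares them; the function names are hypothetical and the regular expressions assume the output formats shown in this thread.

```python
import re


def driver_cuda_version(nvidia_smi_output):
    """Return (major, minor) from the 'CUDA Version: X.Y' header, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_cuda_version(nvcc_output):
    """Return (major, minor) from the 'release X.Y' line of nvcc -V, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_newer_than_driver(nvidia_smi_output, nvcc_output):
    """True if the toolkit version exceeds what the driver reports.

    Returns None when either version cannot be parsed.
    """
    driver = driver_cuda_version(nvidia_smi_output)
    toolkit = toolkit_cuda_version(nvcc_output)
    if driver is None or toolkit is None:
        return None
    return toolkit > driver


# Lines copied from the output quoted in this thread.
SMI_HEADER = "| NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.0 |"
NVCC_LINE = "Cuda compilation tools, release 12.3, V12.3.107"

if __name__ == "__main__":
    print("driver supports up to:", driver_cuda_version(SMI_HEADER))
    print("toolkit installed:", toolkit_cuda_version(NVCC_LINE))
    print("toolkit newer than driver:", toolkit_newer_than_driver(SMI_HEADER, NVCC_LINE))
```

A result of True only flags a possible source of trouble; it is not a diagnosis on its own.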
I just updated the NVIDIA driver (nvidia-smi now reports CUDA version 12.4) and reinstalled the speed_ppi conda environment, but the prediction still fails with the same error.
(speed_ppi) [kfang@compgeno SpeedPPI]$ nvidia-smi
Sat Feb 24 10:47:11 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100 80GB PCIe On | 00000000:22:00.0 Off | 0 |
| N/A 42C P0 48W / 300W | 0MiB / 81920MiB | 0% E. Process |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A100 80GB PCIe On | 00000000:26:00.0 Off | 0 |
| N/A 41C P0 45W / 300W | 0MiB / 81920MiB | 0% E. Process |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Thanks in advance!
Best, Kun
I found a workaround for this issue! I used
conda install jaxlib=*=*cuda* jax cuda-nvcc -c conda-forge -c nvidia
instead of
pip install --upgrade "jax[cuda12_local]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
The prediction has run smoothly so far. I will report back if anything goes wrong. Thanks!
The above solution successfully generated the PDB file. Thanks!
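Editorial note: after switching the JAX installation as described above, one way to confirm that JAX can actually see the GPUs is to list its devices before starting a prediction. This is a minimal sketch under the assumption that it runs inside the speed_ppi environment; the helper name visible_jax_devices is hypothetical and not part of SpeedPPI.

```python
def visible_jax_devices():
    """Describe the devices JAX can see as 'platform:id' strings.

    Returns None if jax cannot be imported, and an empty list if JAX
    fails to initialise any backend.
    """
    try:
        import jax
    except ImportError:
        return None
    try:
        devices = jax.devices()
    except RuntimeError:
        # Raised when the requested backend cannot be initialised.
        return []
    return [f"{device.platform}:{device.id}" for device in devices]


if __name__ == "__main__":
    found = visible_jax_devices()
    if found is None:
        print("jax is not installed in this environment")
    elif not found:
        print("jax is installed but no backend could be initialised")
    else:
        # Entries with a 'cpu' platform mean JAX fell back to the CPU.
        print("jax devices:", ", ".join(found))
```

If only CPU devices are listed, the installed jaxlib most likely lacks CUDA support or cannot load the CUDA libraries.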
Hi,
Thanks for providing this wonderful tool! I followed the instructions and everything ran smoothly until the prediction step:
I tested
I searched online and found a thread that mentions this error. I tried the same code from that thread and got the same error.
Could you shed some light on how to solve this issue? Please let me know if any extra information is needed. Thanks in advance!
Best, Kun