cdsnow opened this issue 1 year ago · Open
Hi, I ran into the same problem and eventually solved it. I think you need to check whether your CUDA version matches this project. The project pins torch 1.12.1, which means your CUDA toolkit must be one of 10.2, 11.3, or 11.6.
Find your cudatoolkit location (which nvcc) and make sure you are actually calling cudatoolkit 11.3; a quick check is sketched below.
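A minimal way to compare the two versions, assuming only that torch is importable inside the fastfold conda env (this is a sketch, not part of the project):

# Sketch: compare the CUDA version PyTorch was built against
# with the nvcc toolkit found on PATH.
import subprocess
import torch

print("torch:", torch.__version__)                    # expected 1.12.1 for this project
print("torch built with CUDA:", torch.version.cuda)   # should be 10.2, 11.3, or 11.6
print("GPU available:", torch.cuda.is_available())

# nvcc reports the toolkit that custom CUDA extensions are compiled against.
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)

If torch.version.cuda and nvcc disagree (e.g. torch built for 11.3 but nvcc 11.8 on PATH), that mismatch is the first thing to fix.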
Greetings!
Following the instructions, I've completed an installation and everything seemed to work including the generation of the MSA. Specifically, I've done the recommended conda installation, the pip installation of triton, and the local download/unpack of the datasets. Per my reading, the remainder of the instructions (e.g. Docker) seemed optional, so I jumped directly to trying inference.sh.
However, I'm hitting a repeatable runtime CUDA error. Since the same error occurs when I try the benchmark run, I'll paste the output for that at the bottom. Keeping an eye on the VRAM, this does not seem to be an issue involving a lack of memory on the GPU (an RTX 3090).
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8 |
(fastfold) csnow@icestorm:~/code/FastFold/benchmark$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
Any advice!?
Best wishes,
-Chris
(fastfold) csnow@icestorm:~/code/FastFold/benchmark$ torchrun --nproc_per_node=1 perf.py --msa-length 128 --res-length 256
[08/25/23 10:33:06] INFO colossalai - colossalai - INFO: /home/csnow/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:521 set_device
INFO colossalai - colossalai - INFO: process rank 0 is bound to device 0
[08/25/23 10:33:07] INFO colossalai - colossalai - INFO: /home/csnow/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:557 set_seed
INFO colossalai - colossalai - INFO: initialized seed on rank 0, numpy: 1024, python random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1024, the default parallel seed is ParallelMode.DATA.
INFO colossalai - colossalai - INFO: /home/csnow/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/initialize.py:116 launch
INFO colossalai - colossalai - INFO: Distributed environment is initialized, data parallel size: 1, pipeline parallel size: 1, tensor parallel size: 1
Traceback (most recent call last):
  File "perf.py", line 187, in <module>
    main()
  File "perf.py", line 152, in main
    layer_inputs = attn_layers[lyr_idx].forward(layer_inputs, node_mask, pair_mask)
  File "/home/csnow/anaconda3/envs/fastfold/lib/python3.8/site-packages/fastfold-0.2.0-py3.8-linux-x86_64.egg/fastfold/model/fastnn/evoformer.py", line 65, in forward
    m = self.msa(m, z, msa_mask)
  File "/home/csnow/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/csnow/anaconda3/envs/fastfold/lib/python3.8/site-packages/fastfold-0.2.0-py3.8-linux-x86_64.egg/fastfold/model/fastnn/msa.py", line 143, in forward
    node = self.MSARowAttentionWithPairBias(node, pair, node_mask_row)
  File "/home/csnow/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/csnow/anaconda3/envs/fastfold/lib/python3.8/site-packages/fastfold-0.2.0-py3.8-linux-x86_64.egg/fastfold/model/fastnn/msa.py", line 63, in forward
    b = F.linear(Z, self.linear_b_weights)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16BF, lda, b, CUDA_R_16BF, ldb, &fbeta, c, CUDA_R_16BF, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 429806) of binary: /home/csnow/anaconda3/envs/fastfold/bin/python
Traceback (most recent call last):
  File "/home/csnow/anaconda3/envs/fastfold/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.12.1', 'console_scripts', 'torchrun')())
  File "/home/csnow/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/home/csnow/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/home/csnow/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/home/csnow/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/csnow/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
perf.py FAILED
Failures:
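For reference, the failing frame is a bfloat16 F.linear, so a tiny standalone test along these lines (a sketch; the shapes are arbitrary, not FastFold's) exercises the same cuBLAS bf16 GEMM path reported in the traceback and can show whether the error is environment-wide rather than specific to FastFold:

# Sketch: run in the same fastfold env to isolate the bf16 GEMM path.
import torch
import torch.nn.functional as F

x = torch.randn(8, 128, 256, device="cuda", dtype=torch.bfloat16)  # stand-in for Z
w = torch.randn(4, 256, device="cuda", dtype=torch.bfloat16)       # stand-in for linear_b_weights
out = F.linear(x, w)  # same kind of bf16 matmul that raised CUBLAS_STATUS_INVALID_VALUE above
print(out.shape, out.dtype)

If this snippet also raises CUBLAS_STATUS_INVALID_VALUE, the problem is the torch/CUDA toolkit pairing rather than anything in FastFold itself.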