Closed crownk1997 closed 5 years ago
RTX 2080Ti needs CUDA10 version of PyTorch to be installed, not CUDA9. That's likely the reason for the error.
Actually CUDA 9.2 works. I uninstalled the CUDA 9.1 PyTorch, reinstalled the CUDA 9.2 build of PyTorch 0.4.1, and the messages disappeared.
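To see whether the installed PyTorch build matches the GPU before digging further, a quick sanity check like the following can help. This is a sketch, not from the original report; it assumes a standard PyTorch install and uses `torch.version.cuda` and `torch.cuda.get_device_capability`.

```python
# Sanity check: does the installed PyTorch CUDA build match the GPU?
# Turing cards (RTX 2070/2080/2080 Ti, compute capability 7.5) are not
# supported by CUDA 9.0/9.1 builds; use a CUDA 9.2 or CUDA 10 build.
try:
    import torch

    cuda_build = torch.version.cuda  # e.g. "10.0"; None for CPU-only builds
    print("PyTorch:", torch.__version__, "| CUDA build:", cuda_build)
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        print(f"Device capability: {major}.{minor}")
        if (major, minor) >= (7, 5) and cuda_build in ("9.0", "9.1"):
            print("Turing GPU with a CUDA 9.0/9.1 build: expect cuBLAS errors.")
    else:
        print("No CUDA device visible to PyTorch.")
except ImportError:
    cuda_build = None
    print("PyTorch is not installed.")
```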
My RTX 2080 Ti reports an error while running CUDA 10.0: RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:116
But the same code runs on a 1080 Ti with CUDA 10. Has anyone hit a similar problem?
I'm also seeing the same error, and also have a 2080 TI.
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch-nightly_1549566624064/work/aten/src/THC/THCBlas.cu:259
I'm certainly not ruling out a bug in my code. Interestingly, I was able to get many models to train, but now this error keeps coming up. Any insight would be very welcome!
Ah, found it! I was running PyTorch with https://github.com/facebookresearch/maskrcnn-benchmark and had to recompile the maskrcnn-benchmark library after reinstalling PyTorch.
From an issue on that repo, it seems a particular pytorch-nightly build had this problem, but the main PyTorch releases don't. Once I updated to a main release (PyTorch version: 1.0.1.post2) and rebuilt the library I was using, the issue went away.
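To confirm which build is actually active before rebuilding, a quick version check helps: nightly builds carry a "dev" tag in the version string. This snippet is an illustrative check, not part of the original report; after switching builds, any C++/CUDA extensions compiled against the old build need recompiling.

```python
# Check whether the active PyTorch build is a nightly or a main release.
# Nightly version strings look like "1.0.0.dev20190207"; releases like "1.0.1.post2".
try:
    import torch

    version = torch.__version__
    is_nightly = "dev" in version
    print(version, "(nightly)" if is_nightly else "(release)")
except ImportError:
    version = None
    print("PyTorch is not installed.")
```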
Well, in my case (pytorch=1.1.0, python=3.5.4, gcc=5.4, 2080 Ti, cuda=10.0), the problem is caused by the use of torch.nn.utils.spectral_norm in the discriminator. For now I just remove the normalization.
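The workaround above can be sketched as follows: apply `torch.nn.utils.spectral_norm` to a layer and then strip it again with `torch.nn.utils.remove_spectral_norm`. The layer shapes here are illustrative, not taken from the original discriminator.

```python
# Sketch of the reported workaround: spectral norm applied to a
# discriminator-style conv layer, then removed again.
try:
    import torch.nn as nn
    from torch.nn.utils import spectral_norm, remove_spectral_norm

    layer = spectral_norm(nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1))
    has_norm = hasattr(layer, "weight_orig")  # True: weight is re-parametrized

    remove_spectral_norm(layer)  # drop the normalization, keep the trained weight
    still_normed = hasattr(layer, "weight_orig")  # False after removal
    print(has_norm, still_normed)
except ImportError:
    has_norm = still_normed = None
    print("PyTorch is not installed.")
```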
My 2070 also hits this error; the environment is torch 1.01 with Python 3.6. How can this be solved? Thanks!
Your solution worked perfectly and saved me a lot of time. Really appreciate it! My 2080 Ti works well with torch 0.4.1, CUDA 9.2, and cuDNN 7.6.5.
🐛 Bug
Recently, I tried to run my previous program on our new multi-GPU server with RTX 2080 Ti cards. I did not make any changes to my code, which runs successfully on CUDA 9.0 with a Tesla V100. I am not sure what the problem is; it seems to be an issue with CUDA support. I have tried CUDA_LAUNCH_BLOCKING=1, but that does not solve the problem.
The error is as follows.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument
Traceback (most recent call last):
  File "main.py", line 166, in <module>
    pimg.copy(netG(p_z).detach())
  File "/usr/local/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/szhangcj/python/GBGAN/celebA_attention/sagan_models.py", line 100, in forward
    out, p1 = self.attn1(out)
  File "/usr/local/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/szhangcj/python/GBGAN/celebA_attention/sagan_models.py", line 32, in forward
    energy = torch.bmm(proj_query, proj_key)  # transpose check
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/THCBlas.cu:411
Code
The following is part of my code.
Environment
Additional Context
It seems that the detach() function causes the problem, because I tried running several PyTorch projects released on GitHub and detach() always appears in the error. But I did not get the error when running the same programs on the previous server.
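Since the traceback ends in torch.bmm, a minimal repro that exercises only that call can help isolate whether the failure lies in cuBLAS itself or elsewhere in the model. The shapes below are illustrative, not taken from the report; the snippet falls back to CPU when no GPU is visible.

```python
# Minimal sketch to test whether the torch.bmm call from the traceback
# fails on its own on this device, independent of the attention module.
try:
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    proj_query = torch.randn(8, 64, 32, device=device)  # (batch, N, C)
    proj_key = torch.randn(8, 32, 64, device=device)    # (batch, C, N)
    energy = torch.bmm(proj_query, proj_key)            # batched matmul, as in the traceback
    print(tuple(energy.shape))
except ImportError:
    energy = None
    print("PyTorch is not installed.")
```

If this fails with the same cublas runtime error on the 2080 Ti but succeeds on CPU, the problem is the CUDA/PyTorch build combination rather than the model code.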