Cublas run time error with RTX 2080Ti with Cuda 9.0

crownk1997 commented 5 years ago

🐛 Bug

Recently, I tried to run my existing program on our new multi-GPU server with RTX 2080 Ti cards. I have not made any changes to my code, which runs successfully on CUDA 9.0 with a Tesla V100. I am not sure what the problem is, but it seems to be related to CUDA support. I have tried setting CUDA_LAUNCH_BLOCKING=1, but this does not solve the problem.

The error is as follows.

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument
Traceback (most recent call last):
  File "main.py", line 166, in <module>
    p_img.copy_(netG(p_z).detach())
  File "/usr/local/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/szhangcj/python/GBGAN/celebA_attention/sagan_models.py", line 100, in forward
    out,p1 = self.attn1(out)
  File "/usr/local/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/szhangcj/python/GBGAN/celebA_attention/sagan_models.py", line 32, in forward
    energy = torch.bmm(proj_query,proj_key) # transpose check
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/THCBlas.cu:411

Code

The following is part of my code.

z_b = torch.FloatTensor(opt.batch_size, opt.z_dim).to(device)
img_b = torch.FloatTensor(opt.batch_size, 3, 64, 64).to(device)
img_a = torch.FloatTensor(opt.batch_size, 3, 64, 64).to(device)
p_z = torch.FloatTensor(pool_size, opt.z_dim).to(device)
p_img = torch.FloatTensor(pool_size, 3, 64, 64).to(device)

show_z_b = torch.FloatTensor(100, opt.z_dim).to(device)
eval_z_b = torch.FloatTensor(250, opt.z_dim).to(device) # 250/batch * 120 --> 300000

optim_D = optim.Adam(netD.parameters(), lr=opt.lr_d) # other param?
optim_G = optim.Adam(netG.parameters(), lr=opt.lr_g) #?suitable
criterion_G = nn.MSELoss()

eta = 1
loss_GD = []
pre_loss = 0
cur_loss = 0
G_epoch = 1
for epoch in range(start_epoch, start_epoch + opt.num_epoch):
    print('Start epoch: %d' % epoch)
    ## input_pool: [pool_size, opt.z_dim] -> [pool_size, 32, 32]
    netD.train()
    netG.eval()
    p_z.normal_()
    print(netG(p_z).detach().size())
    p_img.copy_(netG(p_z).detach())

    for t in range(opt.period):

        for _ in range(opt.dsteps):

            t = time.time()
            ### Update D
            netD.zero_grad()
            ## real
            real_img, _ = next(iter(dataloader)) # [batch_size, 1, 32, 32]
            img_b.copy_(real_img.squeeze().to(device))
            real_D_err = torch.log(1 + torch.exp(-netD(img_b))).mean()
            print("D real loss", netD(img_b).mean())
            # real_D_err.backward()

            ## fake
            z_b_idx = random.sample(range(pool_size), opt.batch_size)
            img_a.copy_(p_img[z_b_idx])
            fake_D_err = torch.log(1 + torch.exp(netD(img_a))).mean() # torch scalar[]
            loss_gp = calc_gradient_penalty(netD, img_b, img_a)
            total_loss = real_D_err + fake_D_err + loss_gp
            print("D fake loss", netD(img_a).mean())
            total_loss.backward()

            optim_D.step()

        ## update input pool
        p_img_t = p_img.clone().to(device)
        p_img_t.requires_grad_(True)
        if p_img_t.grad is not None:
            p_img_t.grad.zero_()
        fake_D_score = netD(p_img_t)

        fake_D_score.backward(torch.ones(len(p_img_t)).to(device))

        p_img = img_truncate(p_img + eta * p_img_t.grad)
        print("The mean of gradient", torch.mean(p_img_t.grad))
    ##update G after several steps
    netG.train()
    netD.eval()
    poolset = PoolSet(p_z.cpu(), p_img.cpu())
    poolloader = torch.utils.data.DataLoader(poolset, batch_size=opt.G_batch_size, shuffle=True, num_workers=opt.workers)

Environment

PyTorch Version (e.g., 1.0): 0.4.1.post2
OS (e.g., Linux): Ubuntu 16.04.5 LTS (GNU/Linux 4.15.0-43-generic x86_64)
I installed PyTorch under conda
Python version: Python 3.6.6
CUDA/cuDNN version: Cuda compilation tools, release 9.0, V9.0.176
GPU models and configuration: 9 GPUs
Any other relevant information: RTX 2080Ti

Additional Context

It seems that detach() triggers the problem: I tried running several PyTorch projects released on GitHub, and detach() always appears in the error. However, I do not get the error when running the same programs on the previous setup.
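
One way to test the detach() hypothesis is to call torch.bmm directly, since the traceback above fails inside torch.bmm (THCBlas.cu) and detach() only appears on the calling line. The following is a minimal sketch, not code from the original program: the function name bmm_smoke_test and the tensor sizes are made up for illustration, and it falls back to the CPU when no GPU is visible.

```python
try:
    import torch
except ImportError:  # lets the script report a clear status without PyTorch
    torch = None


def bmm_smoke_test(batch=4, n=16, k=8):
    """Run torch.bmm on a plain and a detached input and return a status string.

    The sizes are arbitrary; they only need to be valid for a batched matmul.
    """
    if torch is None:
        return "skipped: PyTorch not installed"
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a = torch.randn(batch, n, k, device=device, requires_grad=True)
    b = torch.randn(batch, k, n, device=device)
    for label, lhs in (("plain", a), ("detached", a.detach())):
        try:
            out = torch.bmm(lhs, b)
            if device.type == "cuda":
                # CUDA kernels run asynchronously; wait so errors surface here.
                torch.cuda.synchronize()
        except RuntimeError as err:
            return "failed (%s) on %s: %s" % (label, device, err)
        assert tuple(out.shape) == (batch, n, n)
    return "ok on %s" % device


print(bmm_smoke_test())
```

If the "plain" call already fails on the RTX 2080 Ti, detach() is unlikely to be the cause, and the installed PyTorch/CUDA build is the more likely suspect.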

soumith commented 5 years ago

RTX 2080Ti needs CUDA10 version of PyTorch to be installed, not CUDA9. That's likely the reason for the error.
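
To check which CUDA version an installed PyTorch build was compiled against, and whether that toolkit can natively target the GPU, something like the sketch below may help. The helper names and the small lookup table are illustrative assumptions, not part of PyTorch; the table only covers the two GPUs discussed in this thread (compute capability 7.0 for the Tesla V100, 7.5 for the RTX 2080 Ti). The toolkit version is a necessary condition rather than a guarantee, and a False result is best read as a warning, not a verdict.

```python
# Minimum CUDA toolkit release that can compile native code for a given
# compute capability. Only the two GPUs discussed in this thread are listed.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta, e.g. Tesla V100
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080 Ti
}


def parse_cuda_version(version):
    """Turn a string such as '9.0.176' into a (major, minor) tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def build_targets_gpu(build_cuda_version, capability):
    """Return True if the toolkit version can natively target the GPU.

    Returns None when the capability is not in the table above.
    """
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        return None
    return parse_cuda_version(build_cuda_version) >= required


def report():
    """Print the check for the local machine, if PyTorch and a GPU exist."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    if torch.version.cuda is None or not torch.cuda.is_available():
        print("No CUDA build or no visible GPU")
        return
    capability = torch.cuda.get_device_capability(0)
    print(torch.cuda.get_device_name(0), capability,
          "built with CUDA", torch.version.cuda,
          "->", build_targets_gpu(torch.version.cuda, capability))


if __name__ == "__main__":
    report()
```

With the environment reported above (CUDA 9.0.176 and an RTX 2080 Ti), build_targets_gpu("9.0.176", (7, 5)) evaluates to False, which matches the suggestion to install a CUDA 10 build.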

HideUnderBush commented 5 years ago

> RTX 2080Ti needs CUDA10 version of PyTorch to be installed, not CUDA9. That's likely the reason for the error.

Actually, CUDA 9.2 works. I uninstalled the CUDA 9.1 build of PyTorch and reinstalled PyTorch 0.4.1 built for CUDA 9.2, and the error messages disappeared.

xujin1184104394 commented 5 years ago

My RTX2080ti reports an error while running CUDA10.0: RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:116

But the same code runs on a 1080 Ti with CUDA 10. Has anyone else run into a similar problem?

ClimbsRocks commented 5 years ago

I'm also seeing the same error, and also have a 2080 TI. RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch-nightly_1549566624064/work/aten/src/THC/THCBlas.cu:259

I'm certainly not ruling out a bug in my code. Interestingly, I was able to get many models to train, but now this error keeps coming up. Any insight would be very welcome!

ClimbsRocks commented 5 years ago

Ah, found it! I was running PyTorch with https://github.com/facebookresearch/maskrcnn-benchmark , and had to recompile the maskrcnn-benchmark library after installing.

From an issue on there, it seems that a particular version of pytorch-nightly had this issue, but the main releases of PyTorch don't. So once I updated to one of the main releases (PyTorch version: 1.0.1.post2), and rebuilt the library I was using, the issue went away.

NeverGiveU commented 4 years ago

Well, in my case (pytorch=1.1.0, python=3.5.4, gcc=5.4, 2080 Ti, cuda=10.0), the problem is caused by the use of torch.nn.utils.spectral_norm in the discriminator. For now, my only workaround is to remove the normalization.

DLT10010 commented 4 years ago

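> I'm also seeing the same error, and also have a 2080 TI. RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch-nightly_1549566624064/work/aten/src/THC/THCBlas.cu:259
>
> I'm certainly not ruling out a bug in my code. Interestingly, I was able to get many models to train, but now this error keeps coming up. Any insight would be very welcome!

I am getting this error on a 2070 as well. My environment is torch 1.01 and Python 3.6. How can I fix it? Thanks.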

17314796423 commented 4 months ago

> > RTX 2080Ti needs CUDA10 version of PyTorch to be installed, not CUDA9. That's likely the reason for the error.
>
> Actually cuda9.2 works, I uninstall cuda9.1 pytorch and then reinstalled the cuda9.2 pytorch0.4.1, then the messages disappeared.

Your solution worked perfectly and saved me a lot of time. I really appreciate it! My 2080 Ti works well with torch 0.4.1, CUDA 9.2, and cuDNN 7.6.5.
