Problem with running on cuda 11.2

lysektomas commented 3 years ago

Hi, thanks for this great repo!

I am trying to run this repo on an NVIDIA RTX 3090 with CUDA 11.2. I have been hitting the error below all day. I tried different versions of PyTorch, with no luck.

I am getting this error:

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/glowtts/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/user/tts/glow-tts/train.py", line 92, in train_and_eval
    train(rank, epoch, hps, generator, optimizer_g, train_loader, None, None)
  File "/home/user/tts/glow-tts/train.py", line 119, in train
    loss_g.backward()
  File "/home/user/anaconda3/envs/glowtts/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/user/anaconda3/envs/glowtts/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
SystemError: <built-in method run_backward of torch._C._EngineBase object at 0x7f09d7de3840> returned NULL without setting an error

I am on Ubuntu 20.04, with driver version 460.32.03 and CUDA version 11.2.

I tried torch 1.7.1, 1.2.0, and 1.3.0. I even compiled torch from source ('1.9.0a0+ee04cd9').

I also removed CUDA and reinstalled it from scratch, with no luck.
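
One thing that might be worth checking for a setup like this is whether the installed PyTorch binary was built for the GPU's architecture. The RTX 3090 reports compute capability 8.6 (sm_86). The sketch below is only a diagnostic under that assumption, not a confirmed cause of the error above. `arch_supported` and `report` are hypothetical helper names, and the script assumes `torch.cuda.get_arch_list()` may be missing on older builds, so it falls back gracefully in that case.

```python
def arch_supported(capability, arch_list):
    """Return True if a (major, minor) compute capability looks covered.

    arch_list holds strings such as "sm_80" or "compute_80", the format
    returned by torch.cuda.get_arch_list().
    """
    major, minor = capability
    for arch in arch_list:
        kind, _, num = arch.partition("_")
        if not num.isdigit() or len(num) < 2:
            continue  # skip entries this sketch does not understand
        a_major, a_minor = int(num[:-1]), int(num[-1])
        # Compiled binaries (sm_XY) run on devices with the same major
        # version and an equal or higher minor version.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX (compute_XY) can be JIT-compiled for equal or newer devices.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


def report():
    """Print what the installed torch build says about itself and GPU 0."""
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
        return
    print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
    if not torch.cuda.is_available():
        print("CUDA is not available to this torch build")
        return
    capability = torch.cuda.get_device_capability(0)
    print("GPU 0:", torch.cuda.get_device_name(0), "| capability:", capability)
    # Older torch releases may not expose get_arch_list at all.
    get_arch_list = getattr(torch.cuda, "get_arch_list", None)
    if get_arch_list is None:
        print("this torch version does not expose torch.cuda.get_arch_list()")
        return
    archs = get_arch_list()
    print("architectures in this build:", archs)
    print("GPU covered by this build:", arch_supported(capability, archs))


if __name__ == "__main__":
    report()
```

If the last line prints False, installing a PyTorch build compiled against CUDA 11.x would be the next thing to try. Note that the CUDA version shown by nvidia-smi is the one the driver supports, which can differ from the version the torch binary was built with.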

Do you have any idea what is causing this problem?

Thanks!

patdflynn commented 3 years ago

Did you ever find a solution? We're also experiencing issues with 3090s.

Linths commented 3 years ago

Wondering the same; I also have this issue.

jaywalnut310 / glow-tts #50