RuntimeError when attempting to use main_train.py out-of-the-box

cdjameson commented 3 years ago

Hello,

Thanks for providing this repository. There appears to be an issue with the model that is caught by newer versions of PyTorch. I am happy to downgrade if that will fix it, but since the requirements don't list versions, I do not know which version would work.

I've included the entire traceback here, including the torch.autograd.set_detect_anomaly(True) output, which points to line 60 in "models.py":

[W python_anomaly_mode.cpp:104] Warning: Error detected in CudnnConvolutionBackward. Traceback of forward call that caused the error:
  File "main_train.py", line 29, in <module>
    train(opt, Gs, Zs, reals, NoiseAmp)
  File "./SinGAN/training.py", line 39, in train
    z_curr,in_s,G_curr = train_single_scale(D_curr,G_curr,reals,Gs,Zs,in_s,NoiseAmp,opt)
  File "./SinGAN/training.py", line 156, in train_single_scale
    fake = netG(noise.detach(),prev)
  File "python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "./SinGAN/models.py", line 60, in forward
    x = self.tail(x)
  File "python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "python3.6/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "python3.6/site-packages/torch/nn/modules/conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "python3.6/site-packages/torch/nn/modules/conv.py", line 396, in _conv_forward
    self.padding, self.dilation, self.groups)
 (function _print_stack)
Traceback (most recent call last):
  File "main_train.py", line 29, in <module>
    train(opt, Gs, Zs, reals, NoiseAmp)
  File "./SinGAN/training.py", line 39, in train
    z_curr,in_s,G_curr = train_single_scale(D_curr,G_curr,reals,Gs,Zs,in_s,NoiseAmp,opt)
  File "./SinGAN/training.py", line 179, in train_single_scale
    errG.backward(retain_graph=True)
  File "python3.6/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "python3.6/site-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3, 32, 3, 3]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

tamarott commented 3 years ago

PyTorch 1.4 should solve this.

stymex commented 3 years ago

I had the same issue. Installing PyTorch 1.4 solved it, but I would recommend installing torchvision 0.5.0 at the same time; otherwise it still won't work:

conda install pytorch==1.4.0 torchvision==0.5.0 -c pytorch
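If it helps anyone, here is a rough way to confirm that the environment actually picked up those versions before re-running main_train.py. This is only a sketch: the helper names (`installed_version`, `find_mismatches`) and the `PINNED` table are made up for illustration and are not part of SinGAN, and the pinned values are simply the ones reported to work in this thread. Note that the conda package `pytorch` is imported as `torch`.

```python
import importlib

# Versions reported to work in this thread (assumption: nothing newer was tested here).
PINNED = {"torch": "1.4.0", "torchvision": "0.5.0"}


def installed_version(module_name):
    """Return the module's __version__ without a local build suffix, or None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    version = getattr(module, "__version__", None)
    # Some builds report e.g. "1.4.0+cu100"; keep only the part before "+".
    return version.split("+")[0] if version else None


def find_mismatches(installed, pinned=PINNED):
    """Return {name: (found, wanted)} for every package that differs from its pin."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in pinned.items()
        if installed.get(name) != wanted
    }


if __name__ == "__main__":
    found = {name: installed_version(name) for name in PINNED}
    problems = find_mismatches(found)
    if problems:
        for name, (have, want) in problems.items():
            print(f"{name}: found {have}, expected {want}")
    else:
        print("Environment matches the pinned versions.")
```

A package that is missing entirely is reported as `found None`, which makes it easy to spot the case where only one of the two packages was downgraded.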

tamarott / SinGAN

RuntimeError when attempting to use main_train.py out-of-the-box #147