train_step_V.backward() in solver_gan.py - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

RansSelected commented 4 years ago

Hello Zhenyue Qin,

Thank you for your implementation! I'm trying to run the code locally and I'm running into an issue. While running both, here is what I get:

```
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1438, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Volumes/Kristen_HDD/DATA_MINDED_2020/Implementation-MolGAN-PyTorch/main_gan.py", line 65, in <module>
    main(config)
  File "/Volumes/Kristen_HDD/DATA_MINDED_2020/Implementation-MolGAN-PyTorch/main_gan.py", line 58, in main
    solver.train_and_validate()
  File "/Volumes/Kristen_HDD/DATA_MINDED_2020/Implementation-MolGAN-PyTorch/solver_gan.py", line 218, in train_and_validate
    self.train_or_valid(epoch_i=i, train_val_test='train')
  File "/Volumes/Kristen_HDD/DATA_MINDED_2020/Implementation-MolGAN-PyTorch/solver_gan.py", line 379, in train_or_valid
    train_step_V.backward()
  File "/Users/krisku/opt/miniconda3/envs/molgan_pytorch_1.5/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/Users/krisku/opt/miniconda3/envs/molgan_pytorch_1.5/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [512, 45]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
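The hint at the end of the traceback can be followed directly. Below is a minimal sketch on a toy function (hypothetical, not the MolGAN code): wrapping the failing step in `torch.autograd.detect_anomaly()` makes PyTorch emit a warning with the traceback of the forward operation whose saved tensor was later modified, before raising the same `RuntimeError`. In the real script, the same context manager around the training step (or `torch.autograd.set_detect_anomaly(True)` at the start of `main_gan.py`) should point at the offending layer. Anomaly mode slows training noticeably, so it is best enabled only while debugging.

```python
import torch


def toy_step() -> None:
    """Hypothetical toy step that modifies a saved tensor in place before backward()."""
    w = torch.randn(3, 3, requires_grad=True)
    x = torch.randn(2, 3)
    h = torch.sigmoid(x @ w)  # sigmoid saves its output for the backward pass
    h.mul_(2)                 # in-place op bumps h's version counter
    h.sum().backward()        # fails: the saved output no longer matches its recorded version


message = ""
with torch.autograd.detect_anomaly():
    try:
        toy_step()
    except RuntimeError as err:
        # Anomaly mode has already warned with the forward traceback of the
        # torch.sigmoid(...) call; the exception itself is the usual one.
        message = str(err)

print(message.splitlines()[0])
```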

So, basically, the failure is in `solver_gan.py`, line 379, where you do:

```python
# Optimise value network.
if cur_step % self.n_critic == 0:
    train_step_V.backward()
    self.v_optimizer.step()
```

and the problem arises in `train_step_V.backward()`.
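To see why this call fails, here is a minimal, self-contained sketch that reproduces the same kind of error with toy stand-ins (the module sizes and both losses are hypothetical, not MolGAN's). What matters is the order of operations: `g_optimizer.step()` updates the generator's weights in place, while the graph behind `train_step_V` still needs the pre-step weights it saved during the forward pass. The `[512, 45]` tensor reported as `output 0 of TBackward` is consistent with the transposed weight of a linear layer saved this way, though the toy below does not prove which layer it is in MolGAN.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for MolGAN's networks: a small two-layer generator
# and a linear value network. Only the order of operations matters here.
gen = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
value = nn.Linear(3, 1)
g_opt = torch.optim.SGD(gen.parameters(), lr=0.1)
v_opt = torch.optim.SGD(value.parameters(), lr=0.1)

z = torch.randn(16, 4)
fake = gen(z)                               # not detached: both losses depend on gen's weights
loss_G = -value(fake).mean()                # hypothetical generator loss
loss_V = (value(fake) - 1.0).pow(2).mean()  # hypothetical value-network loss

# Same ordering as the original solver_gan.py code.
g_opt.zero_grad()
v_opt.zero_grad()
loss_G.backward(retain_graph=True)
g_opt.step()  # updates gen's weights in place, bumping their version counters

error_message = ""
try:
    loss_V.backward()  # needs gen's pre-step weights, saved during the forward pass
    v_opt.step()
except RuntimeError as err:
    error_message = str(err)

print(error_message.splitlines()[0] if error_message else "no error")
```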

My environment (key versions: Python 3.7.7, pip 20.0.2, PyTorch 1.5.0):

```
numpy                     1.18.1           py37h7241aed_0
numpy-base                1.18.1           py37h3304bdc_1
pandas                    1.0.3            py37h6c726b0_0
pillow                    7.1.2            py37h4655f20_0
pip                       20.0.2                   py37_3
pysmiles                  1.0.0                    pypi_0    pypi
python                    3.7.7                hf48f09d_4
python-dateutil           2.8.1                      py_0
pytorch                   1.5.0                   py3.7_0    pytorch
pytz                      2020.1                     py_0
rdkit                     2020.03.2.0      py37h65625ec_1    rdkit
readline                  8.0                  h1de35cc_0
scikit-learn              0.22.1           py37h27c97d8_0    anaconda
scipy                     1.4.1            py37h9fa6033_0    anaconda
setuptools                47.1.1                   py37_0
sqlite                    3.31.1               h5c1f38d_1
torchvision               0.6.0                  py37_cpu    pytorch
wheel                     0.34.2                   py37_0
```

Do you have any idea what is going wrong?

Best regards, Kris

kfzyqin commented 4 years ago

Hi Kris,

Thanks for pointing out the issue.

The code runs well on my computer. I suspect the difference in PyTorch versions: I am using PyTorch 1.4.0.

If that doesn't help, please let me know and I will look into it further.

RansSelected commented 4 years ago

Update: everything works smoothly with an earlier version of PyTorch, pytorch==1.2.0 with torchvision==0.4.0 (see https://pytorch.org/get-started/previous-versions/).

krk-krk-krk commented 3 years ago

Hi ZhenyueQin,

I changed this code in solver_gan.py, at line 379.

Before:

```python
if train_val_test == 'train':
    self.reset_grad()

    # Optimise generator.
    if cur_step % self.n_critic == 0:
        train_step_G.backward(retain_graph=True)
        self.g_optimizer.step()  # updates the generator's weights in place

    # Optimise value network.
    if cur_step % self.n_critic == 0:
        train_step_V.backward()  # the in-place RuntimeError is raised here
        self.v_optimizer.step()
```

After:

```python
if train_val_test == 'train':
    self.reset_grad()

    # Optimise generator and value network: run both backward passes
    # before either optimizer modifies any weights in place.
    if cur_step % self.n_critic == 0:
        train_step_G.backward(retain_graph=True)
        train_step_V.backward()
        self.g_optimizer.step()
        self.v_optimizer.step()

    # Optimise value network.
    # if cur_step % self.n_critic == 0:
    #     train_step_V.backward()
    #     self.v_optimizer.step()
```
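Why the reordering helps can be checked on a toy setup (hypothetical stand-ins, not MolGAN's networks or losses): once both `backward()` calls run before either `step()`, no saved weight has been modified when autograd needs it. The sketch also shows a side effect worth knowing about, assuming (as the traceback suggests) that `train_step_V` depends on the generator's parameters: the generator's `.grad` now also accumulates the gradient of the value loss, so `g_optimizer.step()` no longer sees the generator loss alone.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for MolGAN's generator and value network.
gen = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
value = nn.Linear(3, 1)
g_opt = torch.optim.SGD(gen.parameters(), lr=0.1)
v_opt = torch.optim.SGD(value.parameters(), lr=0.1)

z = torch.randn(16, 4)
fake = gen(z)
loss_G = -value(fake).mean()                # hypothetical generator loss
loss_V = (value(fake) - 1.0).pow(2).mean()  # hypothetical value-network loss

# For comparison only: gradient of loss_G alone w.r.t. the generator (not stored in .grad).
grad_G_only = torch.autograd.grad(loss_G, list(gen.parameters()), retain_graph=True)

# Reordered step, as in the fix: every backward() runs before any optimizer step.
g_opt.zero_grad()
v_opt.zero_grad()
loss_G.backward(retain_graph=True)
loss_V.backward()  # no error: no parameter has been modified in place yet
accumulated = [p.grad.clone() for p in gen.parameters()]
g_opt.step()
v_opt.step()

# The generator's .grad holds d(loss_G) + d(loss_V), not d(loss_G) alone.
includes_value_grad = any(
    not torch.allclose(acc, only) for acc, only in zip(accumulated, grad_G_only)
)
print("generator .grad includes the value-loss gradient:", includes_value_grad)
```

If that extra gradient is not intended, one option, assuming the value loss is computed from the generator's output, is to give the value network a detached copy (`fake.detach()` in the toy above) when building `train_step_V`: the value network's own gradient is unchanged, but nothing from the value loss flows back into the generator.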

pykao commented 3 years ago

@krk-krk-krk I changed it as well and it works for me.

shikhar2402 commented 2 years ago

Thanks @krk-krk-krk, this worked for me. No need to downgrade the PyTorch version.

kfzyqin / Implementation-MolGAN-PyTorch, issue #1