maximilianigl / DVRL

Deep Variational Reinforcement Learning
Apache License 2.0

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation #5

Closed weiguowilliam closed 3 years ago

weiguowilliam commented 3 years ago

System info: torch 1.7.1 (for some reason I cannot use torch 0.4), CUDA 11.1, Python 3.6.13

When I run the code for the OpenAI case, it fails with 'RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation'. It seems that one of the gradient updates may be problematic.

According to this thread (https://discuss.pytorch.org/t/solved-pytorch1-5-runtimeerror-one-of-the-variables-needed-for-gradient-computation-has-been-modified-by-an-inplace-operation/90256/4): 'Before 1.5, these tests were not working properly for the optimizers. That’s why you didn’t see any error. But the computed gradients were not correct.'

Could you help me check if there may be some error here? Thank you.
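For reference, this error class can be reproduced outside the repo. Here is a minimal sketch of my own (not DVRL code) of what torch >= 1.5 rejects: calling backward() on a graph that was retained across an optimizer step, after the step has modified the parameters in place.

```python
import torch

# Minimal, repo-independent sketch of the error class (assumed setup):
# a graph is retained across an optimizer step, then backward() is called
# again, but the weights it needs were already updated in place.
lin = torch.nn.Linear(4, 4)
opt = torch.optim.SGD(lin.parameters(), lr=0.1)

x = torch.randn(2, 4, requires_grad=True)
out = lin(x)

out.sum().backward(retain_graph=True)  # first backward, graph kept alive
opt.step()                             # in-place update bumps the weight's version

(out ** 2).sum().backward()            # RuntimeError on torch >= 1.5:
                                       # "... modified by an inplace operation"
```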

INFO - root - Number of parameters =    4336264
INFO - root - Total number of updates: 625000
INFO - root - Learning rate: 0.0002
INFO - root -       Progr | FPS | NKP | ToL | avg | med | min | max || Losses: | ent | val | act | enc || pri | emi | rew |
INFO - root -       ------|-----|-----|-----|-----|-----|-----|-----||---------|-----|-----|-----|-----||-----|-----|-----|
INFO - root - Updt: 0.0   |58   |11.17|636.0|0.0  |0.0  |0.0  |0.0  ||         |1.260|3.445|-0.09|6344.||368.3|6220.|-----
ERROR - POMRL - Failed after 0:00:08!
Traceback (most recent calls WITHOUT Sacred internals):
  File "./code/main.py", line 567, in main
    total_loss.backward(retain_graph=retain_graph)
  File "/home/weiguo/anaconda3/envs/pomdp/lib/python3.6/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/weiguo/anaconda3/envs/pomdp/lib/python3.6/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [256, 768]], which is output 0 of TBackward, is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Exception ignored in: <bound method SubprocVecEnv.__del__ of <baselines.common.vec_env.subproc_vec_env.SubprocVecEnv object at 0x7fa8da7601d0>>

After I set retain_graph = False (logged via utils.log_and_print), the first parameter update succeeds, but it fails on the second one.

INFO - root - Number of parameters =    4336264
INFO - root - Total number of updates: 625000
INFO - root - Learning rate: 0.0002
INFO - root -       Progr | FPS | NKP | ToL | avg | med | min | max || Losses: | ent | val | act | enc || pri | emi | rew |
INFO - root -       ------|-----|-----|-----|-----|-----|-----|-----||---------|-----|-----|-----|-----||-----|-----|-----|
INFO - root - Updt: 0.0   |56   |11.17|662.1|0.0  |0.0  |0.0  |0.0  ||         |1.313|3.251|0.003|6604.||376.6|6470.|-----
ERROR - POMRL - Failed after 0:00:08!
Traceback (most recent calls WITHOUT Sacred internals):
  File "./code/main.py", line 573, in main
    total_loss.backward(retain_graph=retain_graph)
  File "/home/weiguo/anaconda3/envs/pomdp/lib/python3.6/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/weiguo/anaconda3/envs/pomdp/lib/python3.6/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: 
maximilianigl commented 3 years ago

I believe I know what this is. The fundamental problem is that we want to train an RNN, which recursively updates its latent state. In theory, the right thing to do is to feed entire episodes into the network and then backpropagate gradients through the entire episode. Of course, that's infeasible. So at the time the paper was written, what people did was basically ignore this and train on e.g. n steps at a time, as done in A2C (since then, papers like R2D2 have investigated this problem more closely). However, this is wrong, because how do you initialize the RNN latent state at the first of those timesteps? If you initialize it from e.g. 0, it's wrong. If you initialize it with the last latent state from the previous n-step update, it's better, but still wrong, because a) your parameters have changed, so if you were to run the same trajectory again you'd get a slightly different latent state, and b) you don't backpropagate gradients through it because it's treated as a constant.

There's nothing you can really do about a), but for b) I used retain_graph=True to still allow gradients to be backpropagated through it. Those gradients are (slightly) wrong, because the parameters have changed since that latent state was computed (i.e. problem a), and I believe that's what PyTorch is complaining about), but it still massively improves performance (that's Figure 5c in the paper).
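To make that concrete, here is a stripped-down sketch of the pattern (hypothetical module names, not the actual DVRL training loop): the latent state is carried across n-step updates without detaching, retain_graph=True keeps the old graph alive so gradients can still flow into it, and the optimizer step in between modifies the weights that old graph saved. On older PyTorch versions this ran through (with the slightly stale gradients described above); on torch >= 1.5 the second update's backward() raises the error from the traceback.

```python
import torch

# Hypothetical sketch of the pattern described above (not the repo's code).
rnn = torch.nn.GRUCell(8, 16)
head = torch.nn.Linear(16, 1)
opt = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()), lr=1e-3)

h = torch.zeros(1, 16)               # RNN latent state
num_steps = 5

for update in range(3):
    loss = 0.0
    for _ in range(num_steps):
        obs = torch.randn(1, 8)      # stand-in for an environment observation
        h = rnn(obs, h)              # latent is NOT detached between updates
        loss = loss + head(h).pow(2).mean()

    # retain_graph=True so the next update can still backprop into the latent
    # state produced here ...
    loss.backward(retain_graph=True)
    opt.step()                       # ... but this in-place parameter update
    opt.zero_grad()                  # invalidates that retained graph on torch >= 1.5
```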

Long story short: setting multiplier_backprop_length=1 should get rid of the error; you might also want to set num_steps.

weiguowilliam commented 3 years ago

Thank you, Max! I'll test it soon.

weiguowilliam commented 3 years ago

Thank you so much for your help! Really appreciated. Setting multiplier_backprop_length=1 removes the error message; simply setting num_steps doesn't. I'll also try using torch 0.4.

maximilianigl commented 3 years ago

Ah, sorry, bad formulation on my part. I meant setting num_steps higher to counteract the drop in performance that you might get from setting the backprop length to 1 (which may or may not work, as it's not exactly the same). What happens in the code (slightly simplified) is that the agent takes num_steps at a time but backpropagates gradients for up to num_steps * multiplier_backprop_length steps.
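Spelled out as a sketch (again hypothetical, not the repo's actual loop), the workaround discussed above corresponds to detaching the latent state after every update, i.e. multiplier_backprop_length=1, with a larger num_steps to keep a comparable backprop window:

```python
import torch

# Hypothetical sketch of the workaround: detach the latent after every update
# (multiplier_backprop_length = 1), so no graph is retained across optimizer
# steps; a larger num_steps partially compensates for the shorter window.
rnn = torch.nn.GRUCell(8, 16)
head = torch.nn.Linear(16, 1)
opt = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()), lr=1e-3)

num_steps = 20                       # raised, since gradients no longer cross updates
h = torch.zeros(1, 16)

for update in range(3):
    loss = 0.0
    for _ in range(num_steps):
        obs = torch.randn(1, 8)
        h = rnn(obs, h)
        loss = loss + head(h).pow(2).mean()

    loss.backward()                  # no retain_graph needed any more
    opt.step()
    opt.zero_grad()
    h = h.detach()                   # cut the graph: the next update starts fresh
```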

weiguowilliam commented 3 years ago

Thank you Max! I'll test with that.