Closed weiguowilliam closed 3 years ago
I believe I know what this is.
The fundamental problem is that we want to train an RNN, which recursively updates its latent state. So theoretically the right thing to do is to feed entire episodes into the network and then backpropagate gradients through the entire episode. Of course, that's infeasible. So at the time of writing of the paper, what people did was basically just ignore this fact and train on e.g. n steps, like in A2C (since then, papers like R2D2 came out that investigate this problem more). However, this is wrong, because now at the first timestep, how do you initialize the RNN latent state? If you initialize it from e.g. 0, it's wrong. If you initialize it with the last latent state from the last n-step update, it's better, but still wrong because a) your parameters changed, so if you were to run the same trajectory again, you'd get a slightly different latent state, and b) you don't backpropagate gradients through it because it's treated as a constant. There's nothing really you can do against a), but for b) I used `retain_graph=True` to allow it to still backprop gradients through it. Now those gradients are (slightly) wrong, because your parameters actually changed since you computed that latent state (i.e. problem a), and that is, I believe, what it's complaining about), but it still massively improves performance (that's Figure 5c in the paper).
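To make the setup concrete, here is a minimal sketch (not the repo's actual code; `rnn`, `head`, and the chunk size are illustrative) of truncated BPTT over one episode. It shows the "safe" variant that detaches the carried-over latent state, i.e. option b) treated as a constant; *not* detaching and instead calling `backward(retain_graph=True)` is what lets gradients flow into earlier chunks, and is also what newer PyTorch versions complain about once the optimizer has stepped in between.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.GRU(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
opt = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=1e-2)

episode = torch.randn(1, 20, 4)   # one episode of 20 timesteps
targets = torch.randn(1, 20, 1)
h = torch.zeros(1, 1, 8)          # zero init is only "correct" at episode start

for start in range(0, 20, 5):     # train on chunks of n=5 steps
    x = episode[:, start:start + 5]
    y = targets[:, start:start + 5]
    out, h = rnn(x, h)
    loss = ((head(out) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Detaching treats the carried-over latent state as a constant:
    # no gradients flow into earlier chunks (problem b above). Keeping
    # the graph alive instead requires backward(retain_graph=True) and
    # backprops through now-stale parameters (problem a).
    h = h.detach()
```

The detached version runs on any recent PyTorch; the non-detached version is the one discussed below.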
Long story short: you can set `algorithm.multiplier_backprop_length=1`, which should get rid of the error message, but as shown in Fig 5c that will reduce performance quite a lot. Alternatively, you could increase `rl_setting.num_steps` instead (e.g. by a factor of up to 5, i.e. to 25), but that has a whole lot of other trade-offs, so it's not reproducing the original algorithm. It might work even better, but it might also not, or perform worse.

Thank you, Max! I'll test it soon.
Thank you so much for your help! Really appreciated.

Setting `multiplier_backprop_length=1` removes the error message. Simply setting `num_steps` doesn't remove the error. I'll try to use torch version 0.4.
Ah, sorry, bad formulation on my part. I meant setting `num_steps` higher to counteract the drop in performance that you might get from setting the backprop length to 1 (which might or might not work, as it's not exactly the same).

What happens in the code (slightly simplified) is that the agent takes `num_steps` steps at a time but backpropagates gradients for up to `num_steps * multiplier_backprop_length` steps.
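A simplified sketch of that interaction (the parameter names come from the discussion; the loop body is illustrative, not the repo's code): with `multiplier_backprop_length = k`, the hidden state is only detached every k-th update, so gradients can flow back through up to `num_steps * k` timesteps.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_steps = 5
multiplier_backprop_length = 3

rnn = nn.GRU(2, 4, batch_first=True)
h = torch.zeros(1, 1, 4)

for update in range(6):
    x = torch.randn(1, num_steps, 2)  # the agent's latest num_steps of data
    out, h = rnn(x, h)
    loss = out.pow(2).mean()
    # retain_graph=True keeps earlier chunks' graphs alive, so this backward
    # can reach them through the non-detached hidden state.
    loss.backward(retain_graph=True)
    # Cut the graph only every multiplier_backprop_length updates.
    if (update + 1) % multiplier_backprop_length == 0:
        h = h.detach()
```

Note that this sketch deliberately omits the `optimizer.step()` between updates: stepping modifies the parameters in place, and backpropagating through the retained graph afterwards is exactly what PyTorch >= 1.5 rejects with the inplace-operation `RuntimeError` reported below.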
Thank you Max! I'll test with that.
System info: torch 1.7.1 (for some reason I cannot use torch 0.4), CUDA 11.1, Python 3.6.13.

When I run the OpenAI case code, it gives the error `RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation`. It seems that some gradient updates may be problematic.

Based on this link (https://discuss.pytorch.org/t/solved-pytorch1-5-runtimeerror-one-of-the-variables-needed-for-gradient-computation-has-been-modified-by-an-inplace-operation/90256/4): 'Before 1.5, these tests were not working properly for the optimizers. That's why you didn't see any error. But the computed gradients were not correct.'
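The failure mode can be reproduced in a few lines (an assumed minimal example, not the repo's code): an optimizer step modifies a parameter in place after its graph was retained, so a second backward through that graph reads a stale saved tensor and raises on torch >= 1.5 (and, per the quote above, silently computed wrong gradients before 1.5).

```python
import torch

w = torch.nn.Parameter(torch.ones(3))
opt = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()            # autograd saves w to compute d(w**2)/dw = 2*w
loss.backward(retain_graph=True)
opt.step()                       # in-place update of w bumps its version counter

caught = False
try:
    loss.backward()              # second backward reads the stale saved w
except RuntimeError:
    caught = True                # torch >= 1.5: "modified by an inplace operation"
```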
Could you help me check if there may be some error here? Thank you.
After I set `retain_graph = False` (in `utils.log_and_print`), the first parameter update succeeds, but the second one fails.