tristandeleu / pytorch-maml-rl

Reinforcement Learning with Model-Agnostic Meta-Learning in Pytorch
MIT License

Faster hessian_vector_product #13

Closed vuoristo closed 4 years ago

vuoristo commented 6 years ago

The KL divergence gradients don't have to be recomputed on every iteration of conjugate_gradient.

This results in a ~10% speedup of the overall algorithm on my setup.
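For context, here is a minimal sketch of the idea, assuming a TRPO-style setup where the Hessian-vector product is built from the KL divergence between the old and new policies; the names (`make_hessian_vector_product`, `damping`, `vector`) are illustrative and not necessarily the repository's exact API. The KL gradients are computed once, outside the closure, so each call from `conjugate_gradient` only performs the second backward pass.

```python
import torch

def make_hessian_vector_product(kl, params, damping=1e-2):
    """Return a closure computing (H + damping*I) @ v for the Hessian of `kl`.

    The first-order KL gradients are computed once here, so repeated calls
    from conjugate gradient reuse them instead of rebuilding the graph on
    every iteration (the optimization proposed in this PR).
    """
    params = list(params)
    # Gradients of the KL divergence, kept in the graph for a second backward pass.
    grads = torch.autograd.grad(kl, params, create_graph=True)
    flat_grad_kl = torch.cat([g.reshape(-1) for g in grads])

    def hessian_vector_product(vector):
        # grad(grad(KL) . v) = H v ; only this part runs on each CG iteration.
        grad_kl_v = torch.dot(flat_grad_kl, vector)
        grad2s = torch.autograd.grad(grad_kl_v, params, retain_graph=True)
        flat_grad2 = torch.cat([g.reshape(-1) for g in grad2s])
        return flat_grad2 + damping * vector

    return hessian_vector_product
```

The closure returned here is what the conjugate gradient routine would call repeatedly when solving for the step direction.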

vuoristo commented 6 years ago

I checked the behavior here again and found that, for some reason, on the first call to hessian_vector_product the grad2s are slightly different from what they would be without this change. The difference is small, with a standard deviation of about 1e-8 across all of the policy parameters. After the first call to hessian_vector_product the difference disappears.

I ran the algorithm on HalfCheetahDir-v1 a couple of times with and without this change, and the results look similar. However, they are not exactly the same, so I'll investigate why the difference exists in the first place.
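One hypothetical way to quantify such a discrepancy is to compare the cached-gradient variant against one that recomputes grad(KL) on every call, on a toy objective; everything below (the toy `kl`, `make_hvp_cached`, `hvp_recomputed`) is made up for illustration and is not the repository's test code.

```python
import torch

torch.manual_seed(0)
# Toy stand-in for the KL divergence of a small policy (hypothetical).
params = [torch.randn(5, requires_grad=True)]
kl = (params[0] ** 4).sum()  # non-quadratic so the Hessian depends on params

def make_hvp_cached(kl, params):
    # Compute grad(KL) once and reuse it inside the closure.
    grads = torch.autograd.grad(kl, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    def hvp(v):
        grad2s = torch.autograd.grad(torch.dot(flat, v), params, retain_graph=True)
        return torch.cat([g.reshape(-1) for g in grad2s])
    return hvp

def hvp_recomputed(v):
    # Recompute grad(KL) on every call, as in the original implementation.
    grads = torch.autograd.grad(kl, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    grad2s = torch.autograd.grad(torch.dot(flat, v), params, retain_graph=True)
    return torch.cat([g.reshape(-1) for g in grad2s])

v = torch.randn(5)
delta = make_hvp_cached(kl, params)(v) - hvp_recomputed(v)
print(delta.abs().max().item())  # expected to be 0 up to floating-point noise
```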

vuoristo commented 6 years ago

It looks like figuring out the exact difference between the two implementations is more complicated than I expected. I don't have time to pursue the investigation further right now, so feel free to close this.

In case you are interested, I was able to narrow the problem down a little: the stepdir computed in metalearner.step differs slightly between the two implementations the first time it is computed. The difference disappears on later calls to metalearner.step.

Based on these findings, I think computing the gradients of the KL divergence for the normal_mlp policy leaves some state dangling in the computational graph, which is then used inconsistently the next time the gradients are computed.

tristandeleu commented 4 years ago

I have included this patch in the newest version of the code.