I checked the behavior here again and found that, for some reason, on the first iteration when `hessian_vector_product` is called, the `grad2s` are slightly different from what they would be without this change. The difference is small, with a standard deviation of 1e-8 across all of the policy parameters. After the first call to `hessian_vector_product` the difference disappears.
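Roughly, the comparison looks like this (a sketch only; `grad2s_with_patch` and `grad2s_without_patch` are hypothetical names for the per-parameter `grad2s` collected from runs with and without the patch):

```python
import torch

def grad2s_difference_stats(grad2s_with_patch, grad2s_without_patch):
    """Flatten two lists of per-parameter gradients and compare them elementwise."""
    diff = torch.cat([(a - b).reshape(-1)
                      for a, b in zip(grad2s_with_patch, grad2s_without_patch)])
    # For the runs described above, the standard deviation is on the order of 1e-8.
    return {'max_abs': diff.abs().max().item(), 'std': diff.std().item()}
```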
I ran the algorithm on HalfCheetahDir-v1 a couple of times with and without this change and the results look similar. However, they are not exactly the same, so I'll investigate why such a difference exists in the first place.
Looks like figuring out the exact difference between the two implementations is a bit more complicated than I expected. I don't have time to pursue the investigation further now, so feel free to close this.
If you are interested, I was able to narrow down the problem a little bit. The problem is that `stepdir`, as computed in `metalearner.step`, differs slightly between the two implementations the first time it is computed. The difference disappears on later calls to `metalearner.step`.
Setting `first_order=True` in `metalearner.adapt` eliminates the difference. Running the KL gradient computation twice inside `hessian_vector_product` also seems to eliminate the difference for some reason, and I mean literally just running the following code twice:
# KL divergence and its gradients w.r.t. the policy parameters; create_graph=True
# keeps the graph so a second backward pass through these gradients is possible.
kl = self.kl_divergence(episodes)
grads = torch.autograd.grad(kl, self.policy.parameters(), create_graph=True)
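To be explicit about what "twice" means: the experiment is simply duplicating those two lines inside `hessian_vector_product`, with the first result discarded and nothing else changed (a sketch, not an exact diff):

```python
# First pass: computed and then overwritten.
kl = self.kl_divergence(episodes)
grads = torch.autograd.grad(kl, self.policy.parameters(), create_graph=True)

# Second pass: the values that are actually used for the Hessian-vector product.
kl = self.kl_divergence(episodes)
grads = torch.autograd.grad(kl, self.policy.parameters(), create_graph=True)
```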
(Figure: `stepdir` computed using the different implementations for the Bandit-K5-v0 environment.)

Based on these findings, I think computing the gradients of the KL divergence for the `normal_mlp` policy leaves some bit of state dangling somewhere in the computational graph, which is then used slightly inconsistently when the gradients are computed the next time.
I have included this patch in the newest version of the code.
KL divergence gradients don't have to be recomputed on every iteration of `conjugate_gradient`.
This results in a ~10% speedup of the overall algorithm on my setup.
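A minimal sketch of the idea, written as a method of the metalearner (it assumes `self.kl_divergence(episodes)` and `self.policy` as in the snippet above; the `damping` argument and the flattening via `torch.cat` are assumptions, not necessarily the exact code in the repo):

```python
import torch

def hessian_vector_product(self, episodes, damping=1e-2):
    """Return a closure computing (H + damping * I) @ vector for the KL Hessian."""
    # The KL divergence and its first-order gradients are computed once, up front.
    kl = self.kl_divergence(episodes)
    grads = torch.autograd.grad(kl, self.policy.parameters(), create_graph=True)
    flat_grad_kl = torch.cat([grad.reshape(-1) for grad in grads])

    def _product(vector):
        # Each conjugate_gradient iteration only does a dot product and one
        # extra backward pass; the KL graph built above is reused.
        grad_kl_v = torch.dot(flat_grad_kl, vector)
        grad2s = torch.autograd.grad(grad_kl_v, self.policy.parameters(),
                                     retain_graph=True)  # graph is reused on every call
        return torch.cat([grad.reshape(-1) for grad in grad2s]) + damping * vector

    return _product
```

Since only `_product` depends on `vector`, the KL divergence graph is built once per meta-optimization step instead of once per conjugate gradient iteration, which is where the ~10% saving comes from.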