openai / spinningup

An educational resource to help anyone learn deep reinforcement learning.
https://spinningup.openai.com/
MIT License

Error in TRPO KL Divergence Calculations #264

Open jamesborg46 opened 4 years ago

jamesborg46 commented 4 years ago

Apologies if I am misunderstanding something, but the direction of the KL divergence calculations used throughout the TRPO code seems to be at odds with the TRPO paper. Instead of KL[new params || old params], should we not be taking KL[old params || new params]?

TRPO Code: https://github.com/openai/spinningup/blob/038665d62d569055401d91856abb287263096178/spinup/algos/tf1/trpo/core.py#L136

TRPO Paper: [image: the paper's trust-region constraint, a KL divergence from the old parameters to the new parameters]
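To make the asymmetry concrete, here is a minimal NumPy sketch (not the repo's TF1 code; the probability vectors are made up) showing that the two directions give different values:

```python
import numpy as np

def categorical_kl(p, q):
    """KL(p || q) for two categorical distributions given as probability vectors."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

p_old = np.array([0.7, 0.2, 0.1])  # hypothetical "old" policy probabilities
p_new = np.array([0.5, 0.3, 0.2])  # hypothetical "new" policy probabilities

print(categorical_kl(p_old, p_new))  # paper's direction: KL(old || new) ~= 0.0851
print(categorical_kl(p_new, p_old))  # code's direction:  KL(new || old) ~= 0.0920
```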

Moreover, should we not be taking the gradients with respect to the second set of parameters rather than the first, as suggested by this part of the paper?

[image: the paper's quadratic approximation of the KL, with derivatives taken with respect to the second argument]
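For reference, the approximation I'm referring to is (paraphrasing the paper's notation as best I can reproduce it):

```math
\bar{D}_{\mathrm{KL}}(\theta_{\mathrm{old}}, \theta) \approx \frac{1}{2}(\theta - \theta_{\mathrm{old}})^{\top} A\, (\theta - \theta_{\mathrm{old}}),
\qquad
A_{ij} = \left.\frac{\partial}{\partial \theta_i}\frac{\partial}{\partial \theta_j}\,\bar{D}_{\mathrm{KL}}(\theta_{\mathrm{old}}, \theta)\right|_{\theta = \theta_{\mathrm{old}}}
```

i.e. both derivatives are taken with respect to the second argument $\theta$ and then evaluated at $\theta = \theta_{\mathrm{old}}$.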

Kuroo commented 2 years ago

I think you are right: the arguments are swapped in the KL here. However, the second-order derivative (Hessian) of the KL is the same in both directions when evaluated at the old parameters, so it is not an issue in practice.
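Here is a quick numerical check of that claim, a minimal sketch in PyTorch (not the repo's TF1 code; the categorical-logits parameterization is just an illustrative choice): both directions of the KL have the same Hessian at theta = theta_old.

```python
import torch
from torch.autograd.functional import hessian

theta_old = torch.tensor([0.2, -0.5, 1.0])  # fixed "old" logits (illustrative)

def kl_old_new(theta):
    # KL(p_old || p_theta): the direction used in the TRPO paper
    p_old = torch.softmax(theta_old, dim=0)
    return (p_old * (torch.log_softmax(theta_old, dim=0)
                     - torch.log_softmax(theta, dim=0))).sum()

def kl_new_old(theta):
    # KL(p_theta || p_old): the direction the issue says the code uses
    p = torch.softmax(theta, dim=0)
    return (p * (torch.log_softmax(theta, dim=0)
                 - torch.log_softmax(theta_old, dim=0))).sum()

# Hessians of both directions, evaluated at theta = theta_old
H1 = hessian(kl_old_new, theta_old.clone())
H2 = hessian(kl_new_old, theta_old.clone())
print(torch.allclose(H1, H2, atol=1e-6))  # True: the Hessians agree
```

Both Hessians equal the Fisher information matrix at theta_old, which is the matrix TRPO's Hessian-vector products feed into conjugate gradient, so the computed step direction is unaffected by the swapped arguments.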