I don't get it. In PPO.py, ActorCritic.evaluate only calculates the entropy of old_action; it is missing the KL divergence between pi_old and pi described in the PPO paper.
As mentioned in the README, this repo only implements the clipped surrogate objective version of PPO, not the adaptive KL penalty objective you are referring to.
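For reference, here is a minimal sketch contrasting the two objectives from the PPO paper (Schulman et al., 2017). The names logprobs, old_logprobs, and advantages are illustrative assumptions, not identifiers from this repo; the repo's loss corresponds to the clipped form, which is why no KL term appears in evaluate:

```python
import torch

def clipped_objective(logprobs, old_logprobs, advantages, eps_clip=0.2):
    # Sketch of the clipped surrogate (what this repo uses); inputs assumed
    # to be per-timestep tensors. r_t(theta) = pi(a|s) / pi_old(a|s):
    ratios = torch.exp(logprobs - old_logprobs)
    surr1 = ratios * advantages
    surr2 = torch.clamp(ratios, 1.0 - eps_clip, 1.0 + eps_clip) * advantages
    # Pessimistic (element-wise minimum) of the unclipped and clipped surrogates
    return torch.min(surr1, surr2).mean()

def kl_penalty_objective(logprobs, old_logprobs, advantages, beta=1.0):
    # Sketch of the adaptive KL penalty form (NOT implemented in this repo);
    # beta would be adjusted between updates based on the measured KL.
    ratios = torch.exp(logprobs - old_logprobs)
    # Sample-based estimate of KL(pi_old || pi) from per-action log-probs
    approx_kl = (old_logprobs - logprobs).mean()
    return (ratios * advantages).mean() - beta * approx_kl
```

The clipped variant needs no KL term at all: the clamp on the ratio already bounds how far the new policy can move from pi_old, which is why the paper reports it performing at least as well as the KL penalty while being simpler to implement.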