nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch

Confusion about the loss function #52

Closed tlt18 closed 2 years ago

tlt18 commented 2 years ago

I don't get it. In `PPO.py`, `ActorCritic.evaluate` only computes the entropy of `old_action`; it is missing the KL divergence between pi_old and pi that the PPO paper describes.
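For context, a paraphrased sketch of the kind of `evaluate` method being discussed (not the repo's exact code; the attribute names and the discrete-action `Categorical` distribution are assumptions):

```python
import torch
from torch.distributions import Categorical

def evaluate(self, state, action):
    # Assumed discrete-action setup: the actor outputs action probabilities
    action_probs = self.actor(state)
    dist = Categorical(action_probs)

    action_logprobs = dist.log_prob(action)  # log pi(a|s) under the current policy
    dist_entropy = dist.entropy()            # entropy bonus term only; no KL term here
    state_values = self.critic(state)        # value estimate for the critic loss

    return action_logprobs, torch.squeeze(state_values), dist_entropy
```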

nikhilbarhate99 commented 2 years ago

As mentioned in the README, this repo only implements the clipped objective version of PPO, not the adaptive KL penalty objective you are referring to.
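For reference, a minimal sketch contrasting the two objectives from the PPO paper, assuming per-action log-probabilities and advantage estimates are already computed; the function names and the fixed `beta` are illustrative, not taken from this repo:

```python
import torch

def clipped_surrogate_loss(logprobs, old_logprobs, advantages, eps_clip=0.2):
    # Probability ratio pi_theta(a|s) / pi_theta_old(a|s), computed in log space
    ratios = torch.exp(logprobs - old_logprobs.detach())
    surr1 = ratios * advantages
    surr2 = torch.clamp(ratios, 1 - eps_clip, 1 + eps_clip) * advantages
    # PPO-Clip: take the pessimistic (element-wise minimum) surrogate, negate for gradient descent
    return -torch.min(surr1, surr2).mean()

def kl_penalty_loss(logprobs, old_logprobs, advantages, beta=1.0):
    # Unclipped surrogate plus a KL penalty term
    ratios = torch.exp(logprobs - old_logprobs.detach())
    # Sample-based estimate of KL(pi_old || pi) from actions drawn under pi_old
    approx_kl = (old_logprobs.detach() - logprobs).mean()
    return -(ratios * advantages).mean() + beta * approx_kl
```

In the paper's adaptive variant, `beta` is adjusted after each update depending on whether the measured KL overshoots or undershoots a target; the sketch keeps it fixed for brevity.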