nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
MIT License

Why detach the state values when computing the advantage function? #43

Closed jingxixu closed 3 years ago

jingxixu commented 3 years ago

I am not sure why you detach the state values when computing the advantage function. Specifically, I am referring to this line:

advantages = rewards - state_values.detach()

Many thanks!

nikhilbarhate99 commented 3 years ago

refer to #29
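
---

Editor's note: the short answer, illustrated below, is that `.detach()` stops gradients of the actor (policy) loss from flowing into the critic through the advantage term, so the critic is trained only by its own value loss. This is a minimal sketch, not the repo's exact loss (PPO-PyTorch combines a clipped surrogate, an MSE value loss, and an entropy bonus); the tensor values here are made up for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins: discounted returns, critic predictions, actor log-probs.
rewards = torch.tensor([1.0, 0.5, 2.0])
state_values = torch.tensor([0.8, 0.6, 1.5], requires_grad=True)  # critic outputs
logprobs = torch.tensor([-0.2, -1.0, -0.5], requires_grad=True)   # actor outputs

# Detaching makes the advantage a fixed weight with no gradient path to the critic.
advantages = rewards - state_values.detach()

# Simplified policy-gradient surrogate (the repo uses the clipped PPO objective).
actor_loss = -(logprobs * advantages).mean()

# The critic is updated only through this value loss.
critic_loss = F.mse_loss(state_values, rewards)

(actor_loss + critic_loss).backward()

# state_values.grad now comes solely from critic_loss. Without .detach(),
# the actor loss would also backpropagate into the critic via `advantages`,
# pulling the value estimates toward whatever raises the surrogate objective
# and destabilizing training.
print(state_values.grad)
```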