nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
MIT License

Why detach the state values when computing the advantage function? #43

Closed jingxixu closed 3 years ago

jingxixu commented 3 years ago

I am not sure why you detach the state values when computing the advantage function. Specifically, I am referring to this line:

advantages = rewards - state_values.detach()

Many thanks!

nikhilbarhate99 commented 3 years ago

refer to #29
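
---

Editor's note: the short answer, illustrated below, is that `.detach()` stops gradients of the actor (policy) loss from flowing into the critic through the advantage term, so the critic is trained only by its own value loss. This is a minimal sketch, not the repo's exact loss (PPO-PyTorch combines a clipped surrogate, an MSE value loss, and an entropy bonus); the tensor values here are made up for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins: discounted returns, critic predictions, actor log-probs.
rewards = torch.tensor([1.0, 0.5, 2.0])
state_values = torch.tensor([0.8, 0.6, 1.5], requires_grad=True)  # critic outputs
logprobs = torch.tensor([-0.2, -1.0, -0.5], requires_grad=True)   # actor outputs

# Detaching makes the advantage a fixed weight with no gradient path to the critic.
advantages = rewards - state_values.detach()

# Simplified policy-gradient surrogate (the repo uses the clipped PPO objective).
actor_loss = -(logprobs * advantages).mean()

# The critic is updated only through this value loss.
critic_loss = F.mse_loss(state_values, rewards)

(actor_loss + critic_loss).backward()

# state_values.grad now comes solely from critic_loss. Without .detach(),
# the actor loss would also backpropagate into the critic via `advantages`,
# pulling the value estimates toward whatever raises the surrogate objective
# and destabilizing training.
print(state_values.grad)
```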