nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
MIT License

Why does PPO use monte carlo estimation instead of value function estimation? #39

Closed: outdoteth closed this issue 3 years ago

outdoteth commented 3 years ago

Why is the value function compared with a Monte Carlo estimate of the return instead of the one-step bootstrapped (TD) target, i.e. the error

v(s_t) - (r_t + gamma * v(s_{t+1}))

nikhilbarhate99 commented 3 years ago

The Monte Carlo estimate is more stable than bootstrapping, and it is the standard choice for on-policy methods.
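
For concreteness, here is a minimal sketch of the two targets under discussion, assuming a rollout stored as flat lists of per-step rewards and done flags (all names here are illustrative, not this repo's API):

```python
import torch

gamma = 0.99  # discount factor (assumed value)

def monte_carlo_returns(rewards, dones):
    """Full discounted return for each step: the Monte Carlo target."""
    returns = []
    discounted = 0.0
    # Walk the trajectory backwards, resetting at episode boundaries.
    for r, done in zip(reversed(rewards), reversed(dones)):
        if done:
            discounted = 0.0
        discounted = r + gamma * discounted
        returns.insert(0, discounted)
    return torch.tensor(returns)

def td_targets(rewards, dones, values):
    """One-step bootstrapped target r_t + gamma * v(s_{t+1}) from the question.

    values holds v(s_t) for t = 0..T, i.e. one extra entry for the state
    after the last step.
    """
    targets = []
    for t, (r, done) in enumerate(zip(rewards, dones)):
        # A terminal step has no successor state to bootstrap from.
        bootstrap = 0.0 if done else gamma * values[t + 1]
        targets.append(r + bootstrap)
    return torch.tensor(targets)
```

Either tensor would then be compared against the critic's v(s_t). The trade-off behind the answer above: the Monte Carlo target is unbiased but higher variance, while the bootstrapped target feeds the critic's own (possibly inaccurate) estimates back into its training signal, which is the instability the maintainer is referring to.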