nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
MIT License

Why does PPO use monte carlo estimation instead of value function estimation? #39

Closed: outdoteth closed this issue 3 years ago

outdoteth commented 3 years ago

Why is the value function compared with a Monte Carlo estimate of the return instead of the one-step bootstrapped (TD) target, i.e. the error

v(s_t) - (r_t + gamma * v(s_{t+1}))

nikhilbarhate99 commented 3 years ago

The Monte Carlo estimate is more stable than bootstrapping, and it is the standard choice for on-policy methods.
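
For concreteness, here is a minimal sketch of the two targets under discussion, assuming a rollout stored as flat lists of per-step rewards and done flags (all names here are illustrative, not this repo's API):

```python
import torch

gamma = 0.99  # discount factor (assumed value)

def monte_carlo_returns(rewards, dones):
    """Full discounted return for each step: the Monte Carlo target."""
    returns = []
    discounted = 0.0
    # Walk the trajectory backwards, resetting at episode boundaries.
    for r, done in zip(reversed(rewards), reversed(dones)):
        if done:
            discounted = 0.0
        discounted = r + gamma * discounted
        returns.insert(0, discounted)
    return torch.tensor(returns)

def td_targets(rewards, dones, values):
    """One-step bootstrapped target r_t + gamma * v(s_{t+1}) from the question.

    values holds v(s_t) for t = 0..T, i.e. one extra entry for the state
    after the last step.
    """
    targets = []
    for t, (r, done) in enumerate(zip(rewards, dones)):
        # A terminal step has no successor state to bootstrap from.
        bootstrap = 0.0 if done else gamma * values[t + 1]
        targets.append(r + bootstrap)
    return torch.tensor(targets)
```

Either tensor would then be compared against the critic's v(s_t). The trade-off behind the answer above: the Monte Carlo target is unbiased but higher variance, while the bootstrapped target feeds the critic's own (possibly inaccurate) estimates back into its training signal, which is the instability the maintainer is referring to.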