Closed (outdoteth closed 3 years ago)
Why is the value function compared with a Monte Carlo estimate of the return instead of using the one-step TD error:
v(s_1) - (r + gamma * v(s_2))
A Monte Carlo estimate is more stable than bootstrapping, because the target does not depend on the critic's own (possibly inaccurate) predictions, and it is standard for on-policy methods.
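To make the difference concrete, here is a minimal sketch of the two kinds of targets for a single finished episode. The function names `monte_carlo_returns` and `td_targets` are hypothetical and not taken from the repository; the sketch also assumes the episode terminates, so the value after the last step is 0.

```python
def monte_carlo_returns(rewards, gamma):
    """Discounted reward-to-go for each step, computed from observed rewards only."""
    returns = []
    g = 0.0
    # Walk the episode backwards: G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def td_targets(rewards, next_values, gamma):
    """One-step bootstrapped targets: r_t + gamma * v(s_{t+1})."""
    return [r + gamma * v for r, v in zip(rewards, next_values)]


rewards = [1.0, 0.0, 2.0]
gamma = 0.5

# Assumption: an untrained critic that predicts 0 everywhere.
next_values = [0.0, 0.0, 0.0]

mc = monte_carlo_returns(rewards, gamma)
td = td_targets(rewards, next_values, gamma)
print(mc)  # [1.5, 1.0, 2.0]
print(td)  # [1.0, 0.0, 2.0]
```

The Monte Carlo targets are the same whatever the critic currently predicts, while the TD targets change whenever `next_values` changes, so an inaccurate critic feeds its own errors back into its training signal.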