episode += 1
increases the episode count from 0 to whatever episode the agent is currently on. This count is used to calculate a new epsilon value one line below. Increasing the episode count decreases epsilon, which increases the likelihood of action selection via the greedy policy.
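Roughly, the decay step looks something like this (the names max_epsilon, min_epsilon, and decay_rate are just my placeholders, not necessarily the notebook's exact ones):

```python
import numpy as np

# Assumed hyper-parameters, for illustration only
max_epsilon = 1.0    # exploration probability at the start
min_epsilon = 0.01   # minimum exploration probability
decay_rate = 0.005   # exponential decay rate for exploration

def decayed_epsilon(episode):
    # As the episode count grows, epsilon shrinks toward min_epsilon,
    # so the agent exploits the greedy policy more and more often.
    return min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)
```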
Is it normal that the agent does not learn how to finish the game with the hyper-parameters used here?
I have the same issue as @rscova
The algorithm with the current settings does not learn to move to the goal. Sometimes I get a Q-table full of zeros after training for 50,000 episodes. Other times I get non-zero values, but the agent moves very inefficiently and never reaches the goal.
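For reference, here is a rough sketch of the tabular Q-learning update in question (table size, learning rate, and discount are made-up values, not the notebook's). It also shows why the table can stay at zero: if the agent never reaches the goal, every reward is 0, the TD target is 0, and the zero-initialised table never changes.

```python
import numpy as np

n_states, n_actions = 16, 4          # FrozenLake 4x4 sizes
lr, gamma = 0.8, 0.95                # assumed learning rate and discount
Q = np.zeros((n_states, n_actions))  # Q-table starts at zero

def q_update(state, action, reward, next_state):
    # With Q initialised to zero and reward == 0 on every step,
    # the TD target is 0 and the table stays all zeros.
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += lr * (td_target - Q[state, action])
```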
@CarterEllsworth Isn't this a loop? for episode in range(total_episodes) means that episode is automatically incremented, so there wouldn't be a need to manually increase episode.
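Something like this is what I mean (the hyper-parameter values are made up for illustration):

```python
import numpy as np

total_episodes = 50000
max_epsilon, min_epsilon, decay_rate = 1.0, 0.01, 0.005  # assumed values

for episode in range(total_episodes):
    # `episode` already takes the values 0, 1, 2, ... on successive
    # iterations, so a manual `episode += 1` inside the loop is redundant
    # and only shifts the value fed into the epsilon update below.
    epsilon = min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)
```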
@WillKoehrsen Yes, the problem is the loop. @CarterEllsworth
Indeed, the problem comes from the loop. Thanks for figuring it out! 👍
Consequently, I've just modified the notebook:
I've just credited you in the commit.
About the slipperiness: indeed, that is normal, we are in a stochastic environment. I didn't want to make it deterministic because that would be too simple for a Q-learning problem. But you can make it deterministic if you want.
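If you do want the deterministic version, one way is to register a non-slippery variant of FrozenLake. This is just a sketch and the exact call may vary with your gym version (FrozenLakeNotSlippery-v0 is a made-up id):

```python
import gym
from gym.envs.registration import register

# Register a deterministic (non-slippery) copy of the 4x4 FrozenLake map.
register(
    id="FrozenLakeNotSlippery-v0",          # hypothetical id for this variant
    entry_point="gym.envs.toy_text:FrozenLakeEnv",
    kwargs={"map_name": "4x4", "is_slippery": False},
)

env = gym.make("FrozenLakeNotSlippery-v0")
```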
Thanks @simoninithomas. Does it mean that in Taxi-v2 we have to remove "episode += 1" from the loop too?
@jamac22 yes
I have trouble with the code at line 41, "episode += 1". Why does episode need to be incremented by 1 here?