episode += 1
increases the episode count from 0 to whatever episode the agent is currently on. This count is used to calculate a new epsilon value one line below. Increasing the episode count decreases epsilon, which increases the likelihood of action selection via the greedy policy.
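Roughly, the decay step looks something like this (the names max_epsilon, min_epsilon, and decay_rate are just my placeholders, not necessarily the notebook's exact ones):

```python
import numpy as np

# Assumed hyper-parameters, for illustration only
max_epsilon = 1.0    # exploration probability at the start
min_epsilon = 0.01   # minimum exploration probability
decay_rate = 0.005   # exponential decay rate for exploration

def decayed_epsilon(episode):
    # As the episode count grows, epsilon shrinks toward min_epsilon,
    # so the agent exploits the greedy policy more and more often.
    return min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)
```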
Is it normal that the agent does not learn how to finish the game with the hyper-parameters used here?
I have the same issue as @rscova
The algorithm with the current settings does not learn to move to the goal. Sometimes I get a Q-table full of zeros after training for 50,000 episodes. Other times I get non-zero values, but the agent moves very inefficiently and never reaches the goal.
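For reference, here is a rough sketch of the tabular Q-learning update in question (table size, learning rate, and discount are made-up values, not the notebook's). It also shows why the table can stay at zero: if the agent never reaches the goal, every reward is 0, the TD target is 0, and the zero-initialised table never changes.

```python
import numpy as np

n_states, n_actions = 16, 4          # FrozenLake 4x4 sizes
lr, gamma = 0.8, 0.95                # assumed learning rate and discount
Q = np.zeros((n_states, n_actions))  # Q-table starts at zero

def q_update(state, action, reward, next_state):
    # With Q initialised to zero and reward == 0 on every step,
    # the TD target is 0 and the table stays all zeros.
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += lr * (td_target - Q[state, action])
```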
@CarterEllsworth Isn't this a loop? for episode in range(total_episodes) means that episode is automatically incremented, so there wouldn't be a need to manually increase episode.
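Something like this is what I mean (the hyper-parameter values are made up for illustration):

```python
import numpy as np

total_episodes = 50000
max_epsilon, min_epsilon, decay_rate = 1.0, 0.01, 0.005  # assumed values

for episode in range(total_episodes):
    # `episode` already takes the values 0, 1, 2, ... on successive
    # iterations, so a manual `episode += 1` inside the loop is redundant
    # and only shifts the value fed into the epsilon update below.
    epsilon = min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)
```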
@WillKoehrsen Yes, the problem is the loop. @CarterEllsworth
Indeed, the problem comes from the loop. Thanks for figuring it out! 👍
Consequently, I've just modified the notebook:
I've just credited you in the commit.
About the slipperiness: indeed, that is normal, we are in a stochastic environment. I didn't want to make it deterministic because that would be too simple for a Q-learning problem. But you can make it deterministic if you want.
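If you do want the deterministic version, one way is to register a non-slippery variant of FrozenLake. This is just a sketch and the exact call may vary with your gym version (FrozenLakeNotSlippery-v0 is a made-up id):

```python
import gym
from gym.envs.registration import register

# Register a deterministic (non-slippery) copy of the 4x4 FrozenLake map.
register(
    id="FrozenLakeNotSlippery-v0",          # hypothetical id for this variant
    entry_point="gym.envs.toy_text:FrozenLakeEnv",
    kwargs={"map_name": "4x4", "is_slippery": False},
)

env = gym.make("FrozenLakeNotSlippery-v0")
```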
Thanks @simoninithomas. Does it mean that in Taxi-v2 we have to remove "episode += 1" from the loop too?
@jamac22 yes
I have trouble with the code at line 41, "episode += 1". Why does episode need to be incremented by 1 here?