simoninithomas / Deep_reinforcement_learning_Course

Implementations from the free course Deep Reinforcement Learning with Tensorflow and PyTorch
http://www.simoninithomas.com/deep-rl-course

Q Learning with FrozenLake Step 4: The Q learning algorithm #5

Closed dawn2034 closed 6 years ago

dawn2034 commented 6 years ago

I'm having trouble with the code at line 41, "episode += 1". Why does episode need to be incremented here?

CarterEllsworth commented 6 years ago

episode += 1 increases the episode count from 0 up to whatever episode the agent is currently on. The count is used to calculate a new epsilon value one line below. Increasing the episode count decreases epsilon, which increases the likelihood of selecting actions via the greedy policy.
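To illustrate how epsilon can shrink as the episode count grows, here is a minimal sketch of an exponential decay schedule. The formula, the function name `epsilon_for`, and the hyperparameter values are assumptions chosen for illustration, not necessarily what the notebook uses.

```python
import math

# Assumed hyperparameter values, for illustration only.
MAX_EPSILON = 1.0    # exploration probability at the start
MIN_EPSILON = 0.01   # floor that epsilon approaches but never drops below
DECAY_RATE = 0.005   # larger values make epsilon fall faster


def epsilon_for(episode: int) -> float:
    """Decay epsilon exponentially from MAX_EPSILON toward MIN_EPSILON."""
    return MIN_EPSILON + (MAX_EPSILON - MIN_EPSILON) * math.exp(-DECAY_RATE * episode)


# A lower epsilon means the agent picks the greedy action more often.
for episode in (0, 100, 1000):
    print(episode, round(epsilon_for(episode), 4))
```

With a schedule like this, early episodes are mostly exploratory and later episodes mostly exploit the Q-table.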

rscova commented 6 years ago

Is it normal that the agent does not learn how to finish the game with the hyperparameters he uses?

jroberayalas commented 6 years ago

I have the same issue as @rscova

The algorithm with the current settings does not learn to move to the goal. Sometimes I get a Q-table full of zeros after training for 50,000 episodes. Other times I get non-zero values, but the agent moves very inefficiently and never reaches the goal.

WillKoehrsen commented 6 years ago

@CarterEllsworth Isn't this a loop? for episode in range(total_episodes) means that episode is automatically incremented so there wouldn't be a need to manually increase episode.
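A small sketch of what Python does when the loop variable is also incremented by hand. The variable names and the tiny `total_episodes` value are hypothetical, and the training step is left out.

```python
total_episodes = 3  # hypothetical small value, for illustration only

values_after_increment = []
for episode in range(total_episodes):
    # ... one training episode would run here ...
    episode += 1  # manual increment inside the loop body
    # Anything computed after this point (such as epsilon) sees the shifted value.
    values_after_increment.append(episode)

# range() reassigns episode at the start of every iteration, so the manual
# increment does not skip episodes; it only shifts the value by one for the
# remainder of the current iteration.
print(values_after_increment)
```

The loop still runs exactly `total_episodes` times, so the counter advances without the manual increment.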

dawn2034 commented 6 years ago

@WillKoehrsen Yes, the problem is the loop. @CarterEllsworth

simoninithomas commented 6 years ago

Indeed, the problem comes from the loop. Thanks for figuring it out! 👍

Consequently, I've just modified the notebook, and I've credited you in the commit.

About the slipperiness: indeed, that is normal. We are in a stochastic environment. I didn't want to make it deterministic because it would be too simple for a Q-learning problem. But you can make it deterministic if you want.

jamac22 commented 6 years ago

Thanks @simoninithomas. Does it mean that in Taxi-v2 we have to remove "episode += 1" from the loop too?

simoninithomas commented 6 years ago

@jamac22 yes