mpatacchiola / dissecting-reinforcement-learning

Python code, PDFs and resources for the series of posts on Reinforcement Learning which I published on my personal blog
https://mpatacchiola.github.io/blog/
MIT License

The clean robot example on chapter 1 ? #14

Closed ngthanhtin closed 5 years ago

ngthanhtin commented 5 years ago

Hello, I really don't understand this example in chapter one: [image] Why does the robot, starting at state (1,1) and taking the up (or down, left, right) action, have 3 possible subsequent states like that? Thank you.

mpatacchiola commented 5 years ago

Hi @ngthanhtin

You have to remember that the environment includes a random component!

The action you want to take (say you want to go UP from the initial state) succeeds with 80% probability, but there is a 10% probability that the robot goes left instead and a 10% probability that it goes right. This is why the image shows three possible outcomes for each action.
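As a minimal sketch of the transition model described above (not the code from the blog post), the three outcomes can be written down in Python. The names `SLIPS`, `outcome_distribution` and `sample_move` are hypothetical, and the sketch assumes that for actions other than UP the two 10% slips are also the directions perpendicular to the intended one.

```python
import random

# Assumption: for every intended action, the two "slip" directions are the
# ones perpendicular to it (for UP these are LEFT and RIGHT, as in the text).
SLIPS = {
    "UP": ("LEFT", "RIGHT"),
    "DOWN": ("LEFT", "RIGHT"),
    "LEFT": ("UP", "DOWN"),
    "RIGHT": ("UP", "DOWN"),
}

def outcome_distribution(action):
    """Return {actual_move: probability} for an intended action."""
    first, second = SLIPS[action]
    # 80% the intended move, 10% for each perpendicular slip.
    return {action: 0.8, first: 0.1, second: 0.1}

def sample_move(action, rng=random):
    """Sample the move the robot actually makes for an intended action."""
    dist = outcome_distribution(action)
    moves = list(dist)
    weights = [dist[m] for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]

print(outcome_distribution("UP"))
# → {'UP': 0.8, 'LEFT': 0.1, 'RIGHT': 0.1}
```

Calling `sample_move("UP")` many times should return `"UP"` roughly 80% of the time, which is the same branching into three subsequent states that the image shows.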

Any environment may (or may not) have this kind of stochastic behavior. Think of someone driving a car: the driver wants to turn left, but there is still a small probability that the car slips in the other direction or does not turn at all.

Hope my explanation has been clear.

ngthanhtin commented 5 years ago

I see, thank you very much :)