The cleaning robot example in chapter 1?

ngthanhtin commented 5 years ago

Hello, I really don't understand this example in chapter one: why does the robot start at state (1,1), take the up (or down, left, right) action, and yet have 3 possible subsequent states like that? Thank you.

mpatacchiola commented 5 years ago

Hi @ngthanhtin

You have to remember that the environment includes a random component!

The action you want to take (let's suppose you want to go UP starting from the initial state) has an 80% probability of happening, but there is a small probability (10%) that the robot goes left instead, and a small probability (10%) that it ends up on the right. This is why that image shows three possible outcomes for each possible action.
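If it helps, here is a minimal Python sketch of that 80/10/10 behavior. It is only an illustration, not code from the repository: the names `SLIP_DIRECTIONS` and `sample_actual_action` are hypothetical, and I am assuming that for every action the two "slip" outcomes are the two perpendicular directions (the case described above is only UP, which slips LEFT or RIGHT).

```python
import random

# Hypothetical table (an assumption for illustration): for each intended
# action, the two perpendicular directions the robot may slip into.
SLIP_DIRECTIONS = {
    "UP": ("LEFT", "RIGHT"),
    "DOWN": ("LEFT", "RIGHT"),
    "LEFT": ("UP", "DOWN"),
    "RIGHT": ("UP", "DOWN"),
}


def sample_actual_action(intended, rng=random):
    """Return the action that actually happens.

    The intended action is executed with probability 0.8; each of the
    two slip directions is executed with probability 0.1.
    """
    slip_a, slip_b = SLIP_DIRECTIONS[intended]
    outcomes = [intended, slip_a, slip_b]
    return rng.choices(outcomes, weights=[0.8, 0.1, 0.1])[0]


if __name__ == "__main__":
    # Ask for UP many times and count what really happened.
    rng = random.Random(0)
    counts = {"UP": 0, "LEFT": 0, "RIGHT": 0}
    for _ in range(10000):
        counts[sample_actual_action("UP", rng)] += 1
    print(counts)
```

Running it and asking for UP many times should give counts roughly in the proportion 80/10/10, so a single intended action leads to three possible next states.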

Every environment may (or may not) have this kind of stochastic behavior. Think about the case of someone driving a car. The driver wants to turn left, but there is still a small probability that the car slips in the other direction, or does not turn at all.

Hope my explanation has been clear.

ngthanhtin commented 5 years ago

I see, thank you very much :)

mpatacchiola / dissecting-reinforcement-learning

The clean robot example on chapter 1 ? #14