osigaud / rl_labs_notebooks

Labs for understanding and coding Standard Reinforcement Learning concepts
52 stars 16 forks source link

East & West are inverted #1

Open tkchouaki opened 5 years ago

tkchouaki commented 5 years ago

I believe the East and West directions are inverted in the transition function. In fact, running this code (which asks the agent to go west) in the mdp.ipynb notebook gives the result below

%matplotlib notebook
from ipynb.fs.defs.maze_plotter import maze_plotter 
walls = [7,8,9,10,21,27,30,31,32,33,45,46,47]
height = 6
width = 9
m = maze_mdp(width, height, walls=walls) # maze-like MDP definition
m.reset()
m.step(3) #Which corresponds to going west
m.render()

result

I believe it is due to the initialization of the transition function in this portion of code

        # Transition Matrix when going east
        transition_matrix[:,E,:] = np.zeros((observation_space.size+1,observation_space.size+1))
        for i in observation_space.states : 
            if i<observation_space.height or i-observation_space.height in observation_space.walls or i in observation_space.walls: #state doesn't change (cells on the right side of a wall)
                transition_matrix[:,E,:][i][i] = 1.0
            else : #it goes left
                transition_matrix[:,E,:][i][i-height] = 1.0

        # Transition Matrix when going west
        transition_matrix[:,W,:] = np.zeros((observation_space.size+1,observation_space.size+1))
        for i in observation_space.states : 
            if i>observation_space.size-observation_space.height-1 or i+observation_space.height in observation_space.walls or i in observation_space.walls: #state doesn't change (cells on the left side of a wall)
                transition_matrix[:,W,:][i][i] = 1.0
            else : #it goes right
                transition_matrix[:,W,:][i][i+height] = 1.0

Inverting the directions handled by the two portions of code should solve the problem

osigaud commented 5 years ago

Yes, it seems you are right, the comment says "goes left" for east and "goes right" for west, which sounds wrong ;)

The good point about RL is that the agent does not care about the meaning of actions, so it will take appropriate actions anyways. But this needs to be fixed, it'll be done asap.

Thank you for spotting this.

tkchouaki commented 5 years ago

The maze_plotter should also be modified to make the arrows point to the right direction.

osigaud commented 5 years ago

No, to me the good idea rather consists in inverting east and west everywhere in the comments, these notions make no sense from the program point of view (the program is not wrong in that respect)