I found that the actuator value observed in next_state is not equal to the action just taken; it only matches that action two steps later. For example:
t:   action = 1, observation = 0
t+1: action = 2, observation = 0
t+2: action = 3, observation = 1
t+3: action = 4, observation = 2
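The size of the delay can be checked from logged values. This is only a minimal sketch: `find_lag` is a hypothetical helper, not part of any library, and it assumes the four rows above are consecutive steps logged as plain lists.

```python
def find_lag(actions, observations, max_lag=5):
    """Return the smallest lag k such that observations[i + k] == actions[i]
    for every overlapping index, or None if no lag up to max_lag fits."""
    for k in range(max_lag + 1):
        # Pair each action with the observation seen k steps later.
        pairs = list(zip(actions, observations[k:]))
        if pairs and all(a == o for a, o in pairs):
            return k
    return None


# Values copied from the example above (assumed to be consecutive steps).
actions = [1, 2, 3, 4]
observations = [0, 0, 1, 2]
print(find_lag(actions, observations))
```

With only four samples the match rests on two overlapping pairs, so a longer log would give a more reliable estimate.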
Example code is below: