Possible bug: action determined from the previous (not current) state

Hi,

I think something is wrong with the gw.step() calls at https://github.com/stormmax/irl-imitation/blob/master/maxent_irl_gridworld.py#L95 and https://github.com/stormmax/irl-imitation/blob/master/deep_maxent_irl_gridworld.py#L72.

I think cur_state, action, next_state, reward, is_done = gw.step(int(policy[gw.pos2idx(cur_state)])) should be cur_state, action, next_state, reward, is_done = gw.step(int(policy[gw.pos2idx(next_state)])). Calling step() advances the current state stored inside the gridworld object. So after each call, the local variable next_state (confusingly, not cur_state) is the one that matches the environment's current state, and that is the state that should be passed to the policy on the following call.
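To illustrate what I mean, here is a minimal sketch on a toy environment. ToyChain, rollout, and the index_with parameter are hypothetical names made up for this example; this is not the repository's gridworld. It only assumes that step() returns the pre-step state as its first element and the post-step state as its third, which is how I read the code.

```python
class ToyChain:
    """Hypothetical 1-D chain of n states; NOT the repo's gridworld."""

    def __init__(self, n):
        self.n = n
        self.pos = 0

    def reset(self, pos):
        self.pos = pos

    def step(self, action):
        # Assumed return convention: (state before step, action, state after step, reward, done)
        prev = self.pos
        delta = 1 if action == 1 else -1  # 1 = move right, 0 = move left
        self.pos = min(max(prev + delta, 0), self.n - 1)
        return prev, action, self.pos, 0.0, False


def rollout(env, policy, start, n_steps, index_with):
    """Collect (state, action) pairs, indexing the policy with either
    the returned cur_state ("cur") or the returned next_state ("next")."""
    env.reset(start)
    cur_state = start
    next_state = start
    pairs = []
    for _ in range(n_steps):
        lookup = next_state if index_with == "next" else cur_state
        cur_state, action, next_state, _, _ = env.step(policy[lookup])
        pairs.append((cur_state, action))
    return pairs


# Deterministic policy: state 0 -> right, state 1 -> left, others -> right
policy = [1, 0, 1, 1, 1]

with_cur = rollout(ToyChain(5), policy, start=0, n_steps=4, index_with="cur")
with_next = rollout(ToyChain(5), policy, start=0, n_steps=4, index_with="next")

print("indexed with cur_state: ", with_cur)
print("indexed with next_state:", with_next)
```

In this sketch, every (state, action) pair recorded with next_state agrees with the policy, while indexing with cur_state records pairs such as (1, 1) even though policy[1] is 0, because the action was chosen for the state one step behind.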

Am I misunderstanding something?
