Hi,
I think something is wrong with the `gw.step()` calls at https://github.com/stormmax/irl-imitation/blob/master/maxent_irl_gridworld.py#L95 and https://github.com/stormmax/irl-imitation/blob/master/deep_maxent_irl_gridworld.py#L72.
I think

`cur_state, action, next_state, reward, is_done = gw.step(int(policy[gw.pos2idx(cur_state)]))`

should be

`cur_state, action, next_state, reward, is_done = gw.step(int(policy[gw.pos2idx(next_state)]))`.

Calling `step()` advances the current state stored inside the gridworld object. So the local variable `next_state` here (not `cur_state`, confusingly) always corresponds to the agent's current state, while the returned `cur_state` is the position it just left. It is `next_state` that should be passed to the policy.

Am I misunderstanding something?
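To make the difference concrete, here is a minimal sketch, assuming a toy gridworld whose `step()` behaves as described above: it moves the agent and returns the pre-move position as `cur_state` and the new position as `next_state`. `ToyGridWorld`, `rollout`, and the `use_next_state` flag are hypothetical names for illustration only, not the repo's code; the rollout only loosely mirrors the loop being discussed.

```python
from typing import List, Sequence, Tuple

State = Tuple[int, int]
Step = Tuple[State, int, State]  # (state acted in, action, resulting state)


class ToyGridWorld:
    """Hypothetical stand-in for the repo's gridworld. Only the step()/pos2idx()
    interface quoted in the issue is imitated; the real class may differ."""

    MOVES = [(0, 1), (1, 0)]  # action 0 = right, action 1 = down

    def __init__(self, height: int, width: int, goal: State):
        self.height, self.width, self.goal = height, width, goal
        self._pos: State = (0, 0)

    def reset(self, start: State) -> None:
        self._pos = start

    def pos2idx(self, pos: State) -> int:
        return pos[0] * self.width + pos[1]

    def step(self, action: int):
        # Advance the internal position (clamped to the grid) and return the
        # position *before* the move as cur_state, the new one as next_state.
        prev = self._pos
        dr, dc = self.MOVES[action]
        row = min(max(prev[0] + dr, 0), self.height - 1)
        col = min(max(prev[1] + dc, 0), self.width - 1)
        self._pos = (row, col)
        is_done = self._pos == self.goal
        reward = 1.0 if is_done else 0.0
        return prev, action, self._pos, reward, is_done


def rollout(gw: ToyGridWorld, policy: Sequence[int], start: State,
            max_len: int, use_next_state: bool = True) -> List[Step]:
    gw.reset(start)
    # First step: the agent is at `start`, so query the policy there.
    cur_state, action, next_state, reward, is_done = gw.step(
        int(policy[gw.pos2idx(start)]))
    episode = [(cur_state, action, next_state)]
    for _ in range(max_len - 1):
        if is_done:
            break
        # Proposed fix: after step(), the agent sits at next_state.
        # Querying cur_state instead uses the position it just left.
        query = next_state if use_next_state else cur_state
        cur_state, action, next_state, reward, is_done = gw.step(
            int(policy[gw.pos2idx(query)]))
        episode.append((cur_state, action, next_state))
    return episode


# 3x3 grid, goal in the bottom-right corner: go right until the last column, then down.
policy = [0 if idx % 3 < 2 else 1 for idx in range(9)]
gw = ToyGridWorld(3, 3, goal=(2, 2))
fixed = rollout(gw, policy, (0, 0), max_len=10, use_next_state=True)
buggy = rollout(gw, policy, (0, 0), max_len=10, use_next_state=False)

for name, ep in (("fixed", fixed), ("buggy", buggy)):
    mismatches = sum(a != policy[gw.pos2idx(s)] for s, a, _ in ep)
    print(name, len(ep), "steps,", mismatches, "action(s) not chosen by the policy")
```

In this toy setup the `next_state` version reaches the goal in 4 steps and every recorded (state, action) pair matches the policy. The `cur_state` version takes 5 steps and records the pair ((0, 2), right), although the policy says "down" at (0, 2). Such pairs would end up in the demonstrations fed to IRL.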