yrlu / irl-imitation

Implementation of Inverse Reinforcement Learning (IRL) algorithms in Python/TensorFlow: Deep MaxEnt, MaxEnt, LPIRL

Possible bug: action determined from previous (not current) state #5

Open eupktcha opened 6 years ago

eupktcha commented 6 years ago

Hi,

I think something is wrong with the gw.step() calls at (https://github.com/stormmax/irl-imitation/blob/master/maxent_irl_gridworld.py#L95) and (https://github.com/stormmax/irl-imitation/blob/master/deep_maxent_irl_gridworld.py#L72).

I believe `cur_state, action, next_state, reward, is_done = gw.step(int(policy[gw.pos2idx(cur_state)]))` should be `cur_state, action, next_state, reward, is_done = gw.step(int(policy[gw.pos2idx(next_state)]))`. Each call to step() advances the current state inside the gridworld object, so on the next iteration the local variable `next_state` (not `cur_state`, confusingly) is what corresponds to the agent's actual current state, and that is what should be passed to the policy.
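To make the bookkeeping concrete, here is a minimal, self-contained sketch of a 1-D gridworld whose step() returns the same tuple shape as the repo's. The class, policy, and dynamics below are hypothetical stand-ins for illustration, not the project's actual `GridWorld`; the point is only that after step() has advanced the internal state, the returned `next_state` is where the agent now stands, so it is the value that must index the policy on the following call:

```python
# Hypothetical minimal 1-D gridworld mirroring the repo's step() signature.
# This is an illustrative sketch, not the project's actual GridWorld class.

class TinyGridWorld:
    def __init__(self, n_states, start=0):
        self.n = n_states
        self.cur = start

    def pos2idx(self, pos):
        # In this 1-D sketch, a position is already an index.
        return pos

    def step(self, action):
        # Returns the state *before* the transition as cur_state,
        # then advances the internal current state to next_state.
        cur_state = self.cur
        next_state = min(max(self.cur + action, 0), self.n - 1)
        self.cur = next_state
        reward = 1.0 if next_state == self.n - 1 else 0.0
        is_done = next_state == self.n - 1
        return cur_state, action, next_state, reward, is_done


policy = [1, 1, 1, 1, 0]  # always move right until the terminal state

gw = TinyGridWorld(5)
cur_state, action, next_state, reward, is_done = gw.step(
    int(policy[gw.pos2idx(0)]))
states_visited = [cur_state]
while not is_done:
    # The agent is actually standing at next_state now, so the policy
    # must be indexed with next_state; indexing with the stale
    # cur_state would replay the previous state's action.
    cur_state, action, next_state, reward, is_done = gw.step(
        int(policy[gw.pos2idx(next_state)]))
    states_visited.append(cur_state)
states_visited.append(next_state)

print(states_visited)  # visits every state left to right: [0, 1, 2, 3, 4]
```

With the stale `cur_state` indexing instead, the action chosen at each step lags one state behind the agent's true position, which corrupts the sampled trajectories that the MaxEnt IRL gradient is computed from.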

Am I misunderstanding something?

dahehe98 commented 10 months ago

I totally agree with you.