I noticed that in the step function the state is recorded before the action is taken,
which means that the observation returned is one step old and doesn't correspond to the current state.
This is easy enough to fix by moving self.state = self.game.get_state() to after reward = self.game.make_action(act).
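
To illustrate the off-by-one, here is a minimal sketch rather than the actual environment code. CounterGame, StaleEnv and FixedEnv are hypothetical names made up for this example; only self.state, self.game.get_state() and self.game.make_action(act) come from the step function discussed above, and the stand-in game simply uses a step counter as its state.

```python
class CounterGame:
    """Hypothetical stand-in for the real game object; its state is a step counter."""

    def __init__(self):
        self.t = 0

    def get_state(self):
        return self.t

    def make_action(self, act):
        self.t += 1        # acting advances the game by one step
        return float(act)  # dummy reward, just for the sketch


class StaleEnv:
    """Reads the state before acting, so the returned observation lags by one step."""

    def __init__(self):
        self.game = CounterGame()
        self.state = None

    def step(self, act):
        self.state = self.game.get_state()
        reward = self.game.make_action(act)
        return self.state, reward


class FixedEnv(StaleEnv):
    """Reads the state after acting, so the observation matches the current state."""

    def step(self, act):
        reward = self.game.make_action(act)
        self.state = self.game.get_state()
        return self.state, reward


stale_obs, _ = StaleEnv().step(1)
fixed_obs, _ = FixedEnv().step(1)
print(stale_obs, fixed_obs)  # prints "0 1"
```

With the original ordering the first step returns the pre-action state (0), while the reordered version returns the state the game is actually in after the action (1). Depending on the real game API, get_state() may not return a usable state once an episode has ended, so the reordered step function might need to handle that case.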