VectorizedStateMachineTargetArrayCalculator.

The defect is here:

        for action in range(num_actions):
            next_int_ext_state = self.rl_system.model.apply_action(int_ext_state, action)
            reward = self.rl_system.reward_function(int_ext_state, action, next_int_ext_state)
            targets[action] = self.get_target(next_int_ext_state, action, reward)

The problem is that applying the action changes the internal state, so next_int_ext_state no longer has the same internal state as int_ext_state, but the reward function only looks at the final state. The reward function is what needs fixing.
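To illustrate the difference, here is a minimal sketch that is not taken from the repository. `IntExtState`, `final_state_only_reward`, and `transition_aware_reward` are hypothetical names, and the assumption that a state splits into an `internal` and an `external` field is mine. The sketch only shows how a reward that looks at the final state alone cannot tell two different transitions apart, while one that also reads the starting state can.

```python
from collections import namedtuple

# Hypothetical stand-in for the project's state type; the real class may differ.
IntExtState = namedtuple("IntExtState", ["internal", "external"])


def final_state_only_reward(state, action, next_state):
    """Reward that depends only on the state we end up in."""
    return 1.0 if next_state.internal == 1 else 0.0


def transition_aware_reward(state, action, next_state):
    """Reward that also compares against the internal state we started from."""
    changed = state.internal != next_state.internal
    return 1.0 if changed and next_state.internal == 1 else 0.0


# Two different starting points that lead to the same final state.
start_changed = IntExtState(internal=0, external=10.0)
start_same = IntExtState(internal=1, external=10.0)
end = IntExtState(internal=1, external=10.0)

rewards_final = (
    final_state_only_reward(start_changed, 0, end),
    final_state_only_reward(start_same, 0, end),
)
rewards_transition = (
    transition_aware_reward(start_changed, 0, end),
    transition_aware_reward(start_same, 0, end),
)
```

Here `rewards_final` gives the same value for both transitions, whereas `rewards_transition` separates them. Whether the actual fix should reward the change itself or something else depends on the project's reward design, which this sketch does not attempt to reproduce.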
