Inconsistent actions between train and inference on Mario

I trained a policy for the Mario environment with

python train.py --default --env-id mario --noReward

and observed a fairly high external reward during training:

[2019-03-15 01:24:17,798] True Game terminating: env_episode_reward=0.648666666667 episode_length=669
Episode finished. Sum of shaped rewards: 0.00. Length: 669. Bonus: 4.1677.

However, when I run the trained policy with inference.py using

python inference.py --env-id SuperMarioBros-1-1-v0 --default --log-dir ../mario/train

the agent keeps trying to go left. This makes me think that the action spaces used for training and for inference are inconsistent (somehow swapped).
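To make the suspected mismatch concrete, here is a minimal sketch of how two action tables could be compared index by index. The tables TRAIN_ACTIONS and INFERENCE_ACTIONS and both helper functions are hypothetical placeholders for illustration; they are not the actual mappings or code used by train.py or inference.py, which would have to be read out of the two environments.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical action tables, for illustration only. They are NOT the real
# mappings from train.py or inference.py.
TRAIN_ACTIONS = ["noop", "right", "right+jump", "left"]
INFERENCE_ACTIONS = ["noop", "left", "right", "right+jump"]

Mismatch = Tuple[int, Optional[str], Optional[str]]


def diff_action_tables(train: Sequence[str], infer: Sequence[str]) -> List[Mismatch]:
    """Return (index, train_meaning, inference_meaning) wherever the meanings differ."""
    mismatches: List[Mismatch] = []
    for i in range(max(len(train), len(infer))):
        t = train[i] if i < len(train) else None
        f = infer[i] if i < len(infer) else None
        if t != f:
            mismatches.append((i, t, f))
    return mismatches


def remap_index(train_index: int, train: Sequence[str], infer: Sequence[str]) -> int:
    """Translate a policy output index into the inference index with the same meaning."""
    # list.index raises ValueError if the inference table lacks that action.
    return list(infer).index(train[train_index])


if __name__ == "__main__":
    for index, train_name, infer_name in diff_action_tables(TRAIN_ACTIONS, INFERENCE_ACTIONS):
        print(f"index {index}: train={train_name!r} inference={infer_name!r}")
```

With these placeholder tables, the index the policy learned as "right" is read as "left" at inference time, which would produce exactly the behaviour described above. If the real tables differ in this way, remapping each policy output through something like remap_index before stepping the environment would be one possible workaround.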

Is there a way to fix it?

pathak22 / noreward-rl, issue #36