I trained a policy for the Mario environment with:
python train.py --default --env-id mario --noReward
and observed quite a high external reward during training:
[2019-03-15 01:24:17,798] True Game terminating: env_episode_reward=0.648666666667 episode_length=669
Episode finished. Sum of shaped rewards: 0.00. Length: 669. Bonus: 4.1677.
However, when I try to run the policy with inference.py using the following command:
python inference.py --env-id SuperMarioBros-1-1-v0 --default --log-dir ../mario/train
the agent keeps trying to go left, which makes me think that the action spaces used for training and inference are inconsistent (somehow swapped).
Is there a way to fix it?
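To make the suspected mismatch concrete: training used the env id `mario`, while inference used `SuperMarioBros-1-1-v0`, so the same action index may mean different things in the two runs. Below is a minimal sketch of how that could be checked and worked around. The action labels and the names `TRAIN_ACTIONS`, `INFER_ACTIONS`, `find_mismatches`, and `build_remap` are hypothetical and only for illustration; the real orderings would have to be read from the two environment wrappers.

```python
# Hypothetical action orderings, made up for illustration only.
# The real lists must be taken from the training and inference environments.
TRAIN_ACTIONS = ["noop", "right", "right+jump", "left"]
INFER_ACTIONS = ["noop", "left", "right", "right+jump"]


def find_mismatches(train_actions, infer_actions):
    """Return (index, train_label, infer_label) for each index whose meaning differs."""
    return [
        (i, t, s)
        for i, (t, s) in enumerate(zip(train_actions, infer_actions))
        if t != s
    ]


def build_remap(train_actions, infer_actions):
    """Map each training action index to the inference index with the same label."""
    infer_index = {label: i for i, label in enumerate(infer_actions)}
    return {i: infer_index[label] for i, label in enumerate(train_actions)}


mismatches = find_mismatches(TRAIN_ACTIONS, INFER_ACTIONS)
remap = build_remap(TRAIN_ACTIONS, INFER_ACTIONS)

for index, train_label, infer_label in mismatches:
    print(f"index {index}: train={train_label!r}, inference={infer_label!r}")
```

If the orderings really do differ, the action chosen by the policy could be passed through `remap` before being sent to the inference environment. If they turn out to be identical, the problem is somewhere else (for example, the wrong checkpoint being loaded from the log directory).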