pathak22 / noreward-rl

[ICML 2017] TensorFlow code for Curiosity-driven Exploration for Deep Reinforcement Learning

Inconsistent actions between train and inference on Mario #36

Open takuma-yoneda opened 5 years ago

takuma-yoneda commented 5 years ago

I trained a policy for the Mario environment with

python train.py --default --env-id mario --noReward

and observed quite a high external reward during training:

[2019-03-15 01:24:17,798] True Game terminating: env_episode_reward=0.648666666667 episode_length=669
Episode finished. Sum of shaped rewards: 0.00. Length: 669. Bonus: 4.1677.

However, when I run the trained policy with inference.py using

python inference.py --env-id SuperMarioBros-1-1-v0 --default --log-dir ../mario/train

the agent keeps trying to go left, which makes me think that the action spaces used for training and inference are inconsistent (perhaps the action indices are swapped).
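One way to test this suspicion might be to compare how each environment orders its discrete actions and, if the orders differ, translate the policy's action indices before passing them to the inference environment. The two commands use different environment ids (mario and SuperMarioBros-1-1-v0), so the action ordering may not be the same in both. The sketch below is only an illustration under assumptions: the action-name lists are hypothetical placeholders, not the repository's real mappings, and the helpers action_histogram and build_remap are not part of the noreward-rl code.

```python
from collections import Counter


def action_histogram(actions, n_actions):
    """Return the fraction of steps on which each action index was chosen."""
    counts = Counter(actions)
    total = len(actions)
    if total == 0:
        return [0.0] * n_actions
    return [counts.get(a, 0) / total for a in range(n_actions)]


def build_remap(train_meanings, infer_meanings):
    """Map each training action index to the inference index with the same name."""
    infer_index = {name: i for i, name in enumerate(infer_meanings)}
    missing = [name for name in train_meanings if name not in infer_index]
    if missing:
        raise ValueError("actions missing at inference time: %s" % missing)
    return [infer_index[name] for name in train_meanings]


# Hypothetical orderings, for illustration only. The real lists would have to
# be read from the two environments (or their wrappers) actually in use.
TRAIN_MEANINGS = ["NOOP", "RIGHT", "LEFT", "JUMP"]
INFER_MEANINGS = ["NOOP", "LEFT", "RIGHT", "JUMP"]

remap = build_remap(TRAIN_MEANINGS, INFER_MEANINGS)

# Action indices as the trained policy would emit them (made-up sample).
policy_actions = [1, 1, 3, 1, 0, 1]

# Indices to send to the inference environment so the meaning is preserved.
env_actions = [remap[a] for a in policy_actions]

print("remap:", remap)
print("policy histogram:", action_histogram(policy_actions, len(TRAIN_MEANINGS)))
print("env actions:", env_actions)
```

Logging the action histogram during both training and inference would show whether the policy picks the same index in both cases. If it does and the agent still moves in the opposite direction, that would point to the environments interpreting the index differently, not to the policy.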

Is there a way to fix it?