dniku closed this issue 5 years ago.
@ludwigschubert could you take a look maybe?
For the record, here is the list of actions taken by the model (one action per line). Each action must be passed to the envs as a single-element list because the envs are wrapped in DummyVecEnv:

    from tqdm import tqdm

    with args.load_path.open('r') as fp:
        for action in tqdm(fp, postfix='playing'):
            # Each line is a string such as "3\n"; convert it to an int action.
            obs, reward, done, infos = eval_envs.step([int(action)])
            # ...
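Each line read from the file is a string with a trailing newline, so it has to be converted to an integer before it reaches `step`. A minimal sketch of that parsing step, assuming the file holds one integer action per line and a single wrapped env; `load_actions` is a hypothetical helper, not part of Baselines or Gym:

```python
import io
from typing import List, TextIO


def load_actions(fp: TextIO) -> List[List[int]]:
    """Parse one integer action per line, wrapping each for a single-env vectorized env."""
    batches: List[List[int]] = []
    for line in fp:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline at the end of the file
        batches.append([int(line)])  # a DummyVecEnv holding one env expects a length-1 list
    return batches


print(load_actions(io.StringIO("1\n3\n\n0\n")))  # [[1], [3], [0]]
```

Each element of the returned list can then be passed directly to `eval_envs.step(...)`.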
Thanks for reopening this issue and providing more details! From looking at this video, I believe this is a bug in the game itself, and it looks like these sorts of bugs are being tracked on this issue: https://github.com/mgbellemare/Arcade-Learning-Environment/issues/262 Could you post a comment there linking to this? I will likely close this issue later because this is a bug in ALE, not gym itself.
Closing this issue as mentioned earlier. Thanks for all the information, but it looks like it is up to ALE to decide what to do here.
According to Wikipedia, this is by design:
Once the second screen of bricks is destroyed, the ball in play harmlessly bounces off empty walls until the player restarts the game, as no additional screens are provided.
A score of 864 can be seen being achieved here on real Atari 2600 hardware.
This post also provides a disassembly of the original game, including the code that switches to the next level. It basically amounts to `if score == 432: refill_blocks()`.
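The 864 ceiling follows from that check. Below is a rough sketch, assuming the standard Atari 2600 wall of six rows of 18 bricks worth 7, 7, 4, 4, 1, 1 points from top to bottom. `refills_after` is a hypothetical name for the disassembled check, and the model ignores everything about the game except scoring:

```python
ROW_POINTS = [7, 7, 4, 4, 1, 1]  # assumed points per brick, top row to bottom row
BRICKS_PER_ROW = 18              # assumed number of bricks in each row

# Points for clearing one complete wall: 18 * (7 + 7 + 4 + 4 + 1 + 1) = 18 * 24 = 432
WALL_POINTS = BRICKS_PER_ROW * sum(ROW_POINTS)


def refills_after(score: int) -> bool:
    # Mirrors the disassembled logic: a new wall appears only when score == 432.
    return score == WALL_POINTS


def max_score() -> int:
    """Clear walls until the game stops providing new ones; return the final score."""
    score = 0
    while True:
        score += WALL_POINTS  # the agent clears every brick on the current wall
        if not refills_after(score):
            return score      # no refill: the ball bounces off empty walls forever


print(WALL_POINTS, max_score())  # 432 864
```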
Here is a super dirty proof of concept of how to make Breakout infinite: https://github.com/dniku/atari-py/commit/fc1dc149a796debd3198c6c96df4e839b4dbe2cc
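The contents of that commit are not reproduced here. As a purely hypothetical illustration, one way to make the level logic endless is to refill whenever the score is a positive multiple of one wall's worth of points, rather than exactly equal to it. In the sketch below, `WALL_POINTS`, `original_rule`, `endless_rule` and `walls_played` are illustrative names:

```python
from typing import Callable

WALL_POINTS = 432  # points for one full wall, per the disassembly discussed above


def original_rule(score: int) -> bool:
    return score == WALL_POINTS  # refill only after the first wall


def endless_rule(score: int) -> bool:
    # Hypothetical patch: refill after every completed wall.
    return score > 0 and score % WALL_POINTS == 0


def walls_played(refill: Callable[[int], bool], max_walls: int = 10) -> int:
    """Count the walls an agent that clears everything would see, capped at max_walls."""
    score, walls = 0, 0
    while walls < max_walls:
        score += WALL_POINTS
        walls += 1
        if not refill(score):
            break
    return walls


print(walls_played(original_rule), walls_played(endless_rule))  # 2 10
```

The modulo check keeps the original behavior for the first wall and never runs out of walls afterwards. A real patch would also have to deal with how the game stores and displays the score, which this sketch ignores.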
This is a reopening of https://github.com/openai/gym/issues/309, as requested in that issue.
`BreakoutNoFrameskip-v4` does not start a new level after all bricks have been cleared twice. I was able to reproduce this with a well-trained `cnn` `ppo2` Baselines model, although it seems that any model that can reach a score of 864 will do (I have never seen a score above 864).

Links:
- reproduce_gym_309.pkl (download it and place it next to the script)

I ran all experiments in a virtualenv. Here are the commands that I executed to reproduce the issue:
The script that I am providing simply loads the model and runs it, collecting gameplay frames, until the episode ends with a score of 864. Then it dumps the frames to a video file.
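The collection loop described above can be sketched generically as follows. This is not the actual script: it assumes a Gym-style `reset`/`step` interface where `step` returns `(obs, reward, done, info)`, and `collect_episode`, `policy` and `render` are hypothetical names. Writing the frames to a video file is left to whatever video library is at hand.

```python
from typing import Any, Callable, List, Tuple


def collect_episode(
    reset: Callable[[], Any],
    step: Callable[[Any], Tuple[Any, float, bool, dict]],
    policy: Callable[[Any], Any],
    render: Callable[[], Any],
) -> Tuple[List[Any], float]:
    """Run one episode, recording a rendered frame after reset and after every step."""
    obs = reset()
    frames = [render()]
    total = 0.0
    done = False
    while not done:
        obs, reward, done, _info = step(policy(obs))
        total += reward
        frames.append(render())
    return frames, total
```

The caller can then check whether the returned total equals 864 before dumping the frames.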
The output for me is (omitting log messages from TensorFlow and the tqdm progress bar):
`pip list` output from the virtualenv: