openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

BreakoutNoFrameskip-v4 does not advance to 3rd level, capping score at 864 #1618

Closed dniku closed 5 years ago

dniku commented 5 years ago

This is a reopening of https://github.com/openai/gym/issues/309, as requested in that issue.

BreakoutNoFrameskip-v4 does not start a new level after all bricks have been cleared twice. I was able to reproduce this with a well-trained CNN PPO2 model from Baselines, although it seems that any model capable of reaching a score of 864 will do (I have never seen that score exceeded).

Links:

$ uname -srv
Linux 4.19.59-1-MANJARO #1 SMP PREEMPT Mon Jul 15 18:23:58 UTC 2019
$ python --version
Python 3.7.3

I ran all experiments in a virtualenv. Here are the commands that I executed to reproduce the issue:

virtualenv .env
source .env/bin/activate
pip install tensorflow-gpu gym[atari]
pip install git+https://github.com/openai/baselines.git
python reproduce_gym_309.py reproduce_gym_309.pkl

The script that I am providing simply loads the model and runs it, collecting gameplay frames, until the episode ends with a score of 864. Then it dumps the frames to a video file.

The output for me is (omitting log messages from TensorFlow and the tqdm progress bar):

finished episode with reward=436.0, length=5799, elapsed_time=17.346772
finished episode with reward=735.0, length=4392, elapsed_time=30.163985
finished episode with reward=864.0, length=9447, elapsed_time=57.152439

pip list from virtualenv:

$ pip list 
Package              Version 
-------------------- --------
absl-py              0.7.1   
astor                0.8.0   
atari-py             0.2.6   
baselines            0.1.6   
Click                7.0     
cloudpickle          1.2.1   
future               0.17.1  
gast                 0.2.2   
google-pasta         0.1.7   
grpcio               1.22.0  
gym                  0.13.1  
h5py                 2.9.0   
joblib               0.13.2  
Keras-Applications   1.0.8   
Keras-Preprocessing  1.1.0   
Markdown             3.1.1   
numpy                1.16.4  
opencv-python        4.1.0.25
Pillow               6.1.0   
pip                  19.2.1  
protobuf             3.9.0   
pyglet               1.3.2   
scipy                1.3.0   
setuptools           41.0.1  
six                  1.12.0  
tensorboard          1.14.0  
tensorflow-estimator 1.14.0  
tensorflow-gpu       1.14.0  
termcolor            1.1.0   
tqdm                 4.32.2  
Werkzeug             0.15.5  
wheel                0.33.4  
wrapt                1.11.2
dniku commented 5 years ago

@ludwigschubert could you take a look maybe?

dniku commented 5 years ago

For the record, here is the list of actions taken by the model (one action per line). Each action should be passed to the env in a single-element list because the env is wrapped in DummyVecEnv:

with args.load_path.open('r') as fp:
    for action in tqdm(fp, postfix='playing'):
        # Each line is a textual integer; convert it before stepping,
        # and wrap it in a list because the env is a DummyVecEnv.
        obs, reward, done, infos = eval_envs.step([int(action)])
        # ...
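The parsing step above can be isolated into a small helper. This is a minimal sketch (the function name `load_actions` is mine, not from the script) showing how a one-action-per-line text file maps to integer Atari action indices:

```python
from io import StringIO


def load_actions(fp):
    """Parse one integer action per line, skipping blank lines."""
    return [int(line) for line in fp if line.strip()]


# Simulate a small action file: NOOP, RIGHT, LEFT, RIGHT in ALE's
# default minimal action set for Breakout.
actions = load_actions(StringIO("0\n2\n3\n2\n"))
print(actions)  # → [0, 2, 3, 2]
```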
christopherhesse commented 5 years ago

Thanks for reopening this issue and providing more details! From looking at this video, I believe this is a bug in the game itself, and these sorts of bugs appear to be tracked in https://github.com/mgbellemare/Arcade-Learning-Environment/issues/262. Could you post a comment there linking to this one? I will likely close this issue later, because this is a bug in ALE, not in gym itself.

christopherhesse commented 5 years ago

Closing this issue as mentioned earlier. Thanks for all the information, but it looks like it is up to ALE to decide what to do here.

dniku commented 5 years ago

According to Wikipedia, this is by design:

Once the second screen of bricks is destroyed, the ball in play harmlessly bounces off empty walls until the player restarts the game, as no additional screens are provided.

A score of 864 being achieved on real Atari 2600 hardware can be seen here.

This post also provides a disassembly of the original game, showing the code that switches to the next level. It is essentially: if score == 432, refill the blocks.
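The 864 cap follows directly from the brick layout and scoring described on Wikipedia: each screen has 6 rows of 18 bricks, worth 1, 1, 4, 4, 7, and 7 points per brick from bottom to top, and only two screens are ever provided. A quick arithmetic check:

```python
# Per-brick point values for the 6 rows, bottom to top.
row_points = [1, 1, 4, 4, 7, 7]
bricks_per_row = 18

# Clearing one full screen of bricks:
screen_score = bricks_per_row * sum(row_points)

# The game refills the bricks exactly once (at score 432),
# so the maximum attainable score is two full screens.
max_score = 2 * screen_score

print(screen_score, max_score)  # → 432 864
```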

dniku commented 5 years ago

Here is a super dirty proof of concept of how to make Breakout infinite: https://github.com/dniku/atari-py/commit/fc1dc149a796debd3198c6c96df4e839b4dbe2cc