Open Ashboy64 opened 5 years ago
Ah, the culprit is the WarpFrame environment wrapper that is added to Atari environments: it tries to rescale the observation assuming it is an image, which fails for -ram observations. For now, I'd recommend disabling the WarpFrame wrapper here:
https://github.com/openai/baselines/blob/f3a5abaeeb1c1c9136a01c9dbfebc173dc311fef/baselines/common/atari_wrappers.py#L241
I'll keep the issue open until it is patched.
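As a stopgap until then, a guard along these lines could skip the image-only wrapper for RAM environments. This is only a sketch: `is_image_observation` and `wrap_if_image` are hypothetical helper names, not part of baselines, and the wrapper is passed in as a plain callable so the snippet runs without baselines installed.

```python
def is_image_observation(obs_shape):
    """True when the shape looks like a height x width x channels image."""
    return len(obs_shape) == 3


def wrap_if_image(env, image_wrapper):
    """Apply image_wrapper (e.g. WarpFrame) only to image observations.

    Assumes env exposes observation_space.shape, as gym environments do.
    Hypothetical helper, not part of baselines.
    """
    if is_image_observation(env.observation_space.shape):
        return image_wrapper(env)
    # RAM observations are a flat vector of 128 bytes: leave them untouched.
    return env
```

With a guard like this, a -ram environment would pass through unchanged while the regular image environments would still be rescaled.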
Another thing worth considering is that the default mlp network (your --network=mlp argument) won't work very well with byte observations, for two reasons:
1) The observation range is 0 to 255 instead of the standard ~0..1 or -1..1, which will likely saturate the tanh nonlinearities. This is easy to fix by adding a normalization wrapper; however:
2) Small changes in RAM values may correspond to large changes in state. For instance, the life counter increasing by 1 vs. the life counter decreasing by 1 :) . One way out is to one-hot encode the RAM (this will lead to giant observations, though); some code modifications will be needed for that too (for instance, in baselines/common/input.py).
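To make the two options concrete, here is a minimal NumPy sketch of both transformations. The function names `normalize_ram` and `one_hot_ram` are made up for illustration and are not part of baselines; the only assumption about the environment is that a RAM observation is a vector of bytes (128 of them for Atari).

```python
import numpy as np

RAM_VALUES = 256  # each RAM cell is one byte, so it takes values 0..255


def normalize_ram(obs):
    """Scale byte observations from 0..255 down to 0..1."""
    return np.asarray(obs, dtype=np.float32) / 255.0


def one_hot_ram(obs):
    """One-hot encode every byte, then flatten to a single vector.

    A 128-byte RAM observation becomes 128 * 256 = 32768 floats.
    """
    obs = np.asarray(obs, dtype=np.int64)
    # Row i of the identity matrix is the one-hot vector for byte value i.
    encoded = np.eye(RAM_VALUES, dtype=np.float32)[obs]  # shape (..., 256)
    return encoded.reshape(-1)
```

In a gym setup, either function could be called from the `observation` method of a `gym.ObservationWrapper` subclass, with the wrapper's observation space updated to match the new shape and dtype. The one-hot variant is where the "giant observations" come from: every byte turns into 256 inputs.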
I'm trying to run PPO2 on the 'Breakout-ram-v0' environment, but am getting a few errors. The command I am using to run the program is:
python -m baselines.run --alg=ppo2 --env=Breakout-ramNoFrameskip-v0 --num_timesteps=1e7 --network=mlp
Which gives the following error:
If I use this command instead:
python -m baselines.run --alg=ppo2 --env=Breakout-ram-v0 --num_timesteps=1e7 --network=mlp
I get an assertion error:
How can I run this program on the necessary environment?
Thanks in advance for any help.