openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License

ACER impl. broken? #747

Open bsdooby opened 5 years ago

bsdooby commented 5 years ago

When I train the ACER implementation as given in the ACER section (python -m baselines.run --alg=acer --env=PongNoFrameskip-v4 --num_timesteps=10e6 --save_path=/tmp/acer-models/pong_10M_acer) and then load the trained model for visualization (python -m baselines.run --alg=acer --env=PongNoFrameskip-v4 --num_timesteps=0 --play --load_path=/tmp/acer-models/pong_10M_acer), the agent does not score at all. Are there any parameters that need to be provided apart from the defaults? Or is ACER unsuited for Pong, or even buggy?

pzhokhov commented 5 years ago

Hi @bsdooby! ACER should train to a perfect score on Pong within 10M timesteps, so in principle ACER is suited for Pong. There may be a recently introduced bug that prevents it from training, or the problem may be in the visualization code. What is the mean episode reward reported by ACER at the end of training? If it is around -20, the problem is in training; if it is around +20, the problem is either in reloading the model or in the visualization code.
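To make that check concrete, here is a minimal Python sketch of the triage rule. It is only an illustration: the function name `diagnose` and the cutoff of 15 are hypothetical choices and not part of baselines. They are picked so that "around -20" and "around +20" fall on opposite sides, given that Pong episode rewards range from -21 to +21.

```python
def diagnose(mean_episode_reward, threshold=15.0):
    """Classify where the problem likely lies, given the final mean episode reward.

    `threshold` is an illustrative assumption, not a value taken from baselines.
    """
    if mean_episode_reward <= -threshold:
        # The agent still loses almost every point: training itself did not work.
        return "training"
    if mean_episode_reward >= threshold:
        # Training reached a near-perfect score, so the fault is downstream:
        # either reloading the saved model or the visualization code.
        return "reload_or_visualization"
    # Anything in between does not match either case described above.
    return "inconclusive"


if __name__ == "__main__":
    # Example values only; substitute the number reported at the end of your run.
    for reward in (-20.3, 19.8, 2.5):
        print(reward, diagnose(reward))
```

Pass in the mean episode reward from the last lines of the training log; a result of "inconclusive" would suggest that the run neither clearly failed nor clearly converged.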