Open issue, opened by bsdooby 5 years ago.
When I train the ACER implementation as given in the ACER section:

python -m baselines.run --alg=acer --env=PongNoFrameskip-v4 --num_timesteps=10e6 --save_path=/tmp/acer-models/pong_10M_acer

and then load the trained model for visualization:

python -m baselines.run --alg=acer --env=PongNoFrameskip-v4 --num_timesteps=0 --play --load_path=/tmp/acer-models/pong_10M_acer

the agent does not score at all. Are there any parameters that need to be provided apart from the defaults? Or is ACER unsuited for Pong, or even buggy?

Hi @bsdooby! ACER should train to a perfect score on Pong within 10M timesteps, so in principle ACER is suited for Pong. There may be a recently introduced bug that prevents it from training, or the problem may be in the visualization code. What mean episode reward does ACER report by the end of training? If it is around -20, the problem is in training; if it is around +20, the problem is either in reloading the model or in the visualization code.
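The triage heuristic in the reply (mean episode reward near -20 points to a training problem, near +20 points to loading or visualization) can be checked directly from the per-episode log. Below is a minimal sketch, assuming the monitor CSV layout written by the Gym/baselines `Monitor` wrapper: a first line starting with `#` that holds JSON metadata, then a header with columns `r` (reward), `l` (length), `t` (time). The helper names `read_monitor_rewards` and `diagnose`, the 100-episode window and the ±15 cut-offs are hypothetical choices for illustration, not part of baselines.

```python
import csv
import io
from statistics import mean
from typing import List


def read_monitor_rewards(text: str) -> List[float]:
    """Return per-episode rewards from monitor-style CSV text.

    Assumes (hypothetically) the Monitor layout: '#'-prefixed JSON metadata
    line(s), then a CSV header containing an 'r' (episode reward) column.
    """
    # Drop metadata/comment lines so csv sees the header row first.
    body = "\n".join(line for line in text.splitlines() if not line.startswith("#"))
    reader = csv.DictReader(io.StringIO(body))
    return [float(row["r"]) for row in reader]


def diagnose(rewards: List[float], window: int = 100) -> str:
    """Apply the reply's heuristic to the mean of the last `window` episodes.

    Pong episode rewards range from -21 to +21; the +/-15 thresholds below
    are illustrative assumptions for "around -20" and "around +20".
    """
    recent = rewards[-window:]
    if not recent:
        return "no episodes"
    avg = mean(recent)
    if avg <= -15:
        return "training"  # agent never learned: look at training
    if avg >= 15:
        return "loading/visualization"  # training worked: look at --load_path/--play
    return "inconclusive"
```

For a real run, pass the contents of a monitor file from the training log directory, e.g. `diagnose(read_monitor_rewards(open(path).read()))`, where `path` is whatever monitor file your run produced.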