hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

ACER performance on Breakout #103

Open araffin opened 5 years ago

araffin commented 5 years ago

Although ACER seems to give good performance on most atari games, it still fails on Breakout. Code: https://github.com/araffin/rl-baselines-zoo

python train.py --algo acer --env BreakoutNoFrameskip-v4

We should double check the implementation.

Migdalin commented 5 years ago

I experienced the same problem with PPO1. I ran with verbose=1 and tensorboard logging on, but left all other parameters at their defaults. After 1.3M frames or so, the episode reward for Breakout was still stuck around 2.0: effectively what a random agent would produce.

araffin commented 5 years ago

@Migdalin Could you try again with the fix from #388? In any case, I would recommend using PPO2 instead (it has additional tricks compared to PPO1).

araffin commented 5 years ago

Now merged into master.