Open araffin opened 6 years ago
I experienced the same problem with PPO1. I ran with verbose=1 and TensorBoard logging enabled, leaving all other parameters at their defaults. After about 1.3M frames, the episode reward for Breakout was still stuck around 2.0, which is effectively what a random agent would produce.
@Migdalin Could you try again with the fix from #388? In any case, I would recommend using PPO2, which has additional tricks compared to PPO1.
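For reference, a minimal sketch of what the PPO2 run could look like. It assumes the library in question is Stable Baselines 2.x (TensorFlow 1.x) with `gym[atari]` installed. The env id `BreakoutNoFrameskip-v4`, the log directory, the number of parallel environments and the timestep budget are illustrative assumptions, not values taken from this thread. The helper names `ppo2_kwargs` and `train_ppo2_breakout` are hypothetical.

```python
# Sketch only: assumes Stable Baselines 2.x (TensorFlow 1.x) and gym[atari].
ENV_ID = "BreakoutNoFrameskip-v4"  # assumed env id; the thread only says "Breakout"


def ppo2_kwargs(log_dir="./ppo2_breakout_tb/"):
    """Settings mirroring the report: verbose=1, TensorBoard on, rest default."""
    return {"verbose": 1, "tensorboard_log": log_dir}


def train_ppo2_breakout(total_timesteps=1_300_000, n_envs=8, seed=0):
    """Train PPO2 on Breakout; the budget and n_envs are illustrative."""
    # Imported lazily so this file loads even without the RL stack installed.
    from stable_baselines import PPO2
    from stable_baselines.common.cmd_util import make_atari_env
    from stable_baselines.common.vec_env import VecFrameStack

    # Standard Atari preprocessing wrappers, then stack 4 frames per observation.
    env = make_atari_env(ENV_ID, num_env=n_envs, seed=seed)
    env = VecFrameStack(env, n_stack=4)

    # All other hyperparameters are left at the library defaults.
    model = PPO2("CnnPolicy", env, **ppo2_kwargs())
    model.learn(total_timesteps=total_timesteps)
    return model
```

Calling `train_ppo2_breakout()` and watching the episode reward curve in TensorBoard should show whether the reward moves away from the random-agent level of about 2.0.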
Now merged with master
Although ACER seems to perform well on most Atari games, it still fails on Breakout. Code: https://github.com/araffin/rl-baselines-zoo
We should double check the implementation.