openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License
15.62k stars 4.86k forks

Can anyone report successful parameters for A2C playing Pong or another Atari game? #853

Open slerman12 opened 5 years ago

slerman12 commented 5 years ago

I'm trying to run A2C, but with the default parameters, including nsteps=5, it isn't learning Pong. What parameters can I pass to the model that would let it learn a simple Atari game like Pong?
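For reference, the stock Baselines entry point can launch A2C on Pong directly; a typical invocation might look like the following (the timestep budget and environment count here are illustrative, not tuned values):

```shell
# Train A2C on Pong with 8 parallel environments for ~20M frames.
# nsteps stays at the A2C default (5) unless overridden.
python -m baselines.run --alg=a2c --env=PongNoFrameskip-v4 \
    --num_timesteps=2e7 --num_env=8
```

A2C on Atari generally needs on the order of tens of millions of frames before the score turns positive, so a short run staying at negative reward is not by itself evidence of a bug.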

DanielTakeshi commented 5 years ago

@slerman12 Show the output of your training log, the command you ran, and how long you trained for. Hard to say without seeing these details.

slerman12 commented 5 years ago

I used this tutorial code based on Baselines: https://github.com/simoninithomas/Deep_reinforcement_learning_Course/tree/master/A2C%20with%20Sonic%20the%20Hedgehog

agent.py runs the code. I just changed the Sonic environments to 8 copies of Pong. His learn method in model.py also divided the training update into minibatches, since his Runner collected much more data for Sonic than could be processed all at once; I changed that back to 1 minibatch, since these 8 Pong environments don't produce nearly as much data. I also tried several values for nsteps, including 5, the default in the Baselines code here. No matter what, the model did not seem able to learn Pong: the explained variance went up, but the average reward per training episode was still negative after two days of training.
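To make the nsteps/minibatch discussion above concrete: each A2C update consumes nenvs × nsteps transitions, and the targets are n-step returns bootstrapped from the critic. A minimal, illustrative sketch of that return computation (not the Baselines code itself; names and values are assumptions mirroring the setup described above):

```python
# Illustrative sketch of the n-step bootstrapped returns that A2C's
# nsteps parameter controls. 8 envs x 5 steps = a 40-sample update,
# which is why A2C typically needs many millions of frames on Atari.
nenvs, nsteps, gamma = 8, 5, 0.99  # assumed values from the thread

def nstep_returns(rewards, dones, last_values, gamma=0.99):
    """rewards, dones: [nsteps][nenvs] lists; last_values: [nenvs]
    critic estimates for the state after the final step.
    Returns [nsteps][nenvs] discounted bootstrapped targets."""
    returns = [[0.0] * len(last_values) for _ in rewards]
    running = list(last_values)
    for t in reversed(range(len(rewards))):
        for e in range(len(last_values)):
            # Bootstrap from the critic unless the episode ended here.
            running[e] = rewards[t][e] + gamma * running[e] * (1.0 - dones[t][e])
            returns[t][e] = running[e]
    return returns

# One rollout's worth of dummy data: all-zero rewards, no terminals.
rewards = [[0.0] * nenvs for _ in range(nsteps)]
dones = [[0.0] * nenvs for _ in range(nsteps)]
rets = nstep_returns(rewards, dones, [1.0] * nenvs)
```

With zero rewards and no terminals, the target for the first step is just the critic's bootstrap discounted nsteps times (gamma**nsteps), which shows how short rollouts lean heavily on the value estimate.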

araffin commented 5 years ago

Hello, if you are looking for a training script plus working hyperparameters, I recommend taking a look at the rl zoo (there are also pre-trained agents there).

Disclaimer: this uses stable-baselines, a fork of OpenAI Baselines, but the underlying implementation is the same.
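For anyone following this suggestion, the zoo's workflow is roughly as follows (a sketch of the typical commands, assuming a checkout of the rl-baselines-zoo repository; exact flags may differ by version):

```shell
# Train A2C on Pong with the zoo's stored hyperparameters.
python train.py --algo a2c --env PongNoFrameskip-v4

# Or watch one of the bundled pre-trained agents.
python enjoy.py --algo a2c --env PongNoFrameskip-v4
```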