openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

PPO2 sample efficiency #601

Open · manasgupta-1 opened this issue 6 years ago

manasgupta-1 commented 6 years ago

Hi, I was trying to get a sense of the sample complexity required to train PPO2 vs. DQN on simple environments. I ran both PPO2 and DQN on CartPole-v0. DQN reached a mean episode reward of 185 in about 100,000 episodes; for PPO2 it took about 2,000,000 episodes, which implies a sample-complexity overhead of roughly 20x for PPO2. I did not change any hyperparameters and used the default settings. Am I doing something wrong here, or does PPO2 indeed take around an order of magnitude more samples than DQN? Any tips/ideas appreciated. Thanks
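
For reference, a minimal sketch of how this comparison can be reproduced with the stock baselines CLI, assuming the counts above refer to environment timesteps passed via `--num_timesteps` (the `--alg`, `--env`, and `--num_timesteps` flags come from `baselines/run.py`, and DQN is registered under the name `deepq`):

```
# DQN ("deepq" in baselines) on CartPole-v0 for ~1e5 timesteps,
# default hyperparameters:
python -m baselines.run --alg=deepq --env=CartPole-v0 --num_timesteps=1e5

# PPO2 on CartPole-v0 for ~2e6 timesteps, default hyperparameters:
python -m baselines.run --alg=ppo2 --env=CartPole-v0 --num_timesteps=2e6
```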

DanielTakeshi commented 5 years ago

CartPole is not a good benchmark for evaluating sample efficiency; it's too easy. Also, how many random seeds did you run? It's better to run at least 10 if possible.
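
A sketch of what such a multi-seed run might look like, assuming the `--seed` flag in `baselines/run.py` and the `OPENAI_LOGDIR` environment variable read by the baselines logger (the log directory names here are made up):

```
# Repeat the PPO2 run over 10 seeds so learning curves can be averaged;
# --seed sets the RNG seed, OPENAI_LOGDIR directs the progress logs.
for seed in $(seq 0 9); do
    OPENAI_LOGDIR=logs/ppo2_cartpole_seed$seed \
    python -m baselines.run --alg=ppo2 --env=CartPole-v0 \
        --num_timesteps=2e6 --seed=$seed
done
```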