Hi,
I was trying to get a sense of the sample complexity of PPO2 vs. DQN on simple environments, so I ran both on CartPole-v0. DQN reached a mean episode reward of 185 in about 100,000 episodes, while PPO2 took about 2,000,000 episodes, i.e. roughly a 20x sample-complexity overhead for PPO2. I did not change any hyperparameters and used the default settings for both. Am I doing something wrong here, or does PPO2 indeed take around an order of magnitude more samples than DQN? Any tips/ideas appreciated. Thanks
CartPole is not a good benchmark for evaluating sample efficiency; it's too easy. Also, how many random seeds did you run? Results from a single seed can be very noisy, so it's better to run at least 10 if possible and compare the averaged curves.
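To make the multi-seed comparison concrete, here is a minimal sketch of how you could aggregate episode-reward curves across seeds before comparing the two algorithms. The `aggregate_curves` helper and the data layout (one equal-length reward list per seed) are my own assumptions, not part of any library API; adapt it to however you log rewards.

```python
import statistics

def aggregate_curves(curves):
    """Average episode-reward curves from several random seeds.

    `curves` is a list of equal-length reward lists, one per seed
    (hypothetical layout -- adapt to your logging format).
    Returns (means, stdevs) per logging point, so you can plot a
    mean curve with +/- std bands instead of a single noisy run.
    """
    means = [statistics.mean(step) for step in zip(*curves)]
    stdevs = [statistics.stdev(step) for step in zip(*curves)]
    return means, stdevs

# Toy example with 3 "seeds", each logged at 4 points:
curves = [
    [10, 50, 120, 190],
    [12, 60, 110, 185],
    [ 8, 40, 130, 195],
]
means, stdevs = aggregate_curves(curves)
print(means)  # mean reward at each logging point
```

You would then declare one algorithm more sample-efficient only if its mean curve reaches the reward threshold earlier and the std bands don't heavily overlap.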