Hi,
I was trying to get a sense of the sample complexity of PPO2 vs. DQN on simple environments, so I ran both on CartPole-v0. DQN reached a mean episode reward of 185 in about 100,000 episodes, while PPO2 took about 2,000,000 episodes, i.e. roughly a 20x sample-complexity overhead for PPO2. I did not change any hyperparameters and used the default settings for both. Am I doing something wrong here, or does PPO2 indeed take around an order of magnitude more samples than DQN? Any tips/ideas appreciated. Thanks
CartPole is not a good benchmark for evaluating sample efficiency; it's too easy. Also, how many random seeds did you run? Results from a single seed can be very noisy, so it's better to run at least 10 if possible and compare the averaged curves.
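To make the multi-seed comparison concrete, here is a minimal sketch of how you could aggregate episode-reward curves across seeds before comparing the two algorithms. The `aggregate_curves` helper and the data layout (one equal-length reward list per seed) are my own assumptions, not part of any library API; adapt it to however you log rewards.

```python
import statistics

def aggregate_curves(curves):
    """Average episode-reward curves from several random seeds.

    `curves` is a list of equal-length reward lists, one per seed
    (hypothetical layout -- adapt to your logging format).
    Returns (means, stdevs) per logging point, so you can plot a
    mean curve with +/- std bands instead of a single noisy run.
    """
    means = [statistics.mean(step) for step in zip(*curves)]
    stdevs = [statistics.stdev(step) for step in zip(*curves)]
    return means, stdevs

# Toy example with 3 "seeds", each logged at 4 points:
curves = [
    [10, 50, 120, 190],
    [12, 60, 110, 185],
    [ 8, 40, 130, 195],
]
means, stdevs = aggregate_curves(curves)
print(means)  # mean reward at each logging point
```

You would then declare one algorithm more sample-efficient only if its mean curve reaches the reward threshold earlier and the std bands don't heavily overlap.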