openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Benchmarking for PPO and TRPO #61

Open miriaford opened 7 years ago

miriaford commented 7 years ago

Thanks to the OpenAI team for the latest release!

Are there any benchmark results (like Atari scores) for PPO and TRPO? DQN has a report here: https://github.com/openai/baselines-results. It's super useful. Thanks again!

Twinko56X commented 7 years ago

I did not see any in the repo, but as a general indication, the PPO paper has benchmark results on page 11: https://openai-public.s3-us-west-2.amazonaws.com/blog/2017-07/ppo/ppo-arxiv.pdf#page=11

miriaford commented 7 years ago

@Twinko56X thanks for the link! It's actually on arxiv now: https://arxiv.org/pdf/1707.06347.pdf

I wonder if this repo is the same code used to produce those plots.

ViktorM commented 7 years ago

The DQN baselines results https://github.com/openai/baselines-results look great; I had missed them. It would be nice to have a similar IPython notebook at some point for PPO vs TRPO vs DDPG vs IPG on continuous control problems, and PPO vs DQN on Atari.

joschu commented 7 years ago

I'll add an IPython notebook with the Atari and MuJoCo benchmarks soon.
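
In the meantime, here is a rough sketch of how such learning curves could be loaded and plotted with the repo's `plot_util` helpers. The log directory name is a placeholder, and the run command assumes the `ppo2` implementation and Monitor-style logs:

```python
# Rough sketch: load Monitor logs and plot learning curves with baselines' plot_util.
# Assumes runs were launched with OPENAI_LOGDIR set, e.g.
#   python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=1e7
from baselines.common import plot_util as pu

# '~/logs/ppo2-pong' is a placeholder directory containing one sub-directory per run
results = pu.load_results('~/logs/ppo2-pong')

# Plot mean episode reward vs. timesteps, averaging over random seeds
pu.plot_results(results, average_group=True, split_fn=lambda _: '')
```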

doviettung96 commented 5 years ago

Hi @joschu, I am currently trying to replicate the PPO paper's result on RoboschoolHumanoidFlagrunHarder-v1. Did you use the PPO implementation from this OpenAI Baselines repo? I have tried modifying it to include an adaptive learning rate based on KL divergence. Other hyperparameters are set the same as in the paper, except that the logstd of the action distribution is fixed at zero (not LinearAnneal(-0.7, -1.6)). I used (512, 256, 128) hidden layers with ReLU activations for both the policy and value networks. However, I could not raise the mean episode reward to 2000. Any suggestions? Thanks.
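
For reference, this is the kind of KL-based stepsize adaptation I mean. It is only a minimal sketch: the `kl_target`, adjustment factor, and learning-rate bounds below are my own assumptions, not values taken from the paper or from this repo:

```python
# Hypothetical helper: adapt the optimizer stepsize from the observed KL divergence
# between the old and updated policy. All constants here are assumptions.
def adapt_lr(lr, observed_kl, kl_target=0.01, factor=1.5,
             lr_min=1e-6, lr_max=1e-2):
    """Shrink the learning rate when the policy moved too far,
    grow it when the update was too conservative."""
    if observed_kl > 2.0 * kl_target:
        lr = max(lr / factor, lr_min)
    elif observed_kl < 0.5 * kl_target:
        lr = min(lr * factor, lr_max)
    return lr

# Example: after each PPO update, compute the mean KL over the batch and adjust lr
lr = 3e-4
for epoch_kl in [0.004, 0.025, 0.012]:  # placeholder KL values
    lr = adapt_lr(lr, epoch_kl)
    print(f"kl={epoch_kl:.3f} -> lr={lr:.2e}")
```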