openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License

PPO1 does not work on 'Pendulum-v0' #270

Open wonchul-kim opened 6 years ago

wonchul-kim commented 6 years ago

I ran PPO1 on Pendulum-v0, but it does not work: training does not converge.

Does anyone have a solution?

yun-long commented 6 years ago

PPO-2 also did work on the Pendulum-v0 problem. Maybe you can try using a smaller network for both the policy and the value function, e.g., [16, 16] for the hidden layers.
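
To make the size suggestion concrete, here is a minimal pure-Python sketch of what a [16, 16] network looks like for this task. It is not the baselines policy code: `init_mlp`, `forward`, and `param_count` are hypothetical helper names, the tanh activation is an assumption, and the [64, 64] network is included only as a comparison point. The dimensions (3 observation values, 1 action value) are those of Pendulum-v0.

```python
import math
import random

OBS_DIM = 3  # Pendulum-v0 observation: cos(theta), sin(theta), angular velocity
ACT_DIM = 1  # Pendulum-v0 action: a single torque value


def init_mlp(sizes, seed=0):
    """Build (weights, biases) for each fully connected layer (hypothetical helper)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        # One row of n_in weights per output unit.
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, obs):
    """Apply tanh on hidden layers (an assumption) and a linear output layer."""
    x = list(obs)
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x


def param_count(layers):
    """Total number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


small = init_mlp([OBS_DIM, 16, 16, ACT_DIM])  # the suggested [16, 16] hidden layers
large = init_mlp([OBS_DIM, 64, 64, ACT_DIM])  # [64, 64], for comparison only

print(param_count(small), param_count(large))  # prints: 353 4481
```

Under these assumptions the [16, 16] network has roughly a twelfth of the parameters of the [64, 64] one, which is the kind of reduction the suggestion above is pointing at. The same hidden sizes would be used for both the policy and the value function.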