wonchul-kim opened this issue 6 years ago
I ran ppo1 on Pendulum-v0, but it does not converge.
Does anyone have a solution?
PPO2 also didn't converge on Pendulum-v0 for me. Maybe try a smaller network for both the policy and the value function, e.g., [16, 16] hidden layers.
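
For reference, here's a minimal sketch of how you might set that up with ppo1, following the pattern in baselines' `run_mujoco.py`. The [16, 16] hidden sizes come from the suggestion above; the remaining hyperparameters are just that script's defaults, not values tuned for Pendulum:

```python
import gym
from baselines.common import tf_util as U
from baselines.ppo1 import mlp_policy, pposgd_simple

# Single-threaded TF session, as in baselines' run_mujoco.py
U.make_session(num_cpu=1).__enter__()

def policy_fn(name, ob_space, ac_space):
    # Smaller [16, 16] hidden layers instead of the default [64, 64]
    return mlp_policy.MlpPolicy(name=name, ob_space=ob_space, ac_space=ac_space,
                                hid_size=16, num_hid_layers=2)

env = gym.make('Pendulum-v0')
pposgd_simple.learn(env, policy_fn,
                    max_timesteps=int(1e6),  # assumed training budget; adjust as needed
                    timesteps_per_actorbatch=2048,
                    clip_param=0.2, entcoeff=0.0,
                    optim_epochs=10, optim_stepsize=3e-4, optim_batchsize=64,
                    gamma=0.99, lam=0.95, schedule='linear')
env.close()
```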