nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
MIT License

PPO for continuous env #4

Closed zbenic closed 5 years ago

zbenic commented 5 years ago

Hello.

Were you able to get >200 reward in LunarLanderContinuous? I'm currently at ~40,000 episodes, but the reward still maxes out at ~130.

I have no problems with the discrete env, but I do with the continuous one. Can you give me some advice?

nikhilbarhate99 commented 5 years ago

No, the policy seems to get stuck in a local maximum for the continuous env. You could try tuning the hyperparameters (action_std, K_epochs, update_timestep, lr) or using a different advantage function.
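
For reference, one common alternative advantage function is Generalized Advantage Estimation (GAE). A minimal sketch below; the function name, defaults, and rollout handling are illustrative and not from this repo (it also assumes the rollout ends at an episode boundary, so no bootstrap value is appended):

```python
import torch

def compute_gae(rewards, values, dones, gamma=0.99, gae_lambda=0.95):
    # rewards, values, dones: 1-D tensors of equal length for one rollout.
    # values holds the critic's V(s_t); dones[t] is 1.0 if s_{t+1} is terminal.
    advantages = torch.zeros_like(rewards)
    last_adv = 0.0
    for t in reversed(range(len(rewards))):
        # Treat the end of the rollout as terminal (no bootstrapping).
        next_value = values[t + 1] if t + 1 < len(rewards) else 0.0
        next_nonterminal = 1.0 - dones[t]
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * next_nonterminal - values[t]
        # GAE recursion: A_t = delta_t + gamma * lambda * A_{t+1}
        last_adv = delta + gamma * gae_lambda * next_nonterminal * last_adv
        advantages[t] = last_adv
    # Returns (targets for the critic) recovered from advantages.
    returns = advantages + values
    return advantages, returns
```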

I tried changing the activations to Tanh and using the hyperparameters from other repos, but the results were not much better either.
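
For context, a Tanh-activated actor for a continuous action space looks roughly like the sketch below; the hidden layer sizes are illustrative, not this repo's exact values. A final Tanh bounds actions in [-1, 1], which matches LunarLanderContinuous's action range:

```python
import torch.nn as nn

# LunarLanderContinuous: 8-dim observation, 2-dim action.
state_dim, action_dim = 8, 2

# Hypothetical actor network; hidden sizes (64, 64) are illustrative.
actor = nn.Sequential(
    nn.Linear(state_dim, 64),
    nn.Tanh(),
    nn.Linear(64, 64),
    nn.Tanh(),
    nn.Linear(64, action_dim),
    nn.Tanh(),  # squash action means into [-1, 1]
)
```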

I'll update the repo if I find good parameters.