Closed zbenic closed 5 years ago
No, the policy seems to get stuck in a local maximum for the continuous env.
You could try tuning the hyperparameters (`action_std`, `K_epochs`, `update_timestep`, `lr`) or use a different advantage function.
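For reference, one common alternative advantage estimator is GAE (Generalized Advantage Estimation). This is only a hedged sketch of the standard formula, not code from this repo; the function name `gae` and its arguments are made up for illustration:

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation.

    rewards: array of per-step rewards, length T.
    values:  array of value estimates, length T+1
             (the last entry is the bootstrap value of the final state).
    Returns an array of advantages, length T.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    # TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = rewards + gamma * values[1:] - values[:-1]
    adv = np.zeros_like(rewards)
    running = 0.0
    # Discounted backward accumulation of the residuals
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv
```

Swapping this in for the plain Monte Carlo return used as the advantage can reduce variance at the cost of some bias, which sometimes helps on LunarLanderContinuous.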
I tried changing the activations to Tanh and using the hyperparameters from other repos, but the results were not very good either.
I'll update the repo if I find good parameters.
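If you want to search for good parameters systematically, a simple grid sweep over the hyperparameters mentioned above is an option. The specific values below are hypothetical starting points, not recommendations from this repo:

```python
from itertools import product

# Hypothetical candidate values; the keys mirror the hyperparameters
# mentioned above (action_std, K_epochs, update_timestep, lr).
grid = {
    "action_std": [0.3, 0.5, 0.6],
    "K_epochs": [40, 80],
    "update_timestep": [2000, 4000],
    "lr": [3e-4, 1e-3],
}

def configs(grid):
    """Yield every combination of hyperparameter values as a dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

all_configs = list(configs(grid))
print(len(all_configs))  # 3 * 2 * 2 * 2 = 24 combinations
```

Each config dict can then be passed to the training loop, keeping whichever run reaches the highest average reward.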
Hello.
Were you able to get >200 reward in LunarLanderContinuous? I'm currently at ~40000 episodes, but the reward still maxes out around ~130.
I have no problems with discrete env, but do with continuous. Can you give me some advice?