pat-coady / trpo

Trust Region Policy Optimization with TensorFlow and OpenAI Gym
https://learningai.io/projects/2017/07/28/ai-gym-workout.html
MIT License

Can't work on CartPole-v1 #8

Closed: AlexZhou1995 closed this issue 7 years ago

AlexZhou1995 commented 7 years ago (Sun, Oct 8, 2017)

Hi,

I changed only a few lines of code to make this project run on the CartPole-v1 environment, but the results were poor: the mean reward stays at about 9.3 and never goes up.

Have you tested the performance on any easy environments?

Thank you

pat-coady commented 7 years ago

I think CartPole-v1 is a discrete-control environment. This implementation is for continuous-control tasks.
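To make the discrete/continuous distinction concrete: in Gym, CartPole-v1 has a `Discrete(2)` action space (push the cart left or right), so it expects a single integer action, while continuous-control policies output real-valued action vectors. The sketch below is a minimal, stdlib-only illustration, not code from this repository. It contrasts a diagonal Gaussian policy (a common choice for continuous control) with a categorical softmax policy (a common choice for discrete control). All function names are hypothetical.

```python
import math
import random


def sample_gaussian_action(mean, log_std, rng):
    """Continuous control (assumed diagonal Gaussian): one float per action dimension."""
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mean, log_std)]


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_categorical_action(logits, rng):
    """Discrete control (assumed categorical policy): one integer index into the action list."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def is_valid_discrete_action(action, n):
    """True if `action` is an int in {0, ..., n-1}, the form a Discrete(n) space expects."""
    return isinstance(action, int) and not isinstance(action, bool) and 0 <= action < n


rng = random.Random(0)
cont = sample_gaussian_action([0.0], [0.0], rng)   # a list of floats
disc = sample_categorical_action([0.2, -0.1], rng)  # 0 or 1
print(cont, is_valid_discrete_action(cont, 2))  # the Gaussian sample is not a valid Discrete(2) action
print(disc, is_valid_discrete_action(disc, 2))  # the categorical sample is
```

Adapting a continuous-control TRPO implementation to CartPole would likely require more than changing how actions are sampled: TRPO's surrogate objective and KL constraint are both computed from the policy distribution, so the Gaussian log-probability and KL terms would also need their categorical counterparts.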

On testing: yes, I posted results for environments ranging from 1-dimensional to 17-dimensional control.

Hope this helps.
