pat-coady / trpo

Trust Region Policy Optimization with TensorFlow and OpenAI Gym
https://learningai.io/projects/2017/07/28/ai-gym-workout.html
MIT License

Trouble using pybullet and roboschool envs #14

Closed · llecam closed 6 years ago

llecam commented 6 years ago

Hello Pat,

Thanks for your great repo!

I read some of the closed issues but didn't find the one I'm encountering.

I tried to launch your train.py code on RoboschoolInvertedPendulum-v1, which is supposed to be the same as the MuJoCo environment, but I got the following error:

```
File "/home/lea/roboschool/roboschool/gym_pendulums.py", line 88, in calc_state
    assert( np.isfinite(x) )
AssertionError
```

The error occurs on line 108: `obs, reward, done, _ = env.step(np.squeeze(action, axis=0))`. It looks like a dimension error, but I can't figure out what exactly the problem is.
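
For reference, a minimal sketch (toy values, not from my actual run) of what that squeeze is doing:

```python
import numpy as np

# The policy samples a batch of one action with shape (1, act_dim);
# squeezing axis 0 yields the flat (act_dim,) vector env.step() expects.
action = np.array([[0.3]])          # shape (1, 1): batch of one 1-D action
flat = np.squeeze(action, axis=0)   # shape (1,)
print(flat, flat.shape)             # [0.3] (1,)
```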

I tried to work around it by using the corresponding pybullet environment, InvertedPendulumBulletEnv-v0. The program runs, but the pendulum doesn't seem to learn anything: the reward stays around 30.0 on average and makes no progress. Do you have any idea why?

I'm looking forward to your reply!

Léa

pender commented 6 years ago

@llecam -- does my solution in this issue help?

pat-coady commented 6 years ago

Can you try the master branch and let me know if it's still a problem?

Thanks, Pat

llecam commented 6 years ago

Hi,

@pender Thanks, but that issue didn't really help; I don't have exactly the same problem.

@pat-coady I tried both the master branch and the aygym-evaluation branch and got the same results. The pybullet environment still doesn't seem to learn.

Thanks, Léa

louisdeschamps44 commented 6 years ago

Hi,

I had a very similar issue, so I finally downloaded MuJoCo to try to understand where the problem comes from. I have the same issue with the MuJoCo inverted pendulum as Léa has with pybullet: everything seems to be working fine, but the reward stays around 10 even after 5000 iterations. I tried both branches. You will find my log folder attached. I am using Python 2.7; do you think that could be a problem?

Jan-10_09:29:48.zip

Thanks, Louis

pat-coady commented 6 years ago

Sorry I haven't had time to look at this yet.

Can you confirm it is a continuous control environment? In other words, the environment expects a vector of real numbers (not ints or bools).

Some of the gym environments are not continuous control; a quick way to check is sketched below.
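
Something like this rough sketch would confirm it (using the pybullet env id from above; importing `pybullet_envs` registers its environments with gym):

```python
import gym
from gym.spaces import Box

import pybullet_envs  # importing this registers the pybullet envs with gym

env = gym.make('InvertedPendulumBulletEnv-v0')
# Continuous control means a Box action space of real numbers,
# not Discrete (ints) or MultiBinary (bools).
assert isinstance(env.action_space, Box), env.action_space
print(env.action_space.shape, env.action_space.low, env.action_space.high)
```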

pat-coady commented 6 years ago

@llecam - can you post code that reproduces the problem using roboschool and the inverted pendulum? I'll do my best to run it tonight or tomorrow.

llecam commented 6 years ago

Thank you for your time. I only used continuous environments.

The only modification I made to your code was adding imports to train.py to make it work with pybullet and/or roboschool:

```python
import pybullet as p
import pybullet_envs
import roboschool
```

With the roboschool version of the inverted pendulum environment, RoboschoolInvertedPendulum-v1, I got the following when I launch train.py with default parameters:

```
Traceback (most recent call last):
  File "train.py", line 334, in <module>
    main(**vars(args))
  File "train.py", line 287, in main
    run_policy(env, policy, scaler, logger, episodes=5)
  File "train.py", line 135, in run_policy
    observes, actions, rewards, unscaled_obs = run_episode(env, policy, scaler)
  File "train.py", line 105, in run_episode
    obs, reward, done, _ = env.step(np.squeeze(action, axis=0))
  File "/home/lea/gym/gym/core.py", line 96, in step
    return self._step(action)
  File "/home/lea/gym/gym/wrappers/monitoring.py", line 32, in _step
    observation, reward, done, info = self.env.step(action)
  File "/home/lea/gym/gym/core.py", line 96, in step
    return self._step(action)
  File "/home/lea/gym/gym/wrappers/time_limit.py", line 36, in _step
    observation, reward, done, info = self.env.step(action)
  File "/home/lea/gym/gym/core.py", line 96, in step
    return self._step(action)
  File "/home/lea/roboschool/roboschool/gym_pendulums.py", line 97, in _step
    state = self.calc_state()  # sets self.pos_x self.pos_y
  File "/home/lea/roboschool/roboschool/gym_pendulums.py", line 88, in calc_state
    assert( np.isfinite(x) )
AssertionError
```
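
Since the assertion fires on the simulator state, whatever goes non-finite has already propagated into the env by then. A hypothetical guard like this (my own sketch, not part of train.py) would at least show whether the actions themselves are the problem:

```python
import numpy as np

def checked_step(env, action):
    # Fail fast on non-finite actions instead of tripping roboschool's
    # assert(np.isfinite(x)) deep inside calc_state().
    action = np.asarray(action)
    if not np.all(np.isfinite(action)):
        raise ValueError('non-finite action from policy: %r' % (action,))
    return env.step(action)
```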

With the pybullet version of the inverted pendulum, InvertedPendulumBulletEnv-v0, I don't get any error, but it seems to start over at each epoch without learning. I got the following results in my terminal with default parameters:

```
Episode 20, Mean R = 28.1
Beta: 0.667  ExplainedVarNew: 1.11e-16  ExplainedVarOld: 0  KL: 3.12e-05
PolicyEntropy: 0.923  PolicyLoss: -0.000577  Steps: 562  ValFuncLoss: 0.0041

[2018-01-15 15:26:40,022] Starting new video recorder writing to /tmp/InvertedPendulumBulletEnv-v0/Jan-15_14:26:37/openaigym.video.0.3580.video000027.mp4
Episode 40, Mean R = 24.9
Beta: 0.444  ExplainedVarNew: 0  ExplainedVarOld: 0  KL: 5.36e-07
PolicyEntropy: 0.922  PolicyLoss: -6.3e-05  Steps: 497  ValFuncLoss: 0.0042

[2018-01-15 15:26:40,967] Starting new video recorder writing to /tmp/InvertedPendulumBulletEnv-v0/Jan-15_14:26:37/openaigym.video.0.3580.video000064.mp4
Episode 60, Mean R = 22.9
Beta: 0.296  ExplainedVarNew: 0  ExplainedVarOld: 0  KL: 1.16e-05
PolicyEntropy: 0.92  PolicyLoss: -0.000143  Steps: 458  ValFuncLoss: 0.00167

Episode 80, Mean R = 20.2
Beta: 0.198  ExplainedVarNew: 0  ExplainedVarOld: 0  KL: 8.88e-06
PolicyEntropy: 0.92  PolicyLoss: -8.1e-05  Steps: 405  ValFuncLoss: 0.00259

Episode 100, Mean R = 24.4
Beta: 0.132  ExplainedVarNew: 0  ExplainedVarOld: 0  KL: 1.17e-05
PolicyEntropy: 0.919  PolicyLoss: -0.000282  Steps: 487  ValFuncLoss: 0.00278

Episode 120, Mean R = 25.1
Beta: 0.0878  ExplainedVarNew: 1.11e-16  ExplainedVarOld: 1.11e-16  KL: 8.52e-06
PolicyEntropy: 0.917  PolicyLoss: -0.000111  Steps: 503  ValFuncLoss: 0.00175

[2018-01-15 15:26:42,787] Starting new video recorder writing to /tmp/InvertedPendulumBulletEnv-v0/Jan-15_14:26:37/openaigym.video.0.3580.video000125.mp4
Episode 140, Mean R = 25.8
Beta: 0.0585  ExplainedVarNew: 0  ExplainedVarOld: 0  KL: 2.54e-05
PolicyEntropy: 0.915  PolicyLoss: -0.000493  Steps: 516  ValFuncLoss: 0.00333

Episode 160, Mean R = 26.2
Beta: 0.039  ExplainedVarNew: 0  ExplainedVarOld: 0  KL: 1.37e-06
PolicyEntropy: 0.914  PolicyLoss: 3.45e-05  Steps: 525  ValFuncLoss: 0.00406

Episode 180, Mean R = 26.4
Beta: 0.026  ExplainedVarNew: 0  ExplainedVarOld: 0  KL: 2.13e-05
PolicyEntropy: 0.916  PolicyLoss: -0.000416  Steps: 528  ValFuncLoss: 0.00318

Episode 200, Mean R = 31.1
Beta: 0.0173  ExplainedVarNew: 0  ExplainedVarOld: 0  KL: 8.34e-07
PolicyEntropy: 0.916  PolicyLoss: 1.93e-05  Steps: 621  ValFuncLoss: 0.00797
```

I only copied the first 200 episodes, since the results stay similar afterwards.

llecam commented 6 years ago

I'm running it on Python 2.7. Could that be a problem?

pat-coady commented 6 years ago

I'm actually surprised it works at all in 2.7. The code was written for 3.x; can you try that?
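
One classic example of the kind of thing that breaks silently between the two (just an illustration, not necessarily the exact culprit here) is integer division:

```python
# Python 2: '/' on two ints floors silently; Python 3 does true division.
print(5 / 2)   # Python 2: 2    Python 3: 2.5
print(5 // 2)  # both: 2
```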

llecam commented 6 years ago

The problem was indeed that I was using Python 2.7. Thank you! It's working with Python 3.

pat-coady commented 6 years ago

Great news, glad to hear it.
