Closed llecam closed 6 years ago
@llecam -- does my solution in this issue help?
Can you try the master branch and let me know if it is still a problem?
Thanks, Pat
On Jan 9, 2018, at 10:50 AM, llecam notifications@github.com wrote:
Hello Pat,
Thanks for your great repo!
I read some of the closed issues but didn't find the one I'm encountering.
I tried to launch your train.py code on RoboschoolInvertedPendulum-v1, which is supposed to be the same as the MuJoCo environment, but I got the following error:
```
File "/home/lea/roboschool/roboschool/gym_pendulums.py", line 88, in calc_state
    assert( np.isfinite(x) )
AssertionError
```
The error occurs on line 108: `obs, reward, done, _ = env.step(np.squeeze(action, axis=0))`. This looks like a problem with the dimensions, but I can't figure out what exactly is wrong.
I tried to work around it by using the corresponding pybullet environment, InvertedPendulumBulletEnv-v0. The program runs, but the pendulum doesn't seem to learn anything. The reward stays around 30.0 on average and doesn't make any progress. Do you have an idea why?
I'm looking forward to your reply!
Léa
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/pat-coady/trpo/issues/14, or mute the thread https://github.com/notifications/unsubscribe-auth/AWdFxN1tKphe3w_BG0MqJrT13h5Gpn50ks5tI4rggaJpZM4RYCI8.
Hi,
@pender Thanks, but that issue didn't really help; I don't have exactly the same problem. @pat-coady I tried both the master branch and the aigym-evaluation branch and got the same results. With the pybullet environment it doesn't seem to learn.
Thanks, Léa
Hi,
I had a very similar issue, so I finally downloaded MuJoCo to try to understand where the problem comes from. I have the same issue with the MuJoCo InvertedPendulum as Léa has with pybullet: everything seems to be working fine, but the reward stays around 10 even after 5000 iterations. I tried both branches. You will find my log folder attached. I am using Python 2.7; do you think that could be a problem?
Thanks, Louis
Sorry I haven't had time to look at this yet.
Can you confirm it is a continuous control environment? In other words, the environment is expecting a vector of real numbers (and not ints or bools).
Some of the gym environments are not continuous control.
@llecam - can you post code that reproduces the problem using roboschool and the inverted pendulum? I'll do my best to run it tonight or tomorrow.
Thank you for your time. I only used continuous environments.
The only modification I made to your code is adding the imports needed in train.py to make it work with pybullet and/or roboschool:
```python
import pybullet as p
import pybullet_envs
import roboschool
```
With the roboschool version of the inverted pendulum environment, RoboschoolInvertedPendulum-v1, I get the following when I launch train.py with the default parameters:
```
Traceback (most recent call last):
  File "train.py", line 334, in <module>
    main(**vars(args))
  File "train.py", line 287, in main
    run_policy(env, policy, scaler, logger, episodes=5)
  File "train.py", line 135, in run_policy
    observes, actions, rewards, unscaled_obs = run_episode(env, policy, scaler)
  File "train.py", line 105, in run_episode
    obs, reward, done, _ = env.step(np.squeeze(action, axis=0))
  File "/home/lea/gym/gym/core.py", line 96, in step
    return self._step(action)
  File "/home/lea/gym/gym/wrappers/monitoring.py", line 32, in _step
    observation, reward, done, info = self.env.step(action)
  File "/home/lea/gym/gym/core.py", line 96, in step
    return self._step(action)
  File "/home/lea/gym/gym/wrappers/time_limit.py", line 36, in _step
    observation, reward, done, info = self.env.step(action)
  File "/home/lea/gym/gym/core.py", line 96, in step
    return self._step(action)
  File "/home/lea/roboschool/roboschool/gym_pendulums.py", line 97, in _step
    state = self.calc_state()  # sets self.pos_x self.pos_y
  File "/home/lea/roboschool/roboschool/gym_pendulums.py", line 88, in calc_state
    assert( np.isfinite(x) )
AssertionError
```
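For context on that assertion: roboschool's `calc_state()` asserts that the simulator state is finite, and it typically fires one step after a NaN or inf action reaches `env.step`. A hedged sketch of a guard one could add around the `env.step(...)` call shown in the traceback (the variable names mirror the traceback; the guard itself is my addition, not code from the repo):

```python
import numpy as np

# Hypothetical guard around env.step(np.squeeze(action, axis=0)).
# If the policy ever emits a non-finite action (e.g. after a numerical
# blow-up), failing fast here surfaces the real culprit instead of the
# downstream assert inside roboschool's calc_state().
action = np.array([[0.3]])                # policy output, shape (1, act_dim)
flat_action = np.squeeze(action, axis=0)  # shape (act_dim,) for env.step
assert np.all(np.isfinite(flat_action)), "non-finite action from policy"
print(flat_action.shape)  # (1,)
```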
With the pybullet version of the inverted pendulum, InvertedPendulumBulletEnv-v0, I don't get any error, but it seems to start over at each epoch without learning. With the default parameters I got the following results in my terminal:

```
Episode 20, Mean R = 28.1   Beta: 0.667   ExplainedVarNew: 1.11e-16   ExplainedVarOld: 0   KL: 3.12e-05   PolicyEntropy: 0.923   PolicyLoss: -0.000577   Steps: 562   ValFuncLoss: 0.0041
[2018-01-15 15:26:40,022] Starting new video recorder writing to /tmp/InvertedPendulumBulletEnv-v0/Jan-15_14:26:37/openaigym.video.0.3580.video000027.mp4
Episode 40, Mean R = 24.9   Beta: 0.444   ExplainedVarNew: 0   ExplainedVarOld: 0   KL: 5.36e-07   PolicyEntropy: 0.922   PolicyLoss: -6.3e-05   Steps: 497   ValFuncLoss: 0.0042
[2018-01-15 15:26:40,967] Starting new video recorder writing to /tmp/InvertedPendulumBulletEnv-v0/Jan-15_14:26:37/openaigym.video.0.3580.video000064.mp4
Episode 60, Mean R = 22.9   Beta: 0.296   ExplainedVarNew: 0   ExplainedVarOld: 0   KL: 1.16e-05   PolicyEntropy: 0.92   PolicyLoss: -0.000143   Steps: 458   ValFuncLoss: 0.00167
Episode 80, Mean R = 20.2   Beta: 0.198   ExplainedVarNew: 0   ExplainedVarOld: 0   KL: 8.88e-06   PolicyEntropy: 0.92   PolicyLoss: -8.1e-05   Steps: 405   ValFuncLoss: 0.00259
Episode 100, Mean R = 24.4   Beta: 0.132   ExplainedVarNew: 0   ExplainedVarOld: 0   KL: 1.17e-05   PolicyEntropy: 0.919   PolicyLoss: -0.000282   Steps: 487   ValFuncLoss: 0.00278
Episode 120, Mean R = 25.1   Beta: 0.0878   ExplainedVarNew: 1.11e-16   ExplainedVarOld: 1.11e-16   KL: 8.52e-06   PolicyEntropy: 0.917   PolicyLoss: -0.000111   Steps: 503   ValFuncLoss: 0.00175
[2018-01-15 15:26:42,787] Starting new video recorder writing to /tmp/InvertedPendulumBulletEnv-v0/Jan-15_14:26:37/openaigym.video.0.3580.video000125.mp4
Episode 140, Mean R = 25.8   Beta: 0.0585   ExplainedVarNew: 0   ExplainedVarOld: 0   KL: 2.54e-05   PolicyEntropy: 0.915   PolicyLoss: -0.000493   Steps: 516   ValFuncLoss: 0.00333
Episode 160, Mean R = 26.2   Beta: 0.039   ExplainedVarNew: 0   ExplainedVarOld: 0   KL: 1.37e-06   PolicyEntropy: 0.914   PolicyLoss: 3.45e-05   Steps: 525   ValFuncLoss: 0.00406
Episode 180, Mean R = 26.4   Beta: 0.026   ExplainedVarNew: 0   ExplainedVarOld: 0   KL: 2.13e-05   PolicyEntropy: 0.916   PolicyLoss: -0.000416   Steps: 528   ValFuncLoss: 0.00318
Episode 200, Mean R = 31.1   Beta: 0.0173   ExplainedVarNew: 0   ExplainedVarOld: 0   KL: 8.34e-07   PolicyEntropy: 0.916   PolicyLoss: 1.93e-05   Steps: 621   ValFuncLoss: 0.00797
```
I only copied the first 200 episodes because the results are similar afterwards.
I'm running it on Python 2.7. Could that be a problem?
I'm actually surprised it works at all in 2.7. The code was written for 3.x; can you try Python 3?
The problem was indeed that I was using Python 2.7. Thank you! It's working with Python 3.
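One plausible culprit (my guess; the thread never pins down the exact line): under Python 2, `/` between two ints is floor division, so any scaling factor or mean computed from integer counts silently truncates, while Python 3 makes `/` true division everywhere. A quick illustration of the Python 3 semantics the code relies on:

```python
# Python 3 semantics (the code was written for 3.x):
# '/' is true division, '//' is floor division.
ratio = 1 / 3                 # 0.333... in Python 3; would be 0 in Python 2
steps, episodes = 562, 20     # figures taken from the log above
mean_steps = steps / episodes # 28.1 in Python 3; would be 28 in Python 2
print(ratio, mean_steps)
```

Under Python 2 the same code could be kept correct with `from __future__ import division` at the top of each module.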
Great news, glad to hear it.