Closed David-Clement-Senbionic closed 6 years ago
David,
Without looking at this in more detail, my first suggestion would be to reduce learning rate on policy by 10x and see if it helps.
Sorry I haven't been able to look more carefully.
Pat
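As a rough illustration of the suggestion above, the sketch below divides a policy learning rate by 10 and flags any logged statistic that has become non-finite, as KL, PolicyEntropy and PolicyLoss did in the reported log. The names `reduce_lr` and `non_finite_keys` and the base rate `3e-4` are hypothetical and are not taken from train.py.

```python
import math


def reduce_lr(base_lr, factor=10.0):
    """Return the learning rate divided by `factor` (10x smaller by default)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return base_lr / factor


def non_finite_keys(stats):
    """Return the names of logged statistics that are NaN or infinite."""
    return sorted(k for k, v in stats.items() if not math.isfinite(v))


# Hypothetical base rate, used only for illustration.
policy_lr = reduce_lr(3e-4)

# Values copied from the log line in the report.
stats = {"KL": float("nan"), "PolicyEntropy": float("nan"),
         "PolicyLoss": float("nan"), "ValFuncLoss": 114.0, "Beta": 6.91}
bad = non_finite_keys(stats)
if bad:
    print("non-finite statistics, stop before stepping the env:", bad)
```

Stopping at the first non-finite statistic would probably surface the problem at the policy update rather than later inside env.step, which may make it easier to tell a diverging policy apart from an unstable model.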
It seemed to go away by reconfiguring the mujoco model parameters. I believe it was just mujoco hitting an "exploding" result causing a cascade effect.
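The comment above does not say which model parameters were changed. One parameter that is often adjusted when a MuJoCo simulation becomes unstable is the integration timestep, set through the `timestep` attribute of the `<option>` element in the model XML. The sketch below is only an assumption about what such a change could look like; `scale_timestep` is a hypothetical helper and the model string is a toy example, not the ballbot model.

```python
import xml.etree.ElementTree as ET

MUJOCO_DEFAULT_TIMESTEP = 0.002  # MuJoCo's documented default when unset


def scale_timestep(mjcf_xml, factor=0.5):
    """Return the MJCF text with <option timestep=...> multiplied by `factor`."""
    root = ET.fromstring(mjcf_xml)
    option = root.find("option")
    if option is None:
        # No <option> element yet: add one as the first child of <mujoco>.
        option = ET.Element("option")
        root.insert(0, option)
    current = float(option.get("timestep", MUJOCO_DEFAULT_TIMESTEP))
    option.set("timestep", format(current * factor, "g"))
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal model, used only for illustration.
model = '<mujoco model="demo"><option timestep="0.002"/><worldbody/></mujoco>'
print(scale_timestep(model))
```

Since each env.step in gym's MuJoCo environments runs `frame_skip` simulation steps, halving the timestep also halves the simulated time per environment step unless `frame_skip` is doubled to compensate.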
Were you able to get your humanoid to stand up? If so, would love to see a video.
Hi Patrick, I only ran 50,000 episodes but it seemed to be working well.
Cool stuff 😎
David
@David-Clement-Senbionic Hi David, I have the same problem. I created a new MuJoCo humanoid model with human-like parameters, but at some point my simulation explodes like yours did. How did you reconfigure your MuJoCo model parameters? If possible, could you also share the working code? Did you change your reward function for standing up?
@David-Clement-Senbionic My mistake, the code is already shared :D But MuJoCo optimization is still an issue for me. Any help would be appreciated :)
Hi there, I have created a variant of the HumanStandup-v2 environment in gym which has a much simpler simulated robot that is represented as a mujoco formatted xml file. I have tested this model both in mujoco and in gym and it seems to work fine. I tested the HumanStandup-v2 training on my hw/sw configuration and it worked well to 50,000 episodes. I then ran the identical setup with our robot model with the same reward function as the standard HumanStandup-v2. The only substantive difference between these two is the mujoco model. When I ran the training on our model I get:
Episode 31455, Mean R = 28911.0 Beta: 6.91 ExplainedVarNew: 0.913 ExplainedVarOld: 0.812 KL: nan PolicyEntropy: nan PolicyLoss: nan Steps: 672 ValFuncLoss: 114
Traceback (most recent call last):
  File "./train.py", line 334, in <module>
    main(**vars(args))
  File "./train.py", line 290, in main
    trajectories = run_policy(env, policy, scaler, logger, episodes=batch_size)
  File "./train.py", line 135, in run_policy
    observes, actions, rewards, unscaled_obs = run_episode(env, policy, scaler)
  File "./train.py", line 105, in run_episode
    obs, reward, done, _ = env.step(np.squeeze(action, axis=0))
  File "/home/david/source/gym/gym/wrappers/monitor.py", line 31, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/david/source/gym/gym/wrappers/time_limit.py", line 31, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/david/source/gym/gym/envs/Senbionic/ballbotEnv.py", line 28, in step
    self.do_simulation(a, self.frame_skip)
  File "/home/david/source/gym/gym/envs/mujoco/mujoco_env.py", line 100, in do_simulation
    self.sim.step()
  File "source/mujoco-py/mujoco_py/mjsim.pyx", line 119, in mujoco_py.cymj.MjSim.step
  File "source/mujoco-py/mujoco_py/cymj.pyx", line 115, in mujoco_py.cymj.wrap_mujoco_warning.__exit__
  File "source/mujoco-py/mujoco_py/cymj.pyx", line 75, in mujoco_py.cymj.c_warning_callback
  File "/home/david/.conda/envs/gym35/lib/python3.5/site-packages/mujoco_py-1.50.1.53-py3.5.egg/mujoco_py/builder.py", line 319, in user_warning_raise_exception
    raise MujocoException('Got MuJoCo Warning: {}'.format(warn))
mujoco_py.builder.MujocoException: Got MuJoCo Warning: Unknown warning type Time = 0.0000.
I ran it again and it did the same thing at Episode 1280.
Any suggestions on how to approach overcoming this?
Many thanks for any advice.