openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

bipedal_walker causes AssertionError #433

Closed 4SkyNet closed 7 years ago

4SkyNet commented 7 years ago

I sometimes get this assertion error in the bipedal_walker environments: AssertionError: r.LengthSquared() > 0.0f

This seems strange to me, since I clip the action beforehand: np.clip(action, self.gym.action_space.low, self.gym.action_space.high)

Can you provide some additional details that might explain this behaviour?

tlbtlbtlb commented 7 years ago

@olegklimov The assertion is from https://github.com/erincatto/Box2D/blob/374664b2a4ce2e7c24fbad6e1ed34bebcc9ab6bc/Box2D/Box2D/Collision/b2DynamicTree.h#L209 possibly called by this: https://github.com/openai/gym/blob/312c8b439c5d164c08e111c99ccd2ba8cf3a5857/gym/envs/box2d/bipedal_walker.py#L390

although I don't see how it could generate identical values for p1 and p2.

@4SkyNet a Python stack trace showing the values of the arguments to the RayCast would be very helpful.

4SkyNet commented 7 years ago

Thanks for your reply. Yes, you're right; my call stack looks like this:

File ".../rl-server/clients/pkg/rl_client_gym/game_env.py", line 80, in act
screen, reward, terminal, info = self.gym.step(action)
File ".../anaconda2/envs/tst/lib/python2.7/site-packages/gym/core.py", line 122, in step
observation, reward, done, info = self._step(action)
File ".../anaconda2/envs/tst/lib/python2.7/site-packages/gym/envs/box2d/bipedal_walker.py", line 390, in _step
self.world.RayCast(self.lidar[i], self.lidar[i].p1, self.lidar[i].p2)
AssertionError: r.LengthSquared() > 0.0f

As I said before, I clip every action before passing it to: screen, reward, terminal, info = self.gym.step(action)

That's why it seems strange to me (most of the time everything is OK).

tlbtlbtlb commented 7 years ago

Something is causing (self.lidar[i].p1 - self.lidar[i].p2).LengthSquared() > 0 to be false. If you can look at the values with the Python debugger, it might be obvious why. If there's a NaN in there, it's due to a simulator numerical instability in the previous step.
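A minimal sketch of why a NaN trips the same assertion as two identical endpoints would. It assumes the lidar endpoints are plain (x, y) tuples rather than Box2D vectors, and `raycast_precondition_holds` is a hypothetical helper that only mirrors the check quoted above; it is not part of Box2D or gym.

```python
import math


def raycast_precondition_holds(p1, p2):
    """Mirror of the Box2D check r.LengthSquared() > 0 with r = p2 - p1.

    p1 and p2 are plain (x, y) tuples here, not Box2D b2Vec2 objects.
    """
    rx = p2[0] - p1[0]
    ry = p2[1] - p1[1]
    length_squared = rx * rx + ry * ry
    # Any comparison involving NaN is False, so a NaN coordinate
    # fails this check exactly like identical endpoints do.
    return length_squared > 0.0


nan = math.nan
print(raycast_precondition_holds((0.0, 0.0), (1.0, 0.0)))  # distinct points
print(raycast_precondition_holds((2.0, 3.0), (2.0, 3.0)))  # identical points
print(raycast_precondition_holds((nan, 0.0), (1.0, 0.0)))  # NaN coordinate
```

So inspecting p1 and p2 in the debugger separates the two cases: identical finite points point at the geometry, while NaN coordinates point at something earlier in the step.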

olegklimov commented 7 years ago

+1 for NaN. I've worked a lot with the Walker and have never seen this. @4SkyNet do you have a way to reproduce it?

4SkyNet commented 7 years ago

Sorry for the long delay. I've finally reproduced this error on the machine I use for these experiments. Training ran for around 12 million states across 4 parallel clients, and all of them died due to this error.

I printed the clipped actions for each client before it got stuck on this assertion.

For the 1st client it looks like this:

('act', array([-1., -1., -1., -1.]))
('act', array([-1., -1., -1., -1.]))
('act', array([-1., -1., -1., -1.]))
('act', array([-1., -1., -1., -1.]))
('act', array([-1., -1., -1., -1.]))
('act', array([-1., -1., -1., -1.]))
('act', array([ nan, nan, nan, nan]))

The 2nd looks fine, so I only include the last action:

('reset', array([ 0.81319702, 0.01847262, -0.28210664, 0.08144218]))
('act', array([ nan, nan, nan, nan]))

The 3rd also produces some intermediate warnings before crashing:

('act', array([ 0.5295043, 1. , -1. , -1. ]))
('act', array([-1., 1., -1., -1.]))
('act', array([ 0.10607028, -1. , -1. , -1. ]))
/home/dvm/anaconda2/envs/tensorflow/lib/python2.7/site-packages/gym/envs/box2d/bipedal_walker.py:371: RuntimeWarning: invalid value encountered in absolute
  self.joints[0].maxMotorTorque = float(MOTORS_TORQUE * np.clip(np.abs(action[0]), 0, 1))
/home/dvm/anaconda2/envs/tensorflow/lib/python2.7/site-packages/gym/envs/box2d/bipedal_walker.py:373: RuntimeWarning: invalid value encountered in absolute
  self.joints[1].maxMotorTorque = float(MOTORS_TORQUE * np.clip(np.abs(action[1]), 0, 1))
/home/dvm/anaconda2/envs/tensorflow/lib/python2.7/site-packages/gym/envs/box2d/bipedal_walker.py:375: RuntimeWarning: invalid value encountered in absolute
  self.joints[2].maxMotorTorque = float(MOTORS_TORQUE * np.clip(np.abs(action[2]), 0, 1))
/home/dvm/anaconda2/envs/tensorflow/lib/python2.7/site-packages/gym/envs/box2d/bipedal_walker.py:377: RuntimeWarning: invalid value encountered in absolute
  self.joints[3].maxMotorTorque = float(MOTORS_TORQUE * np.clip(np.abs(action[3]), 0, 1))
('act', array([-1. , -0.47727501, -1. , -1. ]))
('act', array([ nan, nan, nan, nan]))

And the 4th looks like the 1st:

('act', array([-1. , -1. , -0.84609199, -1. ]))
('act', array([-1., -1., -1., -1.]))
('act', array([-1., -1., -1., -1.]))
('act', array([ nan, nan, nan, nan]))

After that, all of them die with the assertion error mentioned above:

Traceback (most recent call last):
...
File ".../my_gym_client/env.py", line 81, in act
obs, reward, terminal, info = self.gym.step(action)
File ".../gym/core.py", line 122, in step
observation, reward, done, info = self._step(action)
File ".../gym/envs/box2d/bipedal_walker.py", line 390, in _step
self.world.RayCast(self.lidar[i], self.lidar[i].p1, self.lidar[i].p2)
AssertionError: r.LengthSquared() > 0.0f

BTW, I've also checked the gym version on that machine. It's a bit old, I think: 0.5.6

tlbtlbtlb commented 7 years ago

That looks like your agent is generating the NaNs, and submitting them as the action. That's a problem with your agent, not the gym environment. Actions are required to be within the action space, which is [-1, -1, -1, -1] to [+1, +1, +1, +1].

However, we should fix the gym env to reject such actions immediately rather than propagating NaNs into the simulator. The only checks are like np.clip(action[0], -1, 1), which preserves NaNs. @olegklimov, can you add a check to raise an exception in BipedalWalker._step if action has NaNs?
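A minimal sketch of the kind of check being proposed, written as a standalone NumPy function rather than a change to BipedalWalker itself. `checked_action` is a hypothetical name, and rejecting infinite values as well as NaN (via `np.isfinite`) is a design choice; `np.isnan` would let infinities through to be clipped instead.

```python
import numpy as np


def checked_action(action, low, high):
    """Hypothetical guard: reject non-finite actions, then clip to bounds."""
    action = np.asarray(action, dtype=np.float64)
    if not np.all(np.isfinite(action)):
        raise ValueError("action contains NaN or inf: %r" % (action,))
    return np.clip(action, low, high)


low = -np.ones(4)
high = np.ones(4)

bad = np.array([np.nan, 0.5, -2.0, 1.0])
# np.clip bounds the finite entries but leaves the NaN entry as NaN,
# which is why clipping alone never stopped the NaN actions above.
print(np.clip(bad, low, high))

try:
    checked_action(bad, low, high)
except ValueError as err:
    print("rejected:", err)
```

Raising at the action boundary makes the failure point at the agent's output instead of surfacing later as a Box2D assertion inside RayCast.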

4SkyNet commented 7 years ago

Oh, yeah, it could be my fault. I should first check the sigma output of my NN, because it uses a softplus operator that could produce NaN (due to some numerical instability, instead of the expected small positive value near 0+).
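The thread doesn't show the network code, so as a hedged illustration only: here is the naive softplus next to the common numerically stable rewrite, max(x, 0) + log1p(exp(-|x|)), in NumPy. The naive form overflows to inf for large inputs and rounds to exactly 0 for moderately negative ones; either value can turn into NaN downstream (for example 0/0 or inf - inf in a Gaussian log-density). `sigma_from_logits` and its `min_sigma` floor are hypothetical, not taken from 4SkyNet's model.

```python
import numpy as np


def naive_softplus(x):
    # log(1 + exp(x)): exp overflows to inf for large x, and for
    # moderately negative x, 1 + exp(x) rounds to 1.0, giving exactly 0.
    return np.log(1.0 + np.exp(x))


def stable_softplus(x):
    # Same function rewritten so the exp argument is never positive;
    # log1p keeps tiny positive results instead of rounding them to 0.
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))


def sigma_from_logits(x, min_sigma=1e-4):
    # Hypothetical floor: even the stable form underflows to 0 for very
    # negative x, so add a small constant to keep sigma strictly positive.
    return stable_softplus(x) + min_sigma


for x in (-50.0, 0.0, 50.0):
    print(x, naive_softplus(x), stable_softplus(x))
```

Checking the sigma head with np.isfinite (or the framework's equivalent) right before sampling is a cheap way to confirm whether this is where the NaN first appears.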

olegklimov commented 7 years ago

@tlbtlbtlb Let's close this? There are a lot of environments you can break if you pass NaNs as an action.

4SkyNet commented 7 years ago

@olegklimov Definitely! It's a ridiculous error, but anyway, I don't have time to verify it again in the near future (and it seems pretty obvious as is).

4SkyNet commented 7 years ago

@tlbtlbtlb @olegklimov The problem is in the softplus function; I've run the experiments one more time. Anyway, thanks for the help and care!

mtaohuang commented 4 years ago

@4SkyNet Hi, I am also experiencing this issue: my SAC actor produces NaN actions for BipedalWalkerHardcore-v3. Any suggestions on which part of the code to investigate?