[Bug Report] with rsl_rl, actor's std becomes "nan" during PPO training

mitsu3291 commented 2 months ago

I am conducting reinforcement learning for a robot using rsl_rl and isaac lab. While it works fine with simple settings, when I switch to more complex settings (such as Domain Randomization), the following error occurs during training（After some progress in training）, indicating that the actor's standard deviation does not meet the condition of being ≥ 0. Has anyone experienced a similar error? num_env is 3600

Traceback (most recent call last):
  File "/root/IsaacLab/source/standalone/workflows/rsl_rl/train.py", line 131, in <module>
    main()
  File "/root/IsaacLab/source/standalone/workflows/rsl_rl/train.py", line 123, in main
    runner.learn(num_learning_iterations=agent_cfg.max_iterations, init_at_random_ep_len=True)
  File "/isaac-sim/kit/python/lib/python3.10/site-packages/rsl_rl/runners/on_policy_runner.py", line 153, in learn
    mean_value_loss, mean_surrogate_loss = self.alg.update()
  File "/isaac-sim/kit/python/lib/python3.10/site-packages/rsl_rl/algorithms/ppo.py", line 121, in update
    self.actor_critic.act(obs_batch, masks=masks_batch, hidden_states=hid_states_batch[0])
  File "/isaac-sim/kit/python/lib/python3.10/site-packages/rsl_rl/modules/actor_critic.py", line 105, in act

  File "/isaac-sim/exts/omni.isaac.ml_archive/pip_prebundle/torch/distributions/normal.py", line 74, in sample
    return torch.normal(self.loc.expand(shape), self.scale.expand(shape))  
RuntimeError: normal expects all elements of std >= 0.0

I investigated the value of std(self.scale) and found that the std value in a certain environment appears to be nan. (The number of columns represents the action dimensions for the robot.)

self.scale: tensor([[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
...,
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500]],
device='cuda:0')
env_id: 1111, row: tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       device='cuda:0')

weifeng-lt commented 1 month ago

https://github.com/leggedrobotics/rsl_rl/issues/33 https://github.com/leggedrobotics/rsl_rl/issues/7

ksiegall commented 1 month ago

I encountered this bug when implementing a custom reward function for my robot. It turned out that I was returning NaN as part of my reward term, which was causing this issue. Double check your rewards/other functions and ensure that they aren't outputting NaN

Lr-2002 commented 1 month ago

@ksiegall hello, I'm facing the similar problem with hand-craft reward function. While it seems not very easy to locate to problem ? I've found the env.scene[asset_cfg.name].data.joint_pos is all the nan. While I could not find more way to locate the reason?

The scene was modified based on the Open-Drawer-Frandka-v0, Could you give me some advice ?

ozhanozen commented 4 weeks ago

@ksiegall hello, I'm facing the similar problem with hand-craft reward function. While it seems not very easy to locate to problem ? I've found the env.scene[asset_cfg.name].data.joint_pos is all the nan. While I could not find more way to locate the reason?

The scene was modified based on the Open-Drawer-Frandka-v0, Could you give me some advice ?

@Lr-2002 , is there a way for you to confirm if that specific asset with the nan inside .data.joint_pos is correctly spawned to the scene? I have seen before that within one sample of my cloned environments, the robot was not successfully spawned (it was colliding with another asset), hence, I was getting similar problems.

Lr-2002 commented 4 weeks ago

how to make sure whether the robot/articulation is colliding with others asset? What's more I'm just create the env with one usd built from other platform , Any suggestion?

Lr-2002 commented 4 weeks ago

could we have a little meeting about the problem? I'm now on GMT+8 area

ozhanozen commented 4 weeks ago

how to make sure whether the robot/articulation is colliding with others asset? What's more I'm just create the env with one usd built from other platform , Any suggestion?

You can visualize the scene with a livestream option and see if the objects spawn and move correctly in all if you environments. In my case, one robot didn't spawn in a specific env index.

Maybe check if you can visualize the problem, otherwise you can write me a PM so we can arrange a short call. I am in GMT+2 though.