leggedrobotics / rsl_rl

Fast and simple implementation of RL algorithms, designed to run fully on GPU.

[Bug Report] actor's std becomes "nan" during PPO training #33

Open mitsu3291 opened 1 month ago

mitsu3291 commented 1 month ago

I am conducting reinforcement learning for a robot using rsl_rl and Isaac Lab. It works fine with simple settings, but when I switch to more complex settings (such as domain randomization), the following error occurs after some progress in training, indicating that the actor's standard deviation does not satisfy the condition std >= 0. Has anyone experienced a similar error? num_envs is 3600.

Traceback (most recent call last):
  File "/root/IsaacLab/source/standalone/workflows/rsl_rl/train.py", line 131, in <module>
    main()
  File "/root/IsaacLab/source/standalone/workflows/rsl_rl/train.py", line 123, in main
    runner.learn(num_learning_iterations=agent_cfg.max_iterations, init_at_random_ep_len=True)
  File "/isaac-sim/kit/python/lib/python3.10/site-packages/rsl_rl/runners/on_policy_runner.py", line 153, in learn
    mean_value_loss, mean_surrogate_loss = self.alg.update()
  File "/isaac-sim/kit/python/lib/python3.10/site-packages/rsl_rl/algorithms/ppo.py", line 121, in update
    self.actor_critic.act(obs_batch, masks=masks_batch, hidden_states=hid_states_batch[0])
  File "/isaac-sim/kit/python/lib/python3.10/site-packages/rsl_rl/modules/actor_critic.py", line 105, in act

  File "/isaac-sim/exts/omni.isaac.ml_archive/pip_prebundle/torch/distributions/normal.py", line 74, in sample
    return torch.normal(self.loc.expand(shape), self.scale.expand(shape))  
RuntimeError: normal expects all elements of std >= 0.0

I investigated the value of std (self.scale) and found that the std values for one environment are nan. (The columns correspond to the robot's action dimensions.)

self.scale: tensor([[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
...,
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500]],
device='cuda:0')
env_id: 1111, row: tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       device='cuda:0')
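
A quick way to reproduce this check and locate the offending environments is to scan the scale tensor for nan rows before sampling. A minimal sketch, assuming the policy's std is available as a (num_envs, num_actions) tensor (e.g. the stddev of the action distribution in rsl_rl's ActorCritic):

```python
import torch

def find_nan_std_envs(scale: torch.Tensor) -> torch.Tensor:
    """Return indices of environments whose std contains nan.

    `scale` is assumed to have shape (num_envs, num_actions), e.g. the
    stddev of the action distribution after the actor forward pass.
    """
    nan_rows = torch.isnan(scale).any(dim=-1)
    return torch.nonzero(nan_rows, as_tuple=False).squeeze(-1)

# Example: fail early with the environment ids instead of the opaque
# "normal expects all elements of std >= 0.0" error from torch.normal.
scale = torch.tensor([[0.19, 0.21], [float("nan"), 0.85]])
bad_envs = find_nan_std_envs(scale)
if bad_envs.numel() > 0:
    raise RuntimeError(f"nan std in envs: {bad_envs.tolist()}")
```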
Lruomeng commented 1 month ago

me too

shafeef901 commented 1 month ago

Can confirm I've experienced it too. In my case, I had introduced some sparse rewards to my environment. Not sure that's the cause, though.

felipemohr commented 1 month ago

Same problem here. When visualizing the training data in TensorBoard, I noticed that Loss/value_function suddenly goes to infinity.

xliu0105 commented 1 month ago

Same problem

xliu0105 commented 1 month ago

When you hit the std >= 0 error, check the reported 'Value Function Loss' to see whether it is inf. If it is, there is a fix you can try: based on the issue ray-project/ray#19291, its fix ray-project/ray#22171, and commit ray-project/ray@ddd1160, the code starting at L159 in the ppo.py file of rsl_rl (version 2.0.2) needs to be modified accordingly.

Thanks for your answer, I'll try it out.
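
For reference, the Ray issue and fix cited above deal with the PPO value-function loss blowing up to inf, which can then propagate nan into the policy std. A minimal sketch of that kind of change, assuming an RLlib-style clamp on the value loss; the max_value_loss threshold below is a hypothetical knob, not an existing rsl_rl parameter:

```python
import torch

def clamped_value_loss(value_batch, returns_batch, target_values_batch,
                       clip_param=0.2, max_value_loss=10.0):
    """PPO clipped value loss with an extra clamp to keep it finite.

    The first part mirrors the usual clipped value loss computed in
    rsl_rl's PPO update; the final clamp (max_value_loss, hypothetical)
    is the RLlib-style safeguard from the referenced Ray fix.
    """
    value_clipped = target_values_batch + (value_batch - target_values_batch).clamp(
        -clip_param, clip_param
    )
    value_losses = (value_batch - returns_batch).pow(2)
    value_losses_clipped = (value_clipped - returns_batch).pow(2)
    value_loss = torch.max(value_losses, value_losses_clipped).mean()
    # Keep the loss (and its gradients) from exploding to inf.
    return torch.clamp(value_loss, max=max_value_loss)
```

Clamping the loss caps the gradient magnitude of the value head, which is also why such a change can slow down learning, as noted in the comment below.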

AlexanderAbernathy commented 1 month ago

Note that the method suggested above may not work, and it may slow down learning. I tested it with 'iteration: 30000' and 'num_envs: 12000 to 30000' while training my own robot. The training still failed randomly between 1,000 and 18,000 iterations. I checked the 'value batch' and 'return batch': once training failed, both contained very large positive or negative numbers. I ultimately completed the entire training run by modifying the rewards and penalties. Since I'm still new to RL, I don't know exactly what happened. By the way, I also tried modifying the PPO hyperparameters and the network architecture; that didn't work either. I would greatly appreciate it if someone could provide more information on this topic.

There is also a crude way to keep training going: when the std >= 0 error appears and the Value Function Loss shows inf, you can adjust some parameters in the project and then use --resume to load the checkpoint and continue training.
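
A small diagnostic for the 'value batch' / 'return batch' check mentioned above, to spot the blow-up before it turns into nan (the tensor names are assumptions based on rsl_rl's PPO update, not code from this thread):

```python
import torch

def check_batch_finite(name: str, batch: torch.Tensor, limit: float = 1e6) -> None:
    """Warn when a mini-batch contains inf/nan or suspiciously large values."""
    if not torch.isfinite(batch).all():
        print(f"[warn] {name} contains inf/nan")
    elif batch.abs().max() > limit:
        print(f"[warn] {name} max |value| {batch.abs().max().item():.3e} exceeds {limit:.0e}")

# Usage sketch inside the update loop (variable names assumed):
# check_batch_finite("value_batch", value_batch)
# check_batch_finite("returns_batch", returns_batch)
```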

weifeng-lt commented 2 weeks ago

Adding actions = torch.clip(actions, min=-6.28, max=6.28) before env.step(actions) seems to help. It is also better to add an action penalty to the reward to prevent the actor from outputting excessively large values.
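
A minimal sketch of both suggestions combined, assuming a gym-style step loop and an added reward term; the clip range follows the comment above, while the penalty weight is illustrative:

```python
import torch

ACTION_LIMIT = 6.28           # clip range suggested above
ACTION_PENALTY_WEIGHT = 0.01  # illustrative weight, tune per task

def clip_actions(actions: torch.Tensor) -> torch.Tensor:
    """Clamp policy outputs before stepping the environment."""
    return torch.clip(actions, min=-ACTION_LIMIT, max=ACTION_LIMIT)

def action_magnitude_penalty(actions: torch.Tensor) -> torch.Tensor:
    """Per-environment penalty discouraging large actions (squared L2 norm)."""
    return -ACTION_PENALTY_WEIGHT * actions.pow(2).sum(dim=-1)

# Usage sketch in the rollout loop (policy/env names assumed):
# actions = clip_actions(policy(obs))
# obs, reward, done, info = env.step(actions)
# reward = reward + action_magnitude_penalty(actions)
```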