isaac-sim / IsaacLab

Unified framework for robot learning built on NVIDIA Isaac Sim
https://isaac-sim.github.io/IsaacLab

[Bug Report] rsl-rl crashes for robots with mimic joints #905

Closed KKSTB closed 1 month ago

KKSTB commented 2 months ago

Hi. Thank you for the great work.

Describe the bug

I am trying to train a virtual robot with multiple mimic joints. More specifically like this: https://github.com/KKSTB/isaac_lab_gundam_robot_urdf

I imported the robot URDF with the Isaac Sim URDF importer and am trying to train it, using the Unitree G1 configuration as a reference.

The robot has 39 actuated joints, but with the mimic joints included the total number of joints becomes 77, so the following warning appears:

[Warning] [omni.isaac.lab.assets.articulation.articulation] Not all actuators are configured! Total number of actuated joints not equal to number of joints available: 39 != 77.

However, rsl-rl crashes at random after a few tens of iterations, with the following call stack:

Traceback (most recent call last):
  File "/home/kk/IsaacLab/source/standalone/workflows/rsl_rl/train.py", line 134, in <module>
    main()
  File "/home/kk/IsaacLab/source/standalone/workflows/rsl_rl/train.py", line 126, in main
    runner.learn(num_learning_iterations=agent_cfg.max_iterations, init_at_random_ep_len=True)
  File "/home/kk/IsaacLab/_isaac_sim/kit/python/lib/python3.10/site-packages/rsl_rl/runners/on_policy_runner.py", line 111, in learn
    actions = self.alg.act(obs, critic_obs)
  File "/home/kk/IsaacLab/_isaac_sim/kit/python/lib/python3.10/site-packages/rsl_rl/algorithms/ppo.py", line 73, in act
    self.transition.actions = self.actor_critic.act(obs).detach()
  File "/home/kk/IsaacLab/_isaac_sim/kit/python/lib/python3.10/site-packages/rsl_rl/modules/actor_critic.py", line 105, in act
    return self.distribution.sample()
  File "/home/kk/.local/share/ov/pkg/isaac-sim-4.1.0/exts/omni.isaac.ml_archive/pip_prebundle/torch/distributions/normal.py", line 70, in sample
    return torch.normal(self.loc.expand(shape), self.scale.expand(shape))
RuntimeError: normal expects all elements of std >= 0.0

I tried to exclude the mimic joints by passing only the non-mimic joints to the ObservationsCfg, but no luck. I did this with the following code in __post_init__() of my custom config file, which is similar to the Unitree G1's rough_env_cfg.py and overrides the defaults in velocity_env_cfg.py:

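# Collect the joint-name expressions of every configured actuator group.
# The mimic joints are not part of any actuator group here, so they are left out.
actuators: list[str] = []
for actuator_cfg in self.scene.robot.actuators.values():
    actuators.extend(actuator_cfg.joint_names_expr)
# Restrict the action term and the joint observations to those joints.
self.actions.joint_pos.joint_names = actuators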
self.observations.policy.joint_pos.params["asset_cfg"] = SceneEntityCfg(
    "robot",
    joint_names=actuators,
)
self.observations.policy.joint_vel.params["asset_cfg"] = SceneEntityCfg(
    "robot",
    joint_names=actuators,
)

The configuration:

[INFO] Command Manager:  <CommandManager> contains 1 active terms.
+------------------------------------------------+
|              Active Command Terms              |
+-------+---------------+------------------------+
| Index | Name          |          Type          |
+-------+---------------+------------------------+
|   0   | base_velocity | UniformVelocityCommand |
+-------+---------------+------------------------+

[INFO] Action Manager:  <ActionManager> contains 1 active terms.
+------------------------------------+
|  Active Action Terms (shape: 39)   |
+--------+-------------+-------------+
| Index  | Name        |   Dimension |
+--------+-------------+-------------+
|   0    | joint_pos   |          39 |
+--------+-------------+-------------+

Module omni.isaac.lab.utils.warp.kernels aa4ad5a load on device 'cuda:0' took 3.04 ms
[INFO] Observation Manager: <ObservationManager> contains 1 groups.
+----------------------------------------------------------+
| Active Observation Terms in Group: 'policy' (shape: (316,)) |
+-----------+--------------------------------+-------------+
|   Index   | Name                           |    Shape    |
+-----------+--------------------------------+-------------+
|     0     | base_lin_vel                   |     (3,)    |
|     1     | base_ang_vel                   |     (3,)    |
|     2     | projected_gravity              |     (3,)    |
|     3     | velocity_commands              |     (3,)    |
|     4     | joint_pos                      |    (39,)    |
|     5     | joint_vel                      |    (39,)    |
|     6     | actions                        |    (39,)    |
|     7     | height_scan                    |    (187,)   |
+-----------+--------------------------------+-------------+

[INFO] Event Manager:  <EventManager> contains 2 active terms.
+--------------------------------------+
| Active Event Terms in Mode: 'startup' |
+----------+---------------------------+
|  Index   | Name                      |
+----------+---------------------------+
|    0     | physics_material          |
+----------+---------------------------+
+---------------------------------------+
|  Active Event Terms in Mode: 'reset'  |
+--------+------------------------------+
| Index  | Name                         |
+--------+------------------------------+
|   0    | base_external_force_torque   |
|   1    | reset_base                   |
|   2    | reset_robot_joints           |
+--------+------------------------------+

[INFO] Termination Manager:  <TerminationManager> contains 2 active terms.
+---------------------------------+
|     Active Termination Terms    |
+-------+--------------+----------+
| Index | Name         | Time Out |
+-------+--------------+----------+
|   0   | time_out     |   True   |
|   1   | base_contact |  False   |
+-------+--------------+----------+

[INFO] Reward Manager:  <RewardManager> contains 18 active terms.
+---------------------------------------------+
|             Active Reward Terms             |
+-------+-------------------------+-----------+
| Index | Name                    |    Weight |
+-------+-------------------------+-----------+
|   0   | track_lin_vel_xy_exp    |       1.0 |
|   1   | track_ang_vel_z_exp     |       2.0 |
|   2   | lin_vel_z_l2            |       0.0 |
|   3   | ang_vel_xy_l2           |     -0.05 |
|   4   | dof_torques_l2          |  -1.5e-07 |
|   5   | dof_acc_l2              | -1.25e-07 |
|   6   | action_rate_l2          |    -0.005 |
|   7   | feet_air_time           |      0.25 |
|   8   | flat_orientation_l2     |      -1.0 |
|   9   | dof_pos_limits          |      -1.0 |
|   10  | termination_penalty     |    -200.0 |
|   11  | feet_slide              |      -0.1 |
|   12  | joint_deviation_hip     |      -0.1 |
|   13  | joint_deviation_arms    |      -0.1 |
|   14  | joint_deviation_fingers |     -0.05 |
|   15  | joint_deviation_torso   |      -0.1 |
|   16  | joint_deviation_head    |      -0.1 |
|   17  | joint_deviation_thrust  |      -0.1 |
+-------+-------------------------+-----------+

[INFO] Curriculum Manager:  <CurriculumManager> contains 1 active terms.
+---------------------------+
|  Active Curriculum Terms  |
+--------+------------------+
| Index  | Name             |
+--------+------------------+
|   0    | terrain_levels   |
+--------+------------------+

[INFO]: Completed setting up the environment...
Actor MLP: Sequential(
  (0): Linear(in_features=316, out_features=512, bias=True)
  (1): ELU(alpha=1.0)
  (2): Linear(in_features=512, out_features=256, bias=True)
  (3): ELU(alpha=1.0)
  (4): Linear(in_features=256, out_features=128, bias=True)
  (5): ELU(alpha=1.0)
  (6): Linear(in_features=128, out_features=39, bias=True)
)
Critic MLP: Sequential(
  (0): Linear(in_features=316, out_features=512, bias=True)
  (1): ELU(alpha=1.0)
  (2): Linear(in_features=512, out_features=256, bias=True)
  (3): ELU(alpha=1.0)
  (4): Linear(in_features=256, out_features=128, bias=True)
  (5): ELU(alpha=1.0)
  (6): Linear(in_features=128, out_features=1, bias=True)
)
Setting seed: 42

If I edit the URDF so that all mimic joints are set to fixed, it seems there is no crash.
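
For anyone who wants to reproduce that workaround without editing the file by hand, here is a minimal sketch using only the Python standard library. The function name `freeze_mimic_joints` and the sample robot are made up for illustration, and the sketch assumes that mimic joints are declared with a `<mimic>` child inside `<joint>` elements directly under `<robot>`, as in standard URDF. Note that a fixed joint no longer follows its parent joint, so this trades the coupled motion for stability.

```python
import xml.etree.ElementTree as ET


def freeze_mimic_joints(urdf_text: str) -> tuple[str, list[str]]:
    """Convert every joint that has a <mimic> child into a fixed joint.

    Returns the modified URDF text and the names of the joints that were changed.
    """
    root = ET.fromstring(urdf_text)
    frozen = []
    # findall() returns a list, so removing children inside the loop is safe.
    for joint in root.findall("joint"):
        if joint.find("mimic") is None:
            continue
        joint.set("type", "fixed")
        # Drop the elements that only describe a moving joint.
        for tag in ("mimic", "axis", "limit", "dynamics"):
            for child in joint.findall(tag):
                joint.remove(child)
        frozen.append(joint.get("name"))
    return ET.tostring(root, encoding="unicode"), frozen


# Hypothetical two-finger robot: finger_b_joint mimics finger_a_joint.
SAMPLE_URDF = """<robot name="demo">
  <link name="base"/>
  <link name="finger_a"/>
  <link name="finger_b"/>
  <joint name="finger_a_joint" type="revolute">
    <parent link="base"/>
    <child link="finger_a"/>
    <axis xyz="0 0 1"/>
    <limit lower="-1" upper="1" effort="10" velocity="1"/>
  </joint>
  <joint name="finger_b_joint" type="revolute">
    <parent link="base"/>
    <child link="finger_b"/>
    <axis xyz="0 0 1"/>
    <limit lower="-1" upper="1" effort="10" velocity="1"/>
    <mimic joint="finger_a_joint" multiplier="1.0" offset="0.0"/>
  </joint>
</robot>"""

if __name__ == "__main__":
    new_urdf, frozen_names = freeze_mimic_joints(SAMPLE_URDF)
    print("Joints converted to fixed:", frozen_names)
```

The returned list also gives a quick count of how many mimic joints the URDF contains, which can be compared against the joint totals in the warning above.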

System Info

kellyguo11 commented 1 month ago

Hello, the error you are seeing generally suggests that NaNs were propagated into the training pipeline, which is often a result of instabilities in the simulation. Tuning mimic joints can be tricky, and we have observed cases where they cause simulation stability issues. We are working on improving simulation stability for mimic joints in the next release. In the meantime, it often helps to check the observation terms for NaNs before passing them to the policy and to replace any NaNs with a reasonable value. Clamping the actions so that they stay out of extreme ranges can also improve simulation stability.
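
The two suggestions above (replace NaNs in the observations, clamp the actions) could be sketched roughly as follows in plain PyTorch. The helper names `non_finite_rows`, `sanitize_observations` and `clamp_actions` and the limits 100.0 and 10.0 are assumptions chosen for illustration; they are not part of Isaac Lab or rsl-rl, and where to call them in the training loop depends on the wrapper in use.

```python
import torch


def non_finite_rows(obs: torch.Tensor) -> torch.Tensor:
    """Boolean mask of environments whose observation row contains NaN or inf."""
    return ~torch.isfinite(obs).all(dim=-1)


def sanitize_observations(obs: torch.Tensor, limit: float = 100.0) -> torch.Tensor:
    """Replace NaN with 0 and +/-inf with +/-limit, then clip finite outliers."""
    obs = torch.nan_to_num(obs, nan=0.0, posinf=limit, neginf=-limit)
    return obs.clamp(-limit, limit)


def clamp_actions(actions: torch.Tensor, limit: float = 10.0) -> torch.Tensor:
    """Keep actions inside [-limit, limit] before they reach the simulator.

    clamp() does not remove NaNs, so this only guards against extreme values.
    """
    return actions.clamp(-limit, limit)


if __name__ == "__main__":
    # Two fake environments with three observation values each.
    obs = torch.tensor([[1.0, float("nan"), float("inf")],
                        [0.5, 500.0, -2.0]])
    print("Environments with bad values:", non_finite_rows(obs).tolist())
    print("Sanitized observations:", sanitize_observations(obs).tolist())
    print("Clamped actions:", clamp_actions(torch.tensor([-50.0, 0.5, 50.0])).tolist())
```

Logging the mask from `non_finite_rows` before sanitizing can help pinpoint which environments, and therefore which joint states, become unstable first.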