[Bug Report] rsl-rl crashes for robots with mimic joints

Hi. Thank you for the great work.

Describe the bug

I am trying to train a virtual robot that has multiple mimic joints, specifically this one: https://github.com/KKSTB/isaac_lab_gundam_robot_urdf

I imported the robot URDF using the Isaac Sim URDF importer and am trying to train it, using the Unitree G1 as an example.

The total number of actuated joints is 39, but with the mimic joints included the total number of joints becomes 77, so there is a warning:

[Warning] [omni.isaac.lab.assets.articulation.articulation] Not all actuators are configured! Total number of actuated joints not equal to number of joints available: 39 != 77.
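To see where a mismatch like 39 != 77 comes from, one can count the non-fixed joints in the URDF and split them by whether they carry a `<mimic>` element. This is only a minimal sketch using the Python standard library; the helper name `summarize_joints` and the tiny example URDF are hypothetical, not taken from the robot above.

```python
import xml.etree.ElementTree as ET


def summarize_joints(urdf_text: str) -> dict[str, list[str]]:
    """Split the non-fixed joints of a URDF into independent and mimic joints."""
    root = ET.fromstring(urdf_text)
    summary: dict[str, list[str]] = {"independent": [], "mimic": []}
    for joint in root.findall("joint"):
        if joint.get("type") == "fixed":
            continue  # fixed joints add no degrees of freedom
        kind = "mimic" if joint.find("mimic") is not None else "independent"
        summary[kind].append(joint.get("name"))
    return summary


# Hypothetical URDF, used only to illustrate the split.
EXAMPLE_URDF = """
<robot name="example">
  <joint name="hip" type="revolute"/>
  <joint name="knee" type="revolute"/>
  <joint name="knee_linkage" type="revolute">
    <mimic joint="knee" multiplier="-1.0" offset="0.0"/>
  </joint>
  <joint name="sensor_mount" type="fixed"/>
</robot>
"""

summary = summarize_joints(EXAMPLE_URDF)
print(len(summary["independent"]), "independent,", len(summary["mimic"]), "mimic")
```

Running the same function on the real URDF text would show whether the joints missing from the actuator configuration are exactly the mimic joints.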

However, rsl-rl crashes at random after a few tens of iterations, with the following call stack:

Traceback (most recent call last):
  File "/home/kk/IsaacLab/source/standalone/workflows/rsl_rl/train.py", line 134, in <module>
    main()
  File "/home/kk/IsaacLab/source/standalone/workflows/rsl_rl/train.py", line 126, in main
    runner.learn(num_learning_iterations=agent_cfg.max_iterations, init_at_random_ep_len=True)
  File "/home/kk/IsaacLab/_isaac_sim/kit/python/lib/python3.10/site-packages/rsl_rl/runners/on_policy_runner.py", line 111, in learn
    actions = self.alg.act(obs, critic_obs)
  File "/home/kk/IsaacLab/_isaac_sim/kit/python/lib/python3.10/site-packages/rsl_rl/algorithms/ppo.py", line 73, in act
    self.transition.actions = self.actor_critic.act(obs).detach()
  File "/home/kk/IsaacLab/_isaac_sim/kit/python/lib/python3.10/site-packages/rsl_rl/modules/actor_critic.py", line 105, in act
    return self.distribution.sample()
  File "/home/kk/.local/share/ov/pkg/isaac-sim-4.1.0/exts/omni.isaac.ml_archive/pip_prebundle/torch/distributions/normal.py", line 70, in sample
    return torch.normal(self.loc.expand(shape), self.scale.expand(shape))
RuntimeError: normal expects all elements of std >= 0.0

I tried to exclude the mimic joints by passing only the actuated joints to ObservationsCfg, but no luck. I did this with the following code in __post_init__() of my custom config file, which is similar to the Unitree G1's rough_env_cfg.py and overrides the defaults in velocity_env_cfg.py:

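# Requires: from omni.isaac.lab.managers import SceneEntityCfg

# Collect the joint name expressions of every configured actuator.
actuators: list[str] = []
for actuator_cfg in self.scene.robot.actuators.values():
    actuators.extend(actuator_cfg.joint_names_expr)

# Restrict the action term to the actuated joints.
self.actions.joint_pos.joint_names = actuators

# Restrict the joint position and velocity observations to the same joints.
self.observations.policy.joint_pos.params["asset_cfg"] = SceneEntityCfg(
    "robot",
    joint_names=actuators,
)
self.observations.policy.joint_vel.params["asset_cfg"] = SceneEntityCfg(
    "robot",
    joint_names=actuators,
)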
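Since the snippet above passes name expressions rather than concrete joint names, it may be worth checking which joints those expressions actually select. The sketch below assumes the expressions are regular expressions that must match the whole joint name; that matching rule is an assumption here, and the joint names and patterns are hypothetical.

```python
import re


def resolve_joint_names(patterns: list[str], joint_names: list[str]) -> list[str]:
    """Return the joint names fully matched by at least one pattern."""
    return [
        name
        for name in joint_names
        if any(re.fullmatch(pattern, name) for pattern in patterns)
    ]


# Hypothetical joint names and patterns, for illustration only.
all_joints = ["left_hip_pitch", "left_knee", "left_knee_mimic", "right_hip_pitch"]
patterns = [".*_hip_pitch", ".*_knee"]

selected = resolve_joint_names(patterns, all_joints)
print(selected)

# A looser pattern would also pick up the mimic joint.
print(resolve_joint_names([".*_knee.*"], all_joints))
```

If the selected list has the expected length (39 in this report) and contains no mimic joints, the action and observation terms are at least being restricted as intended.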

The configuration:

[INFO] Command Manager:  <CommandManager> contains 1 active terms.
+------------------------------------------------+
|              Active Command Terms              |
+-------+---------------+------------------------+
| Index | Name          |          Type          |
+-------+---------------+------------------------+
|   0   | base_velocity | UniformVelocityCommand |
+-------+---------------+------------------------+

[INFO] Action Manager:  <ActionManager> contains 1 active terms.
+------------------------------------+
|  Active Action Terms (shape: 39)   |
+--------+-------------+-------------+
| Index  | Name        |   Dimension |
+--------+-------------+-------------+
|   0    | joint_pos   |          39 |
+--------+-------------+-------------+

Module omni.isaac.lab.utils.warp.kernels aa4ad5a load on device 'cuda:0' took 3.04 ms
[INFO] Observation Manager: <ObservationManager> contains 1 groups.
+----------------------------------------------------------+
| Active Observation Terms in Group: 'policy' (shape: (316,)) |
+-----------+--------------------------------+-------------+
|   Index   | Name                           |    Shape    |
+-----------+--------------------------------+-------------+
|     0     | base_lin_vel                   |     (3,)    |
|     1     | base_ang_vel                   |     (3,)    |
|     2     | projected_gravity              |     (3,)    |
|     3     | velocity_commands              |     (3,)    |
|     4     | joint_pos                      |    (39,)    |
|     5     | joint_vel                      |    (39,)    |
|     6     | actions                        |    (39,)    |
|     7     | height_scan                    |    (187,)   |
+-----------+--------------------------------+-------------+

[INFO] Event Manager:  <EventManager> contains 2 active terms.
+--------------------------------------+
| Active Event Terms in Mode: 'startup' |
+----------+---------------------------+
|  Index   | Name                      |
+----------+---------------------------+
|    0     | physics_material          |
+----------+---------------------------+
+---------------------------------------+
|  Active Event Terms in Mode: 'reset'  |
+--------+------------------------------+
| Index  | Name                         |
+--------+------------------------------+
|   0    | base_external_force_torque   |
|   1    | reset_base                   |
|   2    | reset_robot_joints           |
+--------+------------------------------+

[INFO] Termination Manager:  <TerminationManager> contains 2 active terms.
+---------------------------------+
|     Active Termination Terms    |
+-------+--------------+----------+
| Index | Name         | Time Out |
+-------+--------------+----------+
|   0   | time_out     |   True   |
|   1   | base_contact |  False   |
+-------+--------------+----------+

[INFO] Reward Manager:  <RewardManager> contains 18 active terms.
+---------------------------------------------+
|             Active Reward Terms             |
+-------+-------------------------+-----------+
| Index | Name                    |    Weight |
+-------+-------------------------+-----------+
|   0   | track_lin_vel_xy_exp    |       1.0 |
|   1   | track_ang_vel_z_exp     |       2.0 |
|   2   | lin_vel_z_l2            |       0.0 |
|   3   | ang_vel_xy_l2           |     -0.05 |
|   4   | dof_torques_l2          |  -1.5e-07 |
|   5   | dof_acc_l2              | -1.25e-07 |
|   6   | action_rate_l2          |    -0.005 |
|   7   | feet_air_time           |      0.25 |
|   8   | flat_orientation_l2     |      -1.0 |
|   9   | dof_pos_limits          |      -1.0 |
|   10  | termination_penalty     |    -200.0 |
|   11  | feet_slide              |      -0.1 |
|   12  | joint_deviation_hip     |      -0.1 |
|   13  | joint_deviation_arms    |      -0.1 |
|   14  | joint_deviation_fingers |     -0.05 |
|   15  | joint_deviation_torso   |      -0.1 |
|   16  | joint_deviation_head    |      -0.1 |
|   17  | joint_deviation_thrust  |      -0.1 |
+-------+-------------------------+-----------+

[INFO] Curriculum Manager:  <CurriculumManager> contains 1 active terms.
+---------------------------+
|  Active Curriculum Terms  |
+--------+------------------+
| Index  | Name             |
+--------+------------------+
|   0    | terrain_levels   |
+--------+------------------+

[INFO]: Completed setting up the environment...
Actor MLP: Sequential(
  (0): Linear(in_features=316, out_features=512, bias=True)
  (1): ELU(alpha=1.0)
  (2): Linear(in_features=512, out_features=256, bias=True)
  (3): ELU(alpha=1.0)
  (4): Linear(in_features=256, out_features=128, bias=True)
  (5): ELU(alpha=1.0)
  (6): Linear(in_features=128, out_features=39, bias=True)
)
Critic MLP: Sequential(
  (0): Linear(in_features=316, out_features=512, bias=True)
  (1): ELU(alpha=1.0)
  (2): Linear(in_features=512, out_features=256, bias=True)
  (3): ELU(alpha=1.0)
  (4): Linear(in_features=256, out_features=128, bias=True)
  (5): ELU(alpha=1.0)
  (6): Linear(in_features=128, out_features=1, bias=True)
)
Setting seed: 42

If I edit the URDF so that all mimic joints are set to fixed, it seems there is no crash.
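The URDF edit described above can be scripted instead of done by hand. This is a minimal sketch with the Python standard library, not the exact procedure used for this report: the function name `fix_mimic_joints` and the example URDF are hypothetical, and it only changes the joint type, leaving the `<mimic>` element in place.

```python
import xml.etree.ElementTree as ET


def fix_mimic_joints(urdf_text: str) -> tuple[str, list[str]]:
    """Set every joint that has a <mimic> child to type 'fixed'.

    Returns the modified URDF text and the names of the changed joints.
    """
    root = ET.fromstring(urdf_text)
    changed: list[str] = []
    for joint in root.findall("joint"):
        if joint.find("mimic") is None:
            continue  # independent joint, leave untouched
        joint.set("type", "fixed")
        changed.append(joint.get("name"))
    return ET.tostring(root, encoding="unicode"), changed


# Hypothetical URDF, used only to illustrate the edit.
EXAMPLE_URDF = """
<robot name="example">
  <joint name="knee" type="revolute"/>
  <joint name="knee_linkage" type="revolute">
    <mimic joint="knee" multiplier="-1.0" offset="0.0"/>
  </joint>
</robot>
"""

fixed_urdf, changed = fix_mimic_joints(EXAMPLE_URDF)
print("changed joints:", changed)
```

Note that this only works around the crash: fixing the joints removes the mimic motion from the simulated robot.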

System Info

Commit: 6451d23
Isaac Sim Version: 4.1
OS: Ubuntu 22.04
GPU: RTX 4070 Ti
CUDA: 11.8
GPU Driver: 535 open

isaac-sim / IsaacLab

[Bug Report] rsl-rl crashes for robots with mimic joints #905
