isaac-sim / IsaacLab

Unified framework for robot learning built on NVIDIA Isaac Sim
https://isaac-sim.github.io/IsaacLab
Other
1.89k stars 726 forks source link

[Bug Report] Implementation inaccuracy in RSL-RL wrapper #888

Closed dxyy1 closed 1 week ago

dxyy1 commented 2 weeks ago

Describe the bug

When using the ActorCriticRecurrent class for training, the function that resets the hidden states of the recurrent networks does not function incorrectly in Isaaclab. Specifically, the reset() method of the Memory class in actor_critic_recurrent.py currently resets the hidden states to 0.0 even when no environment is done, due to a potential implementation inaccuracy in the isaaclab's RSL-RL wrapper.

Steps to reproduce

To see the undesired resetting happening during runtime, we will start a debugging session in vs code. For a minimal example, we can edit the existing lab_tasks files. Specifically, we will add one line to the following file

  1. edit rsl_rl_ppo_cfg path to file: isaaclab/source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/manager_based/locomotion/velocity/config/go1/agents/rsl_rl_ppo_cfg.py
# ... code before 

class UnitreeGo1RoughPPORunnerCfg(RslRlOnPolicyRunnerCfg):
    num_steps_per_env = 24
    max_iterations = 1500
    save_interval = 50
    experiment_name = "unitree_go1_rough"
    empirical_normalization = False
    policy = RslRlPpoActorCriticCfg(
        class_name="ActorCriticRecurrent", # <------------------- added line here
        init_noise_std=1.0,
        actor_hidden_dims=[512, 256, 128],
        critic_hidden_dims=[512, 256, 128],
        activation="elu",
    )

# ...  code afterwards
  1. add a breakpoint in the reset() method of Memory class. path to file: saaclab/_isaac_sim/kit/python/lib/python3.10/site-packages/rsl_rl/modules/actor_critic_recurrent.py

    def reset(self, dones=None):
        # When the RNN is an LSTM, self.hidden_states_a is a list with hidden_state and cell_state
        for hidden_state in self.hidden_states:
            hidden_state[..., dones, :] = 0.0  # <----------------- add breakpoint here

    image

  2. add the following configuration in launch.json for the python debugger.

{
    "name": "Debug hidden state debug",
    "type": "debugpy",
    "request": "launch",
    "program": "${workspaceFolder}/source/standalone/workflows/rsl_rl/train.py",
    "console": "integratedTerminal",
    "args": [
        "--headless",
        "--task", "Isaac-Velocity-Flat-Unitree-Go1-v0", 
        "--num_envs", "2", // fewer env for faster compiling
    ],
},
  1. run the corresponding task of the rsl_rl_ppo_cfg from vs code debugger by selecting Debug hidden state debug then pressing F5 image

Discussion

When the program stops at the breakpoint, we should be able to see something similar to the following in the vs code debug console image The hidden_state has shape [num_layers, num_envs, hidden_size]. and done has shape [num_env] where each value is binary 0=env not done and 1=env done.

My assumption here is that the intended behaviour is for the hidden states to reset only when the corresponding env is done. The current behaviour below shows that it always resets the 0th env hidden states. image

I think this issue is due to incorrect data type. As dones is of type Long instead of the intended bool, making python indexing the tensor instead of applying the boolean mask.

However, I can trace the type inaccuracy back to isaaclab's rsl-rl wrapper /workspace/isaaclab/source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/utils/wrappers/rsl_rl/vecenv_wrapper.py as show below, where if we comment out the type conversion, the hidden states are reset correctly. image However, this would result in nan when computing value function loss and surrogate loss. So I was wondering if there is proper way to fix this, or if this is an issue to begin with at all?

Additional Info

For now, a simple and effective fix I can think of would be setting dones.bool() as shown below image But I am not too sure if this is desirable as it is directly editing the rsl-rl package.

System Info

Checklist

Acceptance Criteria

Mayankm96 commented 2 weeks ago

Thanks a lot for reporting a thorough issue.

Although I agree torch.long isn't great, it seems that "the default" does expect torch.long.

The problem seems to lie more with rsl-rl and it should be fixed there by doing dones.bool() like you did above.

Mayankm96 commented 1 week ago

@dxyy1 Could you please send a fix there and close the issue here? It doesn't seem related to Isaac Lab. Thanks!