[Bug Report] Implementation inaccuracy in RSL-RL wrapper

dxyy1 commented 2 weeks ago

Describe the bug

When using the ActorCriticRecurrent class for training, the function that resets the hidden states of the recurrent networks does not function incorrectly in Isaaclab. Specifically, the reset() method of the Memory class in actor_critic_recurrent.py currently resets the hidden states to 0.0 even when no environment is done, due to a potential implementation inaccuracy in the isaaclab's RSL-RL wrapper.

Steps to reproduce

To see the undesired resetting happening during runtime, we will start a debugging session in vs code. For a minimal example, we can edit the existing lab_tasks files. Specifically, we will add one line to the following file

edit rsl_rl_ppo_cfg path to file: isaaclab/source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/manager_based/locomotion/velocity/config/go1/agents/rsl_rl_ppo_cfg.py

# ... code before 

class UnitreeGo1RoughPPORunnerCfg(RslRlOnPolicyRunnerCfg):
    num_steps_per_env = 24
    max_iterations = 1500
    save_interval = 50
    experiment_name = "unitree_go1_rough"
    empirical_normalization = False
    policy = RslRlPpoActorCriticCfg(
        class_name="ActorCriticRecurrent", # <------------------- added line here
        init_noise_std=1.0,
        actor_hidden_dims=[512, 256, 128],
        critic_hidden_dims=[512, 256, 128],
        activation="elu",
    )

# ...  code afterwards

add a breakpoint in the reset() method of Memory class. path to file: saaclab/_isaac_sim/kit/python/lib/python3.10/site-packages/rsl_rl/modules/actor_critic_recurrent.py

def reset(self, dones=None):
    # When the RNN is an LSTM, self.hidden_states_a is a list with hidden_state and cell_state
    for hidden_state in self.hidden_states:
        hidden_state[..., dones, :] = 0.0  # <----------------- add breakpoint here

add the following configuration in launch.json for the python debugger.

{
    "name": "Debug hidden state debug",
    "type": "debugpy",
    "request": "launch",
    "program": "${workspaceFolder}/source/standalone/workflows/rsl_rl/train.py",
    "console": "integratedTerminal",
    "args": [
        "--headless",
        "--task", "Isaac-Velocity-Flat-Unitree-Go1-v0", 
        "--num_envs", "2", // fewer env for faster compiling
    ],
},

run the corresponding task of the rsl_rl_ppo_cfg from vs code debugger by selecting Debug hidden state debug then pressing F5

Discussion

When the program stops at the breakpoint, we should be able to see something similar to the following in the vs code debug console The hidden_state has shape [num_layers, num_envs, hidden_size]. and done has shape [num_env] where each value is binary 0=env not done and 1=env done.

My assumption here is that the intended behaviour is for the hidden states to reset only when the corresponding env is done. The current behaviour below shows that it always resets the 0th env hidden states.

I think this issue is due to incorrect data type. As dones is of type Long instead of the intended bool, making python indexing the tensor instead of applying the boolean mask.

However, I can trace the type inaccuracy back to isaaclab's rsl-rl wrapper /workspace/isaaclab/source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/utils/wrappers/rsl_rl/vecenv_wrapper.py as show below, where if we comment out the type conversion, the hidden states are reset correctly. However, this would result in nan when computing value function loss and surrogate loss. So I was wondering if there is proper way to fix this, or if this is an issue to begin with at all?

Additional Info

For now, a simple and effective fix I can think of would be setting dones.bool() as shown below But I am not too sure if this is desirable as it is directly editing the rsl-rl package.

System Info

Isaac Sim Version: 4.1.0-rc.7+4.1.14801.71533b68.gl
OS: Container is Ubuntu 22.04. Host is Ubuntu 20.04
GPU: NVIDIA GeForce RTX 4090
CUDA:12.2
GPU Driver: 535.183.01

Checklist

[x] I have checked that there is no similar issue in the repo (required)
[x] I have checked that the issue is not in running Isaac Sim itself and is related to the repo

Acceptance Criteria

[ ] Confirm this is an implementation inaccuracy.
[ ] Confirm whether this is inaccuracy is worth the trouble to fix.

Mayankm96 commented 2 weeks ago

Thanks a lot for reporting a thorough issue.

Although I agree torch.long isn't great, it seems that "the default" does expect torch.long.

The problem seems to lie more with rsl-rl and it should be fixed there by doing dones.bool() like you did above.

Mayankm96 commented 1 week ago

@dxyy1 Could you please send a fix there and close the issue here? It doesn't seem related to Isaac Lab. Thanks!

isaac-sim / IsaacLab