Closed dxyy1 closed 1 week ago
Thanks a lot for reporting a thorough issue.
Although I agree torch.long isn't great, it seems that "the default" does expect torch.long.
The problem seems to lie more with rsl-rl and it should be fixed there by doing dones.bool() like you did above.
@dxyy1 Could you please send a fix there and close the issue here? It doesn't seem related to Isaac Lab. Thanks!
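For reference, the dones.bool() change suggested here could look roughly like the following. This is a minimal sketch, not rsl-rl's actual code: MemorySketch is a hypothetical stand-in for the Memory class, and it assumes the reset is an indexed assignment over the env dimension of each hidden-state tensor.

```python
import torch


class MemorySketch:
    """Hypothetical stand-in for a recurrent memory module (not rsl-rl's class)."""

    def __init__(self, num_layers: int, num_envs: int, hidden_size: int):
        # LSTM-style pair (hidden state, cell state), filled with ones so
        # that a reset to 0.0 is easy to spot.
        self.hidden_states = (
            torch.ones(num_layers, num_envs, hidden_size),
            torch.ones(num_layers, num_envs, hidden_size),
        )

    def reset(self, dones=None):
        if dones is None:
            return
        # Cast so that a 0/1 long tensor acts as a per-env boolean mask
        # instead of being interpreted as a list of env indices.
        mask = dones.bool()
        for hidden_state in self.hidden_states:
            hidden_state[..., mask, :] = 0.0


memory = MemorySketch(num_layers=2, num_envs=3, hidden_size=4)
memory.reset(torch.tensor([0, 1, 0], dtype=torch.long))  # only env 1 is done
h, c = memory.hidden_states
print(h[:, :, 0])  # one column per env; only the env 1 column is zero
```

With the cast in place, the caller can keep passing dones as torch.long, so nothing on the wrapper side has to change.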
Describe the bug
When using the ActorCriticRecurrent class for training, the function that resets the hidden states of the recurrent networks does not function correctly in Isaac Lab. Specifically, the reset() method of the Memory class in actor_critic_recurrent.py currently resets the hidden states to 0.0 even when no environment is done, due to a potential implementation inaccuracy in Isaac Lab's RSL-RL wrapper.
Steps to reproduce
To see the undesired resetting happen at runtime, we will start a debugging session in VS Code. For a minimal example, we can edit the existing lab_tasks files.
1. Add one line to the file rsl_rl_ppo_cfg. Path to file: isaaclab/source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/manager_based/locomotion/velocity/config/go1/agents/rsl_rl_ppo_cfg.py
2. Add a breakpoint in the reset() method of the Memory class. Path to file: saaclab/_isaac_sim/kit/python/lib/python3.10/site-packages/rsl_rl/modules/actor_critic_recurrent.py
3. Add the following configuration in launch.json for the Python debugger.
4. Run rsl_rl_ppo_cfg from the VS Code debugger by selecting Debug hidden state debug, then pressing F5.
Discussion
When the program stops at the breakpoint, we should be able to see something similar to the following in the VS Code debug console.
The hidden_state has shape [num_layers, num_envs, hidden_size], and dones has shape [num_envs], where each value is binary: 0 = env not done and 1 = env done.
My assumption here is that the intended behaviour is for the hidden states to reset only when the corresponding env is done. The current behaviour below shows that it always resets the 0th env hidden states.
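The shapes above are enough to reproduce the effect outside Isaac Lab. The following is a minimal sketch in plain PyTorch, assuming the reset is an indexed assignment of the form hidden_state[..., dones, :] = 0.0. That form is an assumption made for illustration, and the tensor sizes and variable names are made up. It contrasts a 0/1 tensor of dtype torch.long with the same tensor cast to bool.

```python
import torch

num_layers, num_envs, hidden_size = 1, 4, 3  # illustrative sizes only

# Only env 2 is done; stored as 0/1 values with dtype long.
dones_long = torch.tensor([0, 0, 1, 0], dtype=torch.long)

# Long tensor: the values are used as env indices, so envs 0 and 1
# are overwritten and env 2 is left untouched.
h_long = torch.ones(num_layers, num_envs, hidden_size)
h_long[..., dones_long, :] = 0.0

# Bool tensor: the values are used as a mask, so only env 2 is overwritten.
h_bool = torch.ones(num_layers, num_envs, hidden_size)
h_bool[..., dones_long.bool(), :] = 0.0


def reset_envs(h: torch.Tensor) -> list:
    """Return the indices of envs whose hidden state is entirely zero."""
    return (h == 0).all(dim=-1).all(dim=0).nonzero().flatten().tolist()


reset_long = reset_envs(h_long)
reset_bool = reset_envs(h_bool)
print(reset_long)  # [0, 1]
print(reset_bool)  # [2]
```

When no env is done, the long tensor is all zeros and every entry selects index 0, which matches the observation that env 0 is reset on every call.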
I think this issue is due to an incorrect data type: dones is of type Long instead of the intended bool, which makes PyTorch index the tensor with the values instead of applying them as a boolean mask.
However, I can trace the type inaccuracy back to Isaac Lab's rsl-rl wrapper /workspace/isaaclab/source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/utils/wrappers/rsl_rl/vecenv_wrapper.py as shown below, where if we comment out the type conversion, the hidden states are reset correctly. However, this would result in nan when computing the value function loss and the surrogate loss. So I was wondering if there is a proper way to fix this, or if this is an issue to begin with at all?
Additional Info
For now, a simple and effective fix I can think of would be calling dones.bool() as shown below. But I am not too sure if this is desirable, as it directly edits the rsl-rl package.
System Info
Checklist
Acceptance Criteria