isaac-sim / IsaacLab

Unified framework for robot learning built on NVIDIA Isaac Sim
https://isaac-sim.github.io/IsaacLab

[Bug Report] Stale values in scene robot data #240

Closed: amacati closed this issue 8 months ago

amacati commented 8 months ago

Describe the bug

The data in env.scene["robot"].data is sometimes (or potentially always) stale. This can be seen when using observations that extract the Cartesian positions of robot frames.

Steps to reproduce

You can see the impact in the following example. It creates the FrankaCubeLiftEnv, but replaces the usual observation space with only the end-effector world position.

When resetting the environment, the end-effector world position is exactly zero. This cannot be correct, as the robot's default joint configuration does not place the end effector at [0, 0, 0] in world coordinates.

from omni.isaac.orbit.app import AppLauncher

app_launcher = AppLauncher({"headless": False})

import torch
import gymnasium

from omni.isaac.orbit.managers import ObservationGroupCfg, ObservationTermCfg
from omni.isaac.orbit_tasks.manipulation.lift.config.franka.ik_rel_env_cfg import FrankaCubeLiftEnvCfg
from omni.isaac.orbit_tasks.manipulation.lift.lift_env_cfg import ObjectTableSceneCfg
from omni.isaac.orbit_tasks.manipulation.lift.lift_env_cfg import ObservationsCfg
from omni.isaac.orbit.utils import configclass

def ee_pos_w(env) -> torch.Tensor:
    # World position of robot body index 8, used here as the end-effector frame.
    return env.scene["robot"].data.body_pos_w[:, 8, :3]

# Replaces the imported ObservationsCfg with a single end-effector-position term.
@configclass
class ObservationsCfg():

    @configclass
    class ObsCfg(ObservationGroupCfg):
        ee_pos_w = ObservationTermCfg(func=ee_pos_w)

        def __post_init__(self):
            self.concatenate_terms = True

    obs = ObsCfg()

def main():
    scene = ObjectTableSceneCfg(num_envs=1, env_spacing=2.5, replicate_physics=False)
    observations = ObservationsCfg()
    env_cfg = FrankaCubeLiftEnvCfg(scene=scene, observations=observations)
    env = gymnasium.make("Isaac-Lift-Cube-Franka-v0", cfg=env_cfg)

    obs, _ = env.reset()
    assert not torch.all(obs["obs"] == 0), "Robot end effector position is all zeros"

if __name__ == "__main__":
    main()
[INFO]: Completed setting up the environment...
Traceback (most recent call last):
  File "/home/amacati/repos/lsy_rl/debug.py", line 43, in <module>
    main()
  File "/home/amacati/repos/lsy_rl/debug.py", line 39, in main
    assert not torch.all(obs["obs"] == 0), "Robot end effector position is all zeros"
AssertionError: Robot end effector position is all zeros

Furthermore, when the simulation runs for two episodes, the first observation of the second episode (returned by env.reset()) places the end effector at exactly the same coordinates as the last observation of the first episode. This cannot be correct, as the robot has made random movements throughout the episode; the buffer seems never to have been cleared. This leads me to suspect that the buffer is either not written to properly during env.reset(), or that it always lags behind by one environment step. Either of these options leads to false observations.

from omni.isaac.orbit.app import AppLauncher

app_launcher = AppLauncher({"headless": False})

import torch
import gymnasium

from omni.isaac.orbit.managers import ObservationGroupCfg, ObservationTermCfg
from omni.isaac.orbit_tasks.manipulation.lift.config.franka.ik_rel_env_cfg import FrankaCubeLiftEnvCfg
from omni.isaac.orbit_tasks.manipulation.lift.lift_env_cfg import ObjectTableSceneCfg
from omni.isaac.orbit_tasks.manipulation.lift.lift_env_cfg import ObservationsCfg
from omni.isaac.orbit.utils import configclass

def ee_pos_w(env) -> torch.Tensor:
    return env.scene["robot"].data.body_pos_w[:, 8, :3]

@configclass
class ObservationsCfg():

    @configclass
    class ObsCfg(ObservationGroupCfg):
        ee_pos_w = ObservationTermCfg(func=ee_pos_w)

        def __post_init__(self):
            self.concatenate_terms = True

    obs = ObsCfg()

def main():
    scene = ObjectTableSceneCfg(num_envs=1, env_spacing=2.5, replicate_physics=False)
    observations = ObservationsCfg()
    env_cfg = FrankaCubeLiftEnvCfg(scene=scene, observations=observations)
    env = gymnasium.make("Isaac-Lift-Cube-Franka-v0", cfg=env_cfg)

    # Run two episodes; the first observation after the second reset() should differ
    # from the final observation of the first episode.
    final_obs = None
    for _ in range(2):
        done = False
        obs, _ = env.reset()
        if final_obs is not None:
            if torch.all(obs["obs"] == final_obs["obs"]):
                raise RuntimeError("Stale observation detected. Wrist coordinates are identical")
        while not done:
            action = torch.as_tensor(env.action_space.sample()).cuda()
            next_obs, _, terminated, truncated, _ = env.step(action)
            done = torch.any(terminated | truncated)
            if done:
                final_obs = next_obs

if __name__ == "__main__":
    main()
[INFO]: Completed setting up the environment...
Traceback (most recent call last):
  File "/home/amacati/repos/lsy_rl/debug.py", line 54, in <module>
    main()
  File "/home/amacati/repos/lsy_rl/debug.py", line 44, in main
    raise RuntimeError("Stale observation detected. Wrist coordinates are identical")
RuntimeError: Stale observation detected. Wrist coordinates are identical

Interestingly enough, this problem does not manifest in the object data, as can be checked by observing the object's world position instead of the end-effector position. It does, however, affect other members of env.scene["robot"].data, e.g. root_state_w.

from omni.isaac.orbit.app import AppLauncher

app_launcher = AppLauncher({"headless": False})

import torch
import gymnasium

from omni.isaac.orbit.managers import ObservationGroupCfg, ObservationTermCfg
from omni.isaac.orbit_tasks.manipulation.lift.config.franka.ik_rel_env_cfg import FrankaCubeLiftEnvCfg
from omni.isaac.orbit_tasks.manipulation.lift.lift_env_cfg import ObjectTableSceneCfg
from omni.isaac.orbit_tasks.manipulation.lift.lift_env_cfg import ObservationsCfg
from omni.isaac.orbit.utils import configclass

def object_pos_w(env) -> torch.Tensor:
    # Same check as above, but observing the object's world position instead of the robot's.
    return env.scene["object"].data.root_pos_w[:, :3]

@configclass
class ObservationsCfg():

    @configclass
    class ObsCfg(ObservationGroupCfg):
        object_pos_w = ObservationTermCfg(func=object_pos_w)

        def __post_init__(self):
            self.concatenate_terms = True

    obs = ObsCfg()

def main():
    scene = ObjectTableSceneCfg(num_envs=1, env_spacing=2.5, replicate_physics=False)
    observations = ObservationsCfg()
    env_cfg = FrankaCubeLiftEnvCfg(scene=scene, observations=observations)
    env = gymnasium.make("Isaac-Lift-Cube-Franka-v0", cfg=env_cfg)

    final_obs = None
    for _ in range(2):
        done = False
        obs, _ = env.reset()
        if final_obs is not None:
            if torch.all(obs["obs"] == final_obs["obs"]):
                raise RuntimeError("Stale observation detected. Object coordinates are identical")
        while not done:
            action = torch.as_tensor(env.action_space.sample()).cuda()
            next_obs, _, terminated, truncated, _ = env.step(action)
            done = torch.any(terminated | truncated)
            if done:
                final_obs = next_obs

if __name__ == "__main__":
    main()

(Finishes without any issue)


Mayankm96 commented 8 months ago

Hi @amacati ,

Many physics engines perform a simulation step as a two-level call, forward() and simulate(), which update the kinematic and dynamic states respectively. Unfortunately, PhysX has only a single step() call in which these two operations are combined. Because the computations run through GPU kernels, it is not straightforward for them to split these operations. Thus, at the moment, it is not possible to set the joint states and then make a forward call to update the kinematic states of the links.

As you have noticed, the issue mainly appears in kinematic chains, where you would expect the link states to be updated as soon as you update the root or joint states. This affects both initialization and episodic resets. For root and joint states, the setters write directly into the data fields as well as the PhysX buffers, so those values do appear updated.
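
For illustration, here is a minimal sketch of this behaviour, continuing from the repro script above (it assumes the Orbit Articulation API with write_joint_state_to_sim and the env attributes sim, scene, and physics_dt; exact names may differ between versions):

# Hypothetical sketch, continuing from the repro above where `env` has been created.
base_env = env.unwrapped            # the Orbit env behind the gymnasium wrapper
robot = base_env.scene["robot"]

# Writing the joint state updates the joint buffers (and PhysX buffers) immediately ...
robot.write_joint_state_to_sim(robot.data.default_joint_pos, robot.data.default_joint_vel)
print(robot.data.joint_pos[0])       # reflects the write right away

# ... but the derived link poses only refresh after the next combined PhysX step.
print(robot.data.body_pos_w[0, 8])   # still the old (stale) end-effector position

base_env.sim.step()                              # kinematics and dynamics stepped together
base_env.scene.update(dt=base_env.physics_dt)    # refresh asset buffers from the PhysX views
print(robot.data.body_pos_w[0, 8])   # now consistent with the written joint state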

While this is erroneous, there is currently no direct workaround for it. From our experience with IsaacGym, even though the reset values matter, in practice this does not affect the agent's learning critically.

If you always need the link states to be correct, a possible solution is to build a Warp-based kinematic model that is detached from PhysX and call it to update the link states whenever you need them. It is not ideal, but it is a possible workaround for now. We have made a feature request to the PhysX team to provide separate forward and simulate calls like other physics engines; however, there is currently no set timeline for this feature.
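
As a rough illustration of the detached-kinematics idea (not a Warp model itself), a standalone forward-kinematics package such as pytorch_kinematics could recompute the end-effector pose directly from the joint positions; the package choice, the URDF path, and the end-link name below are assumptions:

# Hypothetical sketch: recompute the end-effector pose from the joint positions with a
# kinematics model that is independent of PhysX, so it stays consistent with the values
# written at reset. The URDF path and link name are placeholders.
import torch
import pytorch_kinematics as pk

with open("panda.urdf") as f:                               # assumed local Franka URDF
    chain = pk.build_serial_chain_from_urdf(f.read(), end_link_name="panda_hand")

joint_pos = env.unwrapped.scene["robot"].data.joint_pos[:, :7].cpu()  # arm joints, on CPU
ee_tf = chain.forward_kinematics(joint_pos)                 # pose of the end link
ee_pos_in_base = ee_tf.get_matrix()[:, :3, 3]               # position relative to the base
# To compare with body_pos_w, this still has to be transformed by the robot's root pose
# in world coordinates (root_state_w).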

amacati commented 8 months ago

Hi @Mayankm96 ,

Thanks a lot for the swift reply! Your explanation makes a lot of sense, but it's a bummer nevertheless. I think it would be great to place a warning in the docs somewhere so users know about this limitation.

Also, if I understood you correctly, reset() just "queues" a reset by overwriting the joint states, and the new state is only applied during the next step() call. That means the buffers from t=1 onwards are, in fact, back in sync again, right? I was worried they might always lag behind by one time step, which would be pretty bad.

I guess I will go with a wrapper for now that fuses each reset() with a single step(). Thanks again for your detailed answer; feel free to close the issue at any time. I'm just leaving it open for now so people with the same question can find it more easily.
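
For reference, a minimal sketch of such a wrapper (the zero action, the device, and returning the post-step observation are assumptions; the multi-environment caveat in the next reply applies):

# Hypothetical wrapper sketch: fuse every reset() with a single step() so that the
# PhysX-derived link states are refreshed before the first observation is used.
import gymnasium
import torch

class RefreshOnResetWrapper(gymnasium.Wrapper):
    def reset(self, **kwargs):
        super().reset(**kwargs)  # discard the potentially stale first observation
        # Step once with a zero action; with the relative-IK arm commands used here,
        # a zero command should roughly hold the reset configuration.
        zero_action = torch.zeros(self.action_space.shape, device="cuda")
        obs, _, _, _, info = self.env.step(zero_action)
        return obs, info

# Usage:
# env = RefreshOnResetWrapper(gymnasium.make("Isaac-Lift-Cube-Franka-v0", cfg=env_cfg))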

Mayankm96 commented 8 months ago

Good point. I have made an MR #241 based on the reply above.

I would be careful about putting a step() after every reset() call. This only works well when you have a single environment. Since you may have multiple environment instances running simultaneously (i.e., sharing the same stage), calling step() will cause all of them to step forward with the commands cached from the previous call, which may not be what you want. Depending on how often resets happen, this may adversely affect learning.

amacati commented 8 months ago

True. I'm working with a constant time horizon task that always truncates though, so I think it should be fine.

Mayankm96 commented 8 months ago

I see. Then it is fine.

Closing this issue now, as it isn't directly fixable on the Orbit side.