[Proposal] Observation Histories

KyleM73 commented 1 week ago

Proposal

Many methods use observation histories (e.g. RMA/Student-Teacher Methods, Async ActorCritic, etc.) but histories are not currently supported by default. It is easy to make a wrapper class observation that stores histories on a per-observation term level, but there doesn't appear to be a way to store histories at the observation group level. Sometimes, e.g. for images it may be desirable to stack the history in terms of the individual observation term, but other times it may be more desirable to stack the history at the group level (which would require a change to the manager). This was something we had an implementation for at BDAI. Is this something that is currently already being worked on? If not I would be happy to take it on, but wasn't going to start working on it for a PR if someone else is already doing so.

Alternatives

Using the Observation Term class functionality to implement histories on a per-term basis, eg this minimal implementation. This stacks obs terms A B C as AAABBBCCC for history=3.

class HistoryObsWrapper(ManagerTermBase):
    def __init__(self, env: ManagerBasedRLEnvCfg, cfg: ObservationTermCfg) -> None:
        super().__init__(cfg, env)
        self.func = self.cfg.params["func"]
        if isinstance(self.func, ManagerTermBase):
            self.func = self.func(env, cfg)
        self.obs_len = self.func(env, **self.cfg.params["func_params"]).size(-1)   # TODO generalize this to multiD obs
        self.history_len = self.cfg.params["history_len"]
        self.data = torch.zeros(self.num_envs, self.history_len, self.obs_len, device=self.device)
        self.scale = self.cfg.params["scale"]

    def reset(self, env_ids: torch.Tensor = None) -> None:
        if isinstance(self.func, ManagerTermBase):
            self.func.reset(env_ids)
        self.data[env_ids] = self.func(self._env, **self.cfg.params["func_params"])[env_ids].view(-1, 1, self.obs_len).expand(-1, self.history_len, -1)

    def __call__(self, env, func, history_len, func_params, scale) -> torch.Tensor:
        self.data = self.data.roll(1, dims=1)
        self.data[:, 0] = self.func(self._env, **self.cfg.params["func_params"])
        return self.scale * self.data.clone().flatten(start_dim=1)

Using the Observation Group maybe with a post_init flag to stack whole observations as a history. This stacks obs terms A B C as ABCABCABC for history=3.
Histories could also be handled at the RL framework/policy level. It is my belief this would be better handled by Isaac Lab directly, but it is an option to just require frameworks to deal with it themselves, e.g. the storage buffer in rsl_rl.

Checklist

[x] I have checked that there is no similar issue in the repo

Acceptance Criteria

Add the criteria for which this task is considered done. If not known at issue creation time, you can add this once the issue is assigned.

[ ] Option to stack histories at a per-term level
[ ] Option to stack histories at a per-group level

mpgussert commented 1 week ago

Hello @KyleM73

To my knowledge our sensor base class is designed to support recording history https://github.com/isaac-sim/IsaacLab/blob/main/source/extensions/omni.isaac.lab/omni/isaac/lab/sensors/sensor_base.py

checkout the contact sensor, for example https://github.com/isaac-sim/IsaacLab/blob/cb9fee62f14d03f07837cca29977088b6cb43a94/source/extensions/omni.isaac.lab/omni/isaac/lab/sensors/contact_sensor/contact_sensor.py#L291

Does this meet your needs? If not, can it be built on the current sensor history API?

KyleM73 commented 1 week ago

For observations that are just reading sensors that would satisfy the per-term history, but not the per-group history. In addition, not all observations come from reading sensor data, for example last action (currently only supports a single step) or generated commands. Maybe this is a larger question of "where should histories be stored", but unless the design decision is to implement a corresponding sensor for each observation type, it seems to me like there should be a way to track observation histories more generally i.e. separate from the sensor history API. If the desired solution is to always do it at a per-term level then I can PR a cleaned-up version of the code I provided above to the core isaac.lab MDP. But I do think there is value in supporting histories at a per-group level. I would love to hear feedback either way, for per-term and/or per-group histories.

KyleM73 commented 1 week ago

One additional consideration for dealing with this at the manager level as opposed to the observation term level is when adding noise for DR. Using the code I provided above, independent and DIFFERENT noise will be added to the history at each time step by the manager, but this is not realistic, as the noise added at time step k should be the same noise observed for the previous time step when the current step is k+1. This doesn't make a huge difference in practice as the policy learns to be robust to the noise anyways, but it is wrong.

Mayankm96 commented 1 week ago

Hi @KyleM73,

I agree. This is a feature we should have. There are many different ways you can do this (depending on the end goal):

Making an environment wrapper that does stacking of observations, similar to FrameStack in Gymnasium API
Adding history storage as a modifier functionality. This records the history per term (which may not be the most efficient but will be the most generic). I noticed right now that the modifiers act "before" any clipping/noisifying. However, we can change this ordering since doing it afterward makes more sense.

@jtigue-bdai will be able to provide better guidance here

KyleM73 commented 1 week ago

Thrilled to work with @jtigue-bdai again :)

Some additional thoughts: The best case scenario allows for both obs terms and obs groups to be configured. For example, maybe a history of joint velocities is desired (eg to estimate acceleration if it isn't available on a given robot), but only the most recent command. In that case, the implementation would need to function at the per-term level. Alternatively, for methods like RMA or concurrent estimation, the policy and estimator, respectively, receive obs group histories, and doing so would be more efficient than handling it at the per-term level. I personally think both per-term and per-group should be supported. We could in theory handle this as an env wrapper with a config specifying how many time steps to record for each obs term and group separately. This would be the most efficient but also feels somewhat hacky and not in line with the design of the rest of the manager-based framework, although I'm happy to implement it however. Looking forward to thoughts!

isaac-sim / IsaacLab