isaac-sim / IsaacLab

Unified framework for robot learning built on NVIDIA Isaac Sim
https://isaac-sim.github.io/IsaacLab

[Question] HELP!!! Parallel environments do not support independent multi-run execution #973

Closed ItsBean closed 2 days ago

ItsBean commented 1 week ago

Question

I am working with parallel environments in Isaac Sim and encountered an issue where the state of one environment affects the others during execution. Specifically, when one environment reaches a done or truncated state, I attempt to reset only that environment, but this results in unintended behavior where the remaining environments are impacted.

Here is the simplified code structure I'm using:

import torch

num_envs = 10
device = "cuda:0"
step = 0

for _ in range(200):
    step += 1
    action = env.action_space.sample()  # assumed to return a (num_envs, action_dim) batch

    actions = torch.as_tensor(action, dtype=torch.float32).to(device)  # shape: (10, 4)

    # Step the environment with all 10 actions
    obs, rewards, dones, truncateds, infos = env.step(actions)
    obs = obs['policy']  # shape: (num_envs, obs_dim)

    # Try to reset only the environments that are done or truncated
    for i in range(num_envs):
        if dones[i] or truncateds[i]:
            new_obs, _ = env.reset(env_id=i)  # <-- the per-env reset I am attempting
            obs[i] = new_obs['policy']

env.reset(env_id=i) is not supported, so Isaac Sim currently does not offer a way for environments to run independently without affecting each other: when one environment finishes (reaches done or truncated), it is not possible to reset only that specific environment without resetting or influencing the others.

Expectation

I would like a mechanism where each environment can be managed and reset independently, such that the progression of one environment does not impact the others.

pascal-roth commented 1 week ago

Hi, each environment is reset automatically depending on the event functions you pass into the configuration. Looking into the step function, you can see that the environments are already reset individually when they terminate or run out of time, so your manual reset call is not necessary. The returned observations already come from the reset environments.

We use and test this behavior ourselves and cannot see any effect on other environments. Can you please provide more detail and a minimal script where you observe this behavior?
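
For reference, the intended usage looks roughly like the following minimal sketch (assuming env is an Isaac Lab vectorized environment with 10 sub-environments, created elsewhere; names and shapes are illustrative):

import torch

device = "cuda:0"  # placeholder device

# One initial reset for all sub-environments
obs, infos = env.reset()

for step in range(200):
    # Sample one action per sub-environment, shape (num_envs, action_dim)
    actions = torch.as_tensor(env.action_space.sample(), dtype=torch.float32, device=device)

    # step() resets terminated/timed-out sub-environments internally, so the
    # returned observations already come from the freshly reset environments
    obs, rewards, terminated, truncated, infos = env.step(actions)
    policy_obs = obs["policy"]  # shape: (num_envs, obs_dim)
    # no per-environment env.reset(...) call is needed here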

ItsBean commented 1 week ago

From the example code source/standalone/workflows/skrl/train.py, we can observe that in parallel environments there is no mechanism by which each environment is treated separately. The example code uses if terminated.any() or truncated.any(): as the reset condition, meaning that when any one environment is done, all environments are reset. This can potentially impact the agent's exploration.

Additionally, starting multiple instances of Isaac Sim to run 10 environments is not an option for me. This is the puzzle that confuses me.

Here is the relevant code from skrl's SequentialTrainer (from skrl.trainers.torch import SequentialTrainer):

        # reset env
        states, infos = self.env.reset()

        for timestep in tqdm.tqdm(range(self.initial_timestep, self.timesteps), disable=self.disable_progressbar, file=sys.stdout):

            # pre-interaction
            for agent in self.agents:
                agent.pre_interaction(timestep=timestep, timesteps=self.timesteps)

            # compute actions
            with torch.no_grad():
                actions = torch.vstack([agent.act(states[scope[0]:scope[1]], timestep=timestep, timesteps=self.timesteps)[0] \
                                        for agent, scope in zip(self.agents, self.agents_scope)])

                # step the environments
                next_states, rewards, terminated, truncated, infos = self.env.step(actions)

                # render scene
                if not self.headless:
                    self.env.render()

                # record the environments' transitions
                for agent, scope in zip(self.agents, self.agents_scope):
                    agent.record_transition(states=states[scope[0]:scope[1]],
                                            actions=actions[scope[0]:scope[1]],
                                            rewards=rewards[scope[0]:scope[1]],
                                            next_states=next_states[scope[0]:scope[1]],
                                            terminated=terminated[scope[0]:scope[1]],
                                            truncated=truncated[scope[0]:scope[1]],
                                            infos=infos,
                                            timestep=timestep,
                                            timesteps=self.timesteps)

                # log environment info
                if self.environment_info in infos:
                    for k, v in infos[self.environment_info].items():
                        if isinstance(v, torch.Tensor) and v.numel() == 1:
                            for agent in self.agents:
                                agent.track_data(f"Info / {k}", v.item())

            # post-interaction
            for agent in self.agents:
                agent.post_interaction(timestep=timestep, timesteps=self.timesteps)

            # reset environments
            with torch.no_grad():
                if terminated.any() or truncated.any():
                    states, infos = self.env.reset()
                else:
                    states = next_states

ItsBean commented 1 week ago

If revising the environment configuration is the only solution to achieve independent environment resets, are there any tutorials or template code available to help modify the default environment configuration?

pascal-roth commented 1 week ago

The SequentialTrainer is outside the scope of Isaac Lab; please raise this issue in the skrl repository.

In our environment definition, each environment is reset individually, as shown here, whenever it is terminated or truncated.

To run multiple environments, just increase the number of envs here.

ItsBean commented 1 week ago

# -- reset envs that terminated/timed-out and log the episode information
reset_env_ids = self.reset_buf.nonzero(as_tuple=False).squeeze(-1)
if len(reset_env_ids) > 0:
    self._reset_idx(reset_env_ids)

Based on this code, we can infer that an environment resets automatically when its episode terminates or times out, and that the reset information is surfaced through terminated or truncated. This means the user doesn't need to manually call env.reset() to reset the environment. Is this understanding correct? I will test this behavior right away, so I can continue developing the agent's exploration strategy. Thank you!
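
A quick way to verify this (a sketch under the same assumptions, i.e. env is the Isaac Lab vectorized environment and exposes num_envs and device) is to count completed episodes per sub-environment from the flags, without ever calling env.reset() inside the loop:

import torch

episode_counts = torch.zeros(env.num_envs, dtype=torch.long)

obs, infos = env.reset()
for _ in range(1000):
    actions = torch.as_tensor(env.action_space.sample(), dtype=torch.float32, device=env.device)
    obs, rewards, terminated, truncated, infos = env.step(actions)
    # sub-environments that finished this step were auto-reset inside step();
    # the flags only report which ones rolled over
    episode_counts += (terminated | truncated).long().cpu()

print(episode_counts)  # per-env tallies that differ indicate independent resets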

pascal-roth commented 1 week ago

Yes, exactly; we handle the reset internally.

Not sure why skrl would reset all environments once one terminates. I would recommend using rsl_rl, which does not include this behavior.

Toni-SM commented 2 days ago

Hi @ItsBean (and @pascal-roth)

Indeed, Isaac Lab (like other frameworks that implement parallel/vectorized environments, such as gym/gymnasium vectorized envs or Brax) auto-resets sub-environments after they terminate or are truncated. This differs from frameworks that implement only a single environment (such as non-vectorized gym/gymnasium, the default, or DeepMind environments), where it is necessary to invoke the reset method "manually".
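
To illustrate the difference, here is a rough sketch of the two loop styles using gymnasium's standard API (gym.make_vec is available in recent gymnasium versions; older versions expose gym.vector.make instead):

import gymnasium as gym

# Single (non-vectorized) environment: the caller must reset "manually"
env = gym.make("CartPole-v1")
obs, info = env.reset()
for _ in range(200):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, info = env.reset()  # explicit reset after each episode

# Vectorized environments: finished sub-environments are auto-reset internally
venv = gym.make_vec("CartPole-v1", num_envs=4)
obs, infos = venv.reset()
for _ in range(200):
    obs, rewards, terminateds, truncateds, infos = venv.step(venv.action_space.sample())
    # no manual reset: the vector env handles per-env resets, as in Isaac Lab or Brax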

Now, skrl is a generic RL library that supports not only Isaac Lab but also other RL frameworks, and it is therefore able to handle both cases.

For Isaac Lab (and other frameworks that support parallel/vectorized envs), the .reset() method only resets the environment once. Subsequent calls just return the latest .step() observation/info, as shown in the following code:

https://github.com/Toni-SM/skrl/blob/441910440ffc53fa88256d0a9f34d60922c1eac5/skrl/envs/wrappers/torch/isaaclab_envs.py#L66-L75

    def reset(self) -> Tuple[torch.Tensor, Any]:
        """Reset the environment

        :return: Observation, info
        :rtype: torch.Tensor and any other info
        """
        if self._reset_once:
            self._observations, self._info = self._env.reset()
            self._reset_once = False
        return self._observations["policy"], self._info
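
In other words, calling the wrapper's reset() repeatedly is safe: only the first call actually resets the underlying Isaac Lab environment, and subsequent calls hand back the cached observations/info from the latest step, so no sub-environment is disturbed.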

Moreover, the sequential trainer (designed to train more than one agent simultaneously in parallel environments) falls back to the base-class implementation when there are no simultaneous agents to train, which is the current Isaac Lab case. The base-class implementation takes into account whether the environment is parallel/vectorized or not, in order to avoid calling the reset method (even though it would be safe to invoke), as shown in the following code:

https://github.com/Toni-SM/skrl/blob/441910440ffc53fa88256d0a9f34d60922c1eac5/skrl/trainers/torch/base.py#L208-L216

            # reset environments
            if self.env.num_envs > 1:
                states = next_states
            else:
                if terminated.any() or truncated.any():
                    with torch.no_grad():
                        states, infos = self.env.reset()
                else:
                    states = next_states
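
Put differently, for Isaac Lab (num_envs > 1) the trainer never calls reset() inside the loop; it simply carries next_states forward, because terminated/truncated sub-environments have already been reset internally by env.step().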

Therefore, I strongly recommend that you continue to use skrl (and with it the improved implementation of the model instantiators' network definitions, currently available in skrl-v1.3.0 and integrated in the upcoming official version of Isaac Lab).