Closed ItsBean closed 2 days ago
Hi, each environment is reset automatically depending on the event functions you passed into the configuration. Looking at the step function, you can see that all the environments are already reset when they terminate or run out of time, so your reset call is not necessary. The returned observations already come from the reset environments.
We are using and testing this behavior and cannot see any effects on other environments. Could you please provide details and a minimal script where you observe this behavior?
From the example code `source/standalone/workflows/skrl/train.py`, we can observe that in parallel environments there is no mechanism that treats each environment separately. The example code uses `if terminated.any() or truncated.any():` as the reset condition, meaning that when any one of the environments is done, all environments are reset. This can potentially impact the agent's exploration.
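To illustrate the concern, here is a toy sketch (hypothetical per-env step counters, not Isaac Lab code) comparing a global reset issued whenever any sub-environment finishes against a masked reset that only restarts the finished ones. Under a global reset, every environment's episode progress is wiped each time any sibling terminates, so episodes can never grow longer than under the masked policy:

```python
import torch

torch.manual_seed(0)
num_envs, steps = 4, 100
progress_all = torch.zeros(num_envs)  # policy A: reset ALL envs when ANY terminates
progress_per = torch.zeros(num_envs)  # policy B: reset ONLY the terminated envs
max_len_all = torch.zeros(num_envs)
max_len_per = torch.zeros(num_envs)

for _ in range(steps):
    progress_all += 1
    progress_per += 1
    # each env terminates independently with 10% probability per step
    done = torch.rand(num_envs) < 0.1
    max_len_all = torch.maximum(max_len_all, progress_all)
    max_len_per = torch.maximum(max_len_per, progress_per)
    if done.any():
        progress_all[:] = 0   # global reset wipes every env's progress
    progress_per[done] = 0    # masked reset touches only the finished envs

print("longest episode, global reset:", int(max_len_all.max()))
print("longest episode, masked reset:", int(max_len_per.max()))
```

Since the global-reset policy resets a strict superset of what the masked policy resets, each env's episode length under policy A is bounded by its length under policy B, which is the exploration effect described above.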
Additionally, I am not able to start multiple instances of Isaac Sim just to run 10 environments independently. This is the puzzle that confuses me.
Here is the code defined in `skrl.trainers.torch.SequentialTrainer`:
# reset env
states, infos = self.env.reset()

for timestep in tqdm.tqdm(range(self.initial_timestep, self.timesteps), disable=self.disable_progressbar, file=sys.stdout):

    # pre-interaction
    for agent in self.agents:
        agent.pre_interaction(timestep=timestep, timesteps=self.timesteps)

    # compute actions
    with torch.no_grad():
        actions = torch.vstack([agent.act(states[scope[0]:scope[1]], timestep=timestep, timesteps=self.timesteps)[0] \
            for agent, scope in zip(self.agents, self.agents_scope)])

        # step the environments
        next_states, rewards, terminated, truncated, infos = self.env.step(actions)

        # render scene
        if not self.headless:
            self.env.render()

        # record the environments' transitions
        for agent, scope in zip(self.agents, self.agents_scope):
            agent.record_transition(states=states[scope[0]:scope[1]],
                                    actions=actions[scope[0]:scope[1]],
                                    rewards=rewards[scope[0]:scope[1]],
                                    next_states=next_states[scope[0]:scope[1]],
                                    terminated=terminated[scope[0]:scope[1]],
                                    truncated=truncated[scope[0]:scope[1]],
                                    infos=infos,
                                    timestep=timestep,
                                    timesteps=self.timesteps)

        # log environment info
        if self.environment_info in infos:
            for k, v in infos[self.environment_info].items():
                if isinstance(v, torch.Tensor) and v.numel() == 1:
                    for agent in self.agents:
                        agent.track_data(f"Info / {k}", v.item())

    # post-interaction
    for agent in self.agents:
        agent.post_interaction(timestep=timestep, timesteps=self.timesteps)

    # reset environments
    with torch.no_grad():
        if terminated.any() or truncated.any():
            states, infos = self.env.reset()
        else:
            states = next_states
If revising the environment configuration is the only solution to achieve independent environment resets, are there any tutorials or template code available to help modify the default environment configuration?
# -- reset envs that terminated/timed-out and log the episode information
reset_env_ids = self.reset_buf.nonzero(as_tuple=False).squeeze(-1)
if len(reset_env_ids) > 0:
    self._reset_idx(reset_env_ids)
Based on this code, we can infer that the environment will reset automatically when the agent becomes invalid, and the reset information is reported through `terminated` or `truncated`. This means the user doesn't need to manually call `env.reset()` to reset the environment. Is this understanding correct? I will test this feature right away to see if it works, so I can continue developing the agent's exploration strategy. Thank you!
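The masked-reset pattern from the snippet above can be sketched in isolation. In the sketch below, the buffers and the `_reset_idx` helper are hypothetical stand-ins (not the actual Isaac Lab implementation), showing how only the terminated sub-environments are re-initialized while the others keep their state:

```python
import torch

num_envs = 8
obs = torch.randn(num_envs, 3)                    # per-env observations
episode_len = torch.randint(1, 50, (num_envs,))   # per-env episode progress
reset_buf = torch.zeros(num_envs, dtype=torch.bool)
reset_buf[[2, 5]] = True                          # pretend envs 2 and 5 terminated

def _reset_idx(env_ids: torch.Tensor) -> None:
    """Reset only the listed sub-environments, leaving the rest untouched."""
    obs[env_ids] = 0.0        # re-initialize observations for those envs only
    episode_len[env_ids] = 0  # restart their episode counters

reset_env_ids = reset_buf.nonzero(as_tuple=False).squeeze(-1)
if len(reset_env_ids) > 0:
    _reset_idx(reset_env_ids)

print("reset envs:", reset_env_ids.tolist())      # [2, 5]
```

After the partial reset, envs 2 and 5 are back at step zero while all other episode counters are unchanged, which is exactly the independent-reset behavior asked about in the question.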
Yes, exactly; we handle the reset internally.
Not sure why skrl would reset all environments once one terminates. I would recommend using rsl_rl, which does not include this behavior.
Hi @ItsBean (and @pascal-roth)
Indeed, Isaac Lab (as well as other frameworks that implement parallel/vectorized environments, such as gym/gymnasium vectorized envs or Brax, for example) auto-resets sub-environments after they terminate or are truncated. This is a difference with respect to frameworks that only implement a single environment (such as non-vectorized gym/gymnasium (the default) or DeepMind, for example), where it is necessary to invoke the reset method "manually".
Now, `skrl` is a generic RL library that supports not only Isaac Lab but also other RL frameworks, and it is therefore able to handle both cases.
For Isaac Lab (and other frameworks that support parallel/vectorized envs), the `.reset()` method only resets the environment once. Subsequent calls just return the latest `.step()` observation/info, as shown in the following code:
def reset(self) -> Tuple[torch.Tensor, Any]:
    """Reset the environment

    :return: Observation, info
    :rtype: torch.Tensor and any other info
    """
    if self._reset_once:
        self._observations, self._info = self._env.reset()
        self._reset_once = False
    return self._observations["policy"], self._info
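A quick way to see this reset-once behavior is with a mock environment that counts how often the underlying `reset()` actually runs. The `MockVecEnv` and `ResetOnceWrapper` classes below are hypothetical stand-ins that re-create the quoted logic, not the real Isaac Lab wrapper:

```python
import torch

class MockVecEnv:
    """Stand-in for a vectorized env; counts underlying reset() calls."""
    def __init__(self):
        self.reset_calls = 0
    def reset(self):
        self.reset_calls += 1
        return {"policy": torch.zeros(4, 3)}, {}

class ResetOnceWrapper:
    """Minimal re-creation of the reset-once logic quoted above."""
    def __init__(self, env):
        self._env = env
        self._reset_once = True
        self._observations, self._info = None, None
    def reset(self):
        if self._reset_once:
            self._observations, self._info = self._env.reset()
            self._reset_once = False
        return self._observations["policy"], self._info

env = MockVecEnv()
wrapper = ResetOnceWrapper(env)
wrapper.reset()
wrapper.reset()  # second call returns cached obs; underlying env is NOT reset again
print(env.reset_calls)  # 1
```

This is why the trainer's `self.env.reset()` call after a termination is harmless for Isaac Lab: after the first call, it is effectively a no-op that returns the observations already produced by the auto-resetting `step()`.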
Moreover, the sequential trainer (designed to train more than one agent simultaneously in parallel environments) calls the implementation of the base class when there are no simultaneous agents to train (the current Isaac Lab case). The base class implementation takes into account whether the environment is parallel/vectorized or not, in order to avoid calling the reset method (even though it is safe to invoke it), as shown in the following code:
# reset environments
if self.env.num_envs > 1:
    states = next_states
else:
    if terminated.any() or truncated.any():
        with torch.no_grad():
            states, infos = self.env.reset()
    else:
        states = next_states
Therefore, I strongly recommend that you continue to use `skrl` (and with it the improved implementation of the Model instantiators' network definitions currently available in skrl-v1.3.0 and integrated in the upcoming official version of Isaac Lab).
Question
I am working with parallel environments in Isaac Sim and encountered an issue where the state of one environment affects the others during execution. Specifically, when one environment reaches a `done` or `truncated` state, I attempt to reset only that environment, but this results in unintended behavior where the remaining environments are impacted. Here is the simplified code structure I'm using:
`env.reset(env_id=i)` is not supported. So Isaac Sim currently doesn't support the functionality where environments can run independently without affecting each other. When one environment finishes (reaches `done` or `truncated`), it is not possible to reset only that specific environment without resetting or influencing the others.
Expectation
I would like a mechanism where each environment can be managed and reset independently, such that the progression of one environment does not impact the others.