Closed: ludc closed this 3 years ago
You're welcome! Please see https://github.com/google/brax/blob/main/brax/training/env.py#L57 for how we reset environments with varying lengths, by testing for done.
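For readers who want the idea without opening the linked file, here is a minimal JAX sketch of resetting by testing for done. It is not the code in brax/training/env.py. The toy dynamics, the per-environment max_steps, and the names masked_reset and step_with_reset are all assumptions made for illustration.

```python
import jax
import jax.numpy as jnp


def masked_reset(done, first_obs, obs):
    """Where done is set, take the stored initial observation; else keep obs."""
    # done has shape (batch,); add trailing axes so it broadcasts over obs.
    mask = done.reshape(done.shape + (1,) * (obs.ndim - done.ndim))
    return jnp.where(mask, first_obs, obs)


@jax.jit
def step_with_reset(obs, steps, first_obs, max_steps):
    """Toy batched step (hypothetical dynamics: each observation grows by 1)."""
    next_obs = obs + 1.0
    steps = steps + 1
    done = steps >= max_steps  # episodes end at different lengths
    # Only the finished environments are reset; the rest keep running.
    next_obs = masked_reset(done, first_obs, next_obs)
    steps = jnp.where(done, 0, steps)
    return next_obs, steps, done


first_obs = jnp.zeros((3, 2))        # batch of 3 environments
max_steps = jnp.array([1, 2, 3])     # varying episode lengths
obs, steps = first_obs, jnp.zeros(3, dtype=jnp.int32)
obs, steps, done = step_with_reset(obs, steps, first_obs, max_steps)
```

After one step only the first environment has finished, so only its row is back at the initial observation. Because the reset is a jnp.where rather than a Python branch, the whole step stays jittable across the batch.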
This functionality is a bit hidden, though, and could probably live in the Env itself instead of within a separate wrapper. We are looking into the feasibility of this. If it works out, the Env could take a flag such as autoreset.
By the way @erikfrey, I think there was already an issue open at some point about how this behaviour relates to that of the gym.vector.VectorEnv class, which does the autoreset and discards the final observation.
Would you mind pointing me to that issue, or clarifying if this is also what is done in Brax's vectorized envs?
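To make the concern about the discarded final observation concrete, here is a small JAX sketch of an auto-reset that keeps the terminal observation alongside the reset one. It is an illustration only, not what Brax or Gym implement. The function name auto_reset_keep_final and the info keys final_obs and final_mask are hypothetical.

```python
import jax.numpy as jnp


def auto_reset_keep_final(done, first_obs, obs):
    """Return reset observations plus the pre-reset ones for finished envs."""
    mask = done[:, None]  # broadcast (batch,) over the observation axis
    returned = jnp.where(mask, first_obs, obs)
    # Keep what a plain auto-reset would overwrite, so the caller can
    # still use the true terminal observation (e.g. for bootstrapping).
    info = {"final_obs": obs, "final_mask": done}
    return returned, info


done = jnp.array([True, False])
first_obs = jnp.zeros((2, 2))
obs = jnp.array([[5.0, 5.0], [7.0, 7.0]])
returned, info = auto_reset_keep_final(done, first_obs, obs)
```

Here the first environment's returned observation is the initial one, while info["final_obs"] still holds its last pre-reset observation.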
@lebrice do you mean #43? Yes, just addressed that too.
We now support this behavior (resetting batched environments automatically) by default. If you're curious how it works, see AutoResetWrapper in https://github.com/google/brax/blob/main/brax/envs/wrappers.py
If you'd like the old behavior (which is to not auto-reset), you can still get it by calling envs.create(..., auto_reset=False)
Hi, and thanks for this amazing work.
I am wondering whether there is a way to automatically reset an environment once it reaches a terminal state while running a batch of environments. It is unclear to me how to do this with Brax when episodes can have varying lengths.