autoreset batch environments when done=True

google / brax

Massively parallel rigidbody physics simulation on accelerator hardware.

Apache License 2.0

autoreset batch environments when done=True #35

Closed. ludc closed this issue 3 years ago.

ludc commented 3 years ago

Hi, and thanks for this amazing work.

I am wondering whether there is a way to automatically reset an environment once it reaches a terminal state while a batch of environments is running. It is unclear to me how to do this in Brax when episodes can have different lengths.

erikfrey commented 3 years ago

You're welcome! Please see https://github.com/google/brax/blob/main/brax/training/env.py#L57 for how we reset environments whose episodes have varying lengths, by testing for done.
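For readers who do not want to open the link, here is a minimal JAX sketch of the general idea of resetting by testing for done. It is not the code at that URL: the toy dynamics, GOAL, and the function names reset, step, step_with_autoreset, and rollout are all assumptions made up for illustration.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy dynamics (not Brax): each environment is a 1-D position
# that advances by its own velocity and terminates once it reaches GOAL.
GOAL = 3.0


def reset(batch_size):
    """Initial state for every environment in the batch."""
    return jnp.zeros(batch_size)


def step(pos, vel):
    """Advance every environment by one step and flag finished episodes."""
    new_pos = pos + vel
    done = new_pos >= GOAL
    return new_pos, done


def step_with_autoreset(pos, vel):
    new_pos, done = step(pos, vel)
    # Where an episode just ended, swap in the initial state.
    # Environments that are still running keep their stepped state.
    new_pos = jnp.where(done, reset(pos.shape[0]), new_pos)
    return new_pos, done


def rollout(vel, num_steps):
    def body(pos, _):
        pos, done = step_with_autoreset(pos, vel)
        return pos, done

    _, dones = jax.lax.scan(body, reset(vel.shape[0]), None, length=num_steps)
    return dones  # shape (num_steps, batch)


# Three environments whose episodes last 3, 2, and 1 steps respectively.
vel = jnp.array([1.0, 1.5, 3.0])
dones = rollout(vel, 6)
episodes_finished = dones.sum(axis=0)
```

Because the reset is a per-element select rather than a Python branch, every environment in the batch runs the same code on every step, so episodes of different lengths can share one compiled loop.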

This functionality is a bit hidden, though, and could probably live in the Env itself instead of within a separate wrapper. We are looking into the feasibility of this. If so, then you could imagine the Env would take a flag autoreset or some such.

lebrice commented 3 years ago

By the way @erikfrey, I think there was already an issue open at some point about how this behaviour relates to that of the gym.vector.VectorEnv class, which auto-resets and discards the final observation. Would you mind pointing me to that issue, or clarifying whether Brax's vectorized envs do the same?

erikfrey commented 3 years ago

@lebrice do you mean #43? Yes, just addressed that too.

We now support this behavior (resetting batched environments automatically) by default. If you're curious how it works, see AutoResetWrapper in https://github.com/google/brax/blob/main/brax/envs/wrappers.py
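As a rough illustration of what a wrapper of this kind can look like, here is a self-contained JAX sketch. It is not the AutoResetWrapper from the linked file: State, CountdownEnv, and AutoResetSketch are hypothetical names, and the choice to keep the done flag visible after the reset is an assumption of this sketch.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class State(NamedTuple):
    # Hypothetical stand-in for a batched environment state.
    obs: jnp.ndarray    # (batch, obs_dim)
    steps: jnp.ndarray  # (batch,)
    done: jnp.ndarray   # (batch,) bool


class CountdownEnv:
    """Hypothetical toy env: episode i ends after lengths[i] steps."""

    def __init__(self, lengths):
        self.lengths = jnp.asarray(lengths)

    def reset(self):
        n = self.lengths.shape[0]
        return State(
            obs=jnp.zeros((n, 2)),
            steps=jnp.zeros(n, dtype=jnp.int32),
            done=jnp.zeros(n, dtype=bool),
        )

    def step(self, state, action):
        steps = state.steps + 1
        return State(obs=state.obs + action, steps=steps,
                     done=steps >= self.lengths)


class AutoResetSketch:
    """Wraps an env so finished episodes restart inside step()."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, state, action):
        nxt = self.env.step(state, action)
        first = self.env.reset()

        def pick(reset_leaf, live_leaf):
            # Reshape done so it broadcasts against leaves with extra dims.
            mask = nxt.done.reshape(
                nxt.done.shape + (1,) * (live_leaf.ndim - 1))
            return jnp.where(mask, reset_leaf, live_leaf)

        restored = jax.tree_util.tree_map(pick, first, nxt)
        # Assumption: keep done visible so the caller sees the episode end.
        return restored._replace(done=nxt.done)


env = AutoResetSketch(CountdownEnv([2, 3]))
state = env.reset()
action = jnp.ones((2, 2))
history = []
for _ in range(4):
    state = env.step(state, action)
    history.append(state.steps.tolist())
```

In this sketch the observation returned on a done step is already the first observation of the next episode, so the terminal observation is dropped, which is the gym.vector.VectorEnv behaviour raised earlier in the thread. Whether Brax's own wrapper matches that in every detail should be checked against the linked source.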

If you'd like the old behavior (which is to not auto-reset), you can still get it by calling envs.create(..., auto_reset=False)