[Feature] Terminated/truncated support and Gymnasium wrapper

LukasSchaefer commented 1 month ago

The current VMAS implementation supports the OpenAI Gym interface but not the new and still maintained Gymnasium interface. This was already raised in an issue #61 before.

This PR adds both a wrapper that implements the Gymnasium interface for VMAS, and native Gymnasium interface for the VMAS environment via the legacy_gym=False argument. By default, the default and previous Gym interface is maintained for backwards compatibility.

Small quality of life function to allow the make_env function to receive the wrapper name (Gymnasium, Gym, RLLib) as a string argument instead of a wrapper object only.

I have tested the interactive environment interface and ensured that (by default with legacy_gym=True) VMAS training of BenchMARL still runs as documented.

I'm happy to do any further changes as requested to make sure all works fine so let me know if you have any feedback!

fixes #61

bc-breaking changes: env.unwrapped() -> env.unwrapped in gym wrapper

matteobettini commented 1 month ago

Just poking around it seems they support a vector env https://gymnasium.farama.org/api/vector/ interface

maybe we can use this? or have 2 wrappers "gymnasium" and "gymnasium_vector"?

LukasSchaefer commented 1 month ago

Hi Matteo,

Thanks for coming back quickly with comments. Some follow-ups so I best understand how this should look like:

vmas is currently depending on gym only for the specs. I think those are fine even if the library is unmaintained. I would like to avoid adding a core gymnasium dependency and keep the old specs. Gymnasium can be an optional dependency and its wrapper can handle the spec conversion

Currently, VMAS uses gym for action/ observation spaces in its underlying environment. Would you like to keep this then and only "convert" those to Gymnasium spaces within the Gymnasium wrapper?

The only change i think we need in the vmas env interface is the terminated/truncated one, I would keep the rest as it was. The flag to get terminated and truncated instead of done can be called terminated_truncated instead of legacy_gym and be false by default

So you'd want the done function to always return two values terminated and truncated? Similarly for the step function and get_from_scenario functions? Should those also return terminated and truncated instead of the previous done? Also, in Gymnasium the reset function returns both the observations and info dictionary. I'm asking since such changes in the interface might require changes in any code using VMAS atm which is why I originally was hesitant to do so. Very happy to go ahead with this though if that's your preference, I agree it's a cleaner solution than merging both interfaces within the environment function.

Just poking around it seems they support a vector env https://gymnasium.farama.org/api/vector/ interface

Yes, Gymnasium has wrappers to run multiple instances of environments in vectorised fashion, either synchronously or asynchronously. However, I think it might even be more efficient to write a vectorised gymnasium wrapper that uses a vectorised VMAS environment instance underneath and converts things to numpy arrays e.g. instead of having multiple gymnasium environments each of which holds a VMAS instance with a single environment only underneath. The latter would likely be notably slower. Let me know what you think and I'll have a go at this later today/ this week!

matteobettini commented 1 month ago

Currently, VMAS uses gym for action/ observation spaces in its underlying environment. Would you like to keep this then and only "convert" those to Gymnasium spaces within the Gymnasium wrapper?

exactly

So you'd want the done function to always return two values terminated and truncated? Similarly for the step function and get_from_scenario functions? Should those also return terminated and truncated instead of the previous done? Also, in Gymnasium the reset function returns both the observations and info dictionary. I'm asking since such changes in the interface might require changes in any code using VMAS atm which is why I originally was hesitant to do so. Very happy to go ahead with this though if that's your preference, I agree it's a cleaner solution than merging both interfaces within the environment function.

Nono, I would like to handle it like you have already done it in the PR. The only differences from the PR I am suggesting is that the legacy_gym flag is renamed to terminated_truncated (with swapped values ofc) and it affects just the way the dones are returned (no effect on specs, resets, rendering, and so on).

The way you implemented step and get_from_scenario is pristine and bc-compatible. I am just suggesting a renaming there.

Yes, Gymnasium has wrappers to run multiple instances of environments in vectorised fashion, either synchronously or asynchronously. However, I think it might even be more efficient to write a vectorised gymnasium wrapper that uses a vectorised VMAS environment instance underneath and converts things to numpy arrays e.g. instead of having multiple gymnasium environments each of which holds a VMAS instance with a single environment only underneath. The latter would likely be notably slower. Let me know what you think and I'll have a go at this later today/ this week!

Yes that is what i am referring to: wrap a vmas env (which has multiple subenvs) into a gymnasium vector and then just call .numpy() on the tensors.

If we implement it as a vector of single vmas envs, I would personally resign from my PhD ahaahahah

The question here is if we should still have the single gymnasium env wrapper or not

matteobettini commented 1 month ago

So you'd want the done function to always return two values terminated and truncated? Similarly for the step function and get_from_scenario functions? Should those also return terminated and truncated instead of the previous done? Also, in Gymnasium the reset function returns both the observations and info dictionary. I'm asking since such changes in the interface might require changes in any code using VMAS atm which is why I originally was hesitant to do so. Very happy to go ahead with this though if that's your preference, I agree it's a cleaner solution than merging both interfaces within the environment function.

Just a further clarification on this. I would like excatly the opposite: the only change in the vmas environment class is a flag (by default false) that allows to get terminated and truncated instead of done from the get_from_scenario and step. All the rest of the class should be unchanged as it already possesses the rest of the functionalities gymnasium users could desire

LukasSchaefer commented 1 month ago

Gotcha, I think I understood what you mean 👍

I'll rename the environment argument flag as suggested. Just to make sure, for the base VMAS environment class, you'd want the only change induced by the flag to be the different in done/ terminated/ truncated and revert the change in the reset function interface (the new flag does not affect the reset function which only returns observations unless other flags are specified in its arguments).

I'll also have a go at the Gymnasium wrapper. Since gymnasium separates singleton and vectorised environments, I'm tempted to keep these things separate here as well and have separate wrappers for a singleton Gymnasium and vectorised Gymnasium environments.

matteobettini commented 1 month ago

Gotcha, I think I understood what you mean 👍

I'll rename the environment argument flag as suggested. Just to make sure, for the base VMAS environment class, you'd want the only change induced by the flag to be the different in done/ terminated/ truncated and revert the change in the reset function interface (the new flag does not affect the reset function which only returns observations unless other flags are specified in its arguments).

I'll also have a go at the Gymnasium wrapper. Since gymnasium separates singleton and vectorised environments, I'm tempted to keep these things separate here as well and have separate wrappers for a singleton Gymnasium and vectorised Gymnasium environments.

Exactly, and then the gymnasium wrapper can call reset with return_info=True. (and can also keep self.render_mode and other gymansium things)

All good on all fronts

matteobettini commented 1 month ago

cc @Giovannibriglia since we wanted to implement a StableBaselines3 wrapper, maybe the Gymnasium Vector we will work on here will make it easier to bootstrap the SB3 one

LukasSchaefer commented 1 month ago

@matteobettini I pushed the updated integration including a vectorized Gymnasium wrapper. I tested things via the provided pytest tests, made sure the BenchMARL integration still works and that shapes behave as anticipated.

Please let me know if there are any further changes you would like to see!

As a note, I slightly modified the make_env function to pass through the return_numpy and render_mode flags I introduced in the Gymnasium wrappers. The latter is to comply with the standard Gymnasium render mode handling, and the former is to allow for returning torch tensors instead of numpy (but default is numpy to comply with standard environment interface of other Gymnasium envs). Alternatively, I could also pass through these arguments as part of the kwargs, but then they would also be fed through to the VMAS environment which might not be desirable. I considered this the cleanest the solution but happy to adjust if you think differently.

LukasSchaefer commented 1 month ago

@matteobettini Added a new base VMAS wrapper class from which the gym, gymnasium, and vectorized gymnasium wrappers inherit that implements a lot of shared functionality including type conversions before and after feeding data to the environment.

Also made other smaller notifications as per your suggestions (gymnasium/ rllib import warnings, removing kwargs of wrappers and making wrapper kwargs optional to avoid mutable {} default value)

LukasSchaefer commented 1 month ago

@matteobettini Updated the base VMAS wrapper class for gym-style wrappers with simplified shared functions, and added unit tests for all gym-style wrappers now :)

matteobettini commented 1 month ago

Also if you could pls follow this for pre-cmmit chores https://github.com/proroklab/VectorizedMultiAgentSimulator/blob/main/CONTRIBUTING.md

matteobettini commented 1 month ago

for the tests we need to add gymansium and shimmy to install_dependencies.sh

LukasSchaefer commented 1 month ago

I followed the pre-commit chore besides updating the sphinx documentation and updated according to your comments. I want to wrap this up soonish since I've already spent more time on it than I originally intended.

For the documentation, I am not familiar with sphinx. Should I just modify the docs/source/usage/*.rst files directly within the repo or how should I proceed?

matteobettini commented 1 month ago

Ok all good! I'll take it on from here and do a few commits to doc as well as solve an import problem

LukasSchaefer commented 1 month ago

I'll take it on from here and do a few commits to doc as well as solve an import problem

Sounds great, thanks for taking it over. And let me know if there is a bigger issue that I introduced and you'd want help with!

matteobettini commented 1 month ago

I think we should be good to go!

Last thing we need to sort out is that I added an mpe task to the tests and they are currently failing. Any idea of the cause?

LukasSchaefer commented 1 month ago

Last thing we need to sort out is that I added an mpe task to the tests and they are currently failing. Any idea of the cause?

Just had a look and found the issue. When checking shapes in the case of dict spaces, I compared them in order of a list but the dict -> list conversion I did does not guarantee that those spaces are then in the right order so sometimes it would fail. I'm just writing a solution and will push in a bit

LukasSchaefer commented 1 month ago

This commit seems to resolve the issue on my side. Please have a look but I think this should fix it!

matteobettini commented 1 month ago

Merged! Thanks a mil Lukas! I think that this will make a LOT of users happy. I owe you a beer when you come to Cambridge :)

LukasSchaefer commented 1 month ago

I'm glad if it will turn out useful! I'll write a simple wrapper to integrate VMAS with its new gymnasium wrapper into EPyMARL, probably next week.

And I'll take you up on that offer once I'm properly moved!

matteobettini commented 1 month ago

I'm glad if it will turn out useful! I'll write a simple wrapper to integrate VMAS with its new gymnasium wrapper into EPyMARL, probably next week

Oh very cool! I'll make a release in the meantime then. Do you think we will be able to use the vector env in EPyMARL and keep the data on the torch device?

LukasSchaefer commented 1 month ago

Do you think we will be able to use the vector env in EPyMARL and keep the data on the torch device?

As I see it right now, it might be tricky to use that unfortunately since the parallel rollouts in EPyMARL use multithreading with a different interface than standard vectorised environments (see EPyMARL's parallel runners for more info). To be able to use the vectorised environment of VMAS as an alternative to this parallelisation/ vectorisation, I'd need to fundamentally change data collection in EPyMARL but I don't think that's a change I'd like to do as of right now.

While this will cost some performance, I think the loss won't be too bad when using CPUs anyway but might be more noticeable when using a GPU since we won't be able to keep all data constantly on the GPU (it will be moved back and forth). To get the most out of the latter case though, a JAX framework might be better suited in the first place judging by the current landscape but that would be a different discussion altogether.

LukasSchaefer commented 1 month ago

@matteobettini Just wanted to ping you that I added a VMAS wrapper to EPyMARL now! See the docs here that integrates all VMAS tasks with the gymnasium wrapper into EPyMARL. As discussed, I only integrated the singleton environment for now but it already trains reasonably fast.

I tried only MAPPO training in balance and transport and with just 4 CPU cores (no GPUs), it took 6 1/2h to train 10M timesteps in transport and 10h in balance which seems reasonable, even though I'm sure it could be notably faster when using and keeping all on the GPU.

matteobettini commented 1 month ago

Amazing! This will be so helpful as now users have the epymarl/benchmarl/rllib triad to triple check their results when in doubt. I love this so much.

PS If one day you will want to add a VectorRunner or smth like that that steps batched enviornments I would be happy to help.

LukasSchaefer commented 1 month ago

Thanks for the offer! I'll ping you should I spend some time on this, even though I have to admit that it's unlikely I'll work on this (in the near future) given I start a new job soon.

proroklab / VectorizedMultiAgentSimulator

[Feature] Terminated/truncated support and Gymnasium wrapper #143