Closed LukasSchaefer closed 1 month ago
Just poking around it seems they support a vector env https://gymnasium.farama.org/api/vector/ interface
maybe we can use this? or have 2 wrappers "gymnasium" and "gymnasium_vector"?
Hi Matteo,
Thanks for coming back quickly with comments. Some follow-ups so I understand what this should look like:
> vmas is currently depending on gym only for the specs. I think those are fine even if the library is unmaintained. I would like to avoid adding a core gymnasium dependency and keep the old specs. Gymnasium can be an optional dependency and its wrapper can handle the spec conversion
Currently, VMAS uses `gym` for action/observation spaces in its underlying environment. Would you like to keep this then and only "convert" those to Gymnasium spaces within the Gymnasium wrapper?
> The only change I think we need in the vmas env interface is the terminated/truncated one, I would keep the rest as it was. The flag to get terminated and truncated instead of done can be called terminated_truncated instead of legacy_gym and be false by default
So you'd want the `done` function to always return two values `terminated` and `truncated`? Similarly for the `step` function and `get_from_scenario` functions? Should those also return `terminated` and `truncated` instead of the previous `done`? Also, in Gymnasium the `reset` function returns both the observations and info dictionary. I'm asking since such changes in the interface might require changes in any code using VMAS atm which is why I originally was hesitant to do so. Very happy to go ahead with this though if that's your preference, I agree it's a cleaner solution than merging both interfaces within the environment function.
> Just poking around it seems they support a vector env https://gymnasium.farama.org/api/vector/ interface
Yes, Gymnasium has wrappers to run multiple instances of environments in vectorised fashion, either synchronously or asynchronously. However, I think it might be even more efficient to write a vectorised gymnasium wrapper that uses a vectorised VMAS environment instance underneath and converts things to numpy arrays, instead of having multiple gymnasium environments each of which holds a VMAS instance with only a single environment underneath. The latter would likely be notably slower. Let me know what you think and I'll have a go at this later today or this week!
> Currently, VMAS uses gym for action/ observation spaces in its underlying environment. Would you like to keep this then and only "convert" those to Gymnasium spaces within the Gymnasium wrapper?
exactly
> So you'd want the done function to always return two values terminated and truncated? Similarly for the step function and get_from_scenario functions? Should those also return terminated and truncated instead of the previous done? Also, in Gymnasium the reset function returns both the observations and info dictionary. I'm asking since such changes in the interface might require changes in any code using VMAS atm which is why I originally was hesitant to do so. Very happy to go ahead with this though if that's your preference, I agree it's a cleaner solution than merging both interfaces within the environment function.
Nono, I would like to handle it like you have already done it in the PR. The only difference from the PR I am suggesting is that the `legacy_gym` flag is renamed to `terminated_truncated` (with swapped values ofc) and that it affects just the way the dones are returned (no effect on specs, resets, rendering, and so on).
The way you implemented `step` and `get_from_scenario` is pristine and bc-compatible. I am just suggesting a renaming there.
> Yes, Gymnasium has wrappers to run multiple instances of environments in vectorised fashion, either synchronously or asynchronously. However, I think it might be even more efficient to write a vectorised gymnasium wrapper that uses a vectorised VMAS environment instance underneath and converts things to numpy arrays, instead of having multiple gymnasium environments each of which holds a VMAS instance with only a single environment underneath. The latter would likely be notably slower. Let me know what you think and I'll have a go at this later today or this week!
Yes, that is what I am referring to: wrap a vmas env (which has multiple subenvs) into a gymnasium vector env and then just call `.numpy()` on the tensors.
If we implement it as a vector of single vmas envs, I would personally resign from my PhD ahaahahah
The question here is if we should still have the single gymnasium env wrapper or not
> So you'd want the done function to always return two values terminated and truncated? Similarly for the step function and get_from_scenario functions? Should those also return terminated and truncated instead of the previous done? Also, in Gymnasium the reset function returns both the observations and info dictionary. I'm asking since such changes in the interface might require changes in any code using VMAS atm which is why I originally was hesitant to do so. Very happy to go ahead with this though if that's your preference, I agree it's a cleaner solution than merging both interfaces within the environment function.
Just a further clarification on this. I would like exactly the opposite:
the only change in the vmas environment class is a flag (by default false) that allows getting `terminated` and `truncated` instead of `done` from `get_from_scenario` and `step`. All the rest of the class should be unchanged, as it already possesses the rest of the functionalities gymnasium users could desire
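To make the proposal concrete, here is a minimal sketch of a flag that changes only how episode ends are reported. `ToyFlagEnv`, its time limit, and its made-up termination rule are hypothetical stand-ins for illustration and are not the VMAS environment class; only the flag name `terminated_truncated` and its default of false come from the discussion above.

```python
from typing import Dict, List


class ToyFlagEnv:
    """Hypothetical stand-in for an environment class (not the VMAS API)."""

    def __init__(self, max_steps: int = 3, terminated_truncated: bool = False):
        self.max_steps = max_steps
        self.terminated_truncated = terminated_truncated
        self.steps = 0

    def reset(self) -> List[float]:
        # Not affected by the flag: returns observations only.
        self.steps = 0
        return [0.0]

    def step(self, action: float):
        self.steps += 1
        obs = [float(self.steps)]
        reward = -abs(action)
        terminated = action == 0.0  # made-up task end condition
        truncated = self.steps >= self.max_steps  # time limit reached
        info: Dict[str, float] = {}
        if self.terminated_truncated:
            # Opt-in: report the two reasons for an episode end separately.
            return obs, reward, terminated, truncated, info
        # Default, backwards-compatible behaviour: a single done flag.
        return obs, reward, terminated or truncated, info
```

With the default flag, `step` keeps returning four values, so existing callers are untouched; with `terminated_truncated=True` it returns five.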
Gotcha, I think I understood what you mean 👍
I'll rename the environment argument flag as suggested. Just to make sure, for the base VMAS environment class, you'd want the only change induced by the flag to be the difference in `done` / `terminated` / `truncated`, and to revert the change in the `reset` function interface (the new flag does not affect the `reset` function, which only returns observations unless other flags are specified in its arguments).
I'll also have a go at the Gymnasium wrapper. Since gymnasium separates singleton and vectorised environments, I'm tempted to keep these things separate here as well and have separate wrappers for a singleton Gymnasium and vectorised Gymnasium environments.
Exactly, and then the gymnasium wrapper can call reset with `return_info=True` (and can also keep `self.render_mode` and other gymnasium things).
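A rough sketch of that split, under stated assumptions: `BaseEnvSketch` and `GymnasiumWrapperSketch` are hypothetical toy classes, and only the `return_info=True` argument and the `render_mode` attribute are taken from the comment above. The `(obs, info)` return value and the keyword-only `seed` and `options` arguments follow the standard Gymnasium `reset` signature.

```python
class BaseEnvSketch:
    """Hypothetical base env: reset returns observations only unless asked."""

    def reset(self, return_info: bool = False):
        obs = [0.0, 0.0]
        info = {"step": 0}
        return (obs, info) if return_info else obs


class GymnasiumWrapperSketch:
    """Hypothetical wrapper that always exposes a Gymnasium-style reset."""

    def __init__(self, env: BaseEnvSketch, render_mode=None):
        self._env = env
        # Gymnasium-specific state lives on the wrapper, not on the base env.
        self.render_mode = render_mode

    def reset(self, *, seed=None, options=None):
        # seed and options are accepted for signature compatibility only
        # and are ignored in this sketch.
        obs, info = self._env.reset(return_info=True)
        return obs, info
```

The base class keeps its old `reset` behaviour for existing callers, while Gymnasium users only ever see the two-value form.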
All good on all fronts
cc @Giovannibriglia since we wanted to implement a StableBaselines3 wrapper, maybe the Gymnasium Vector we will work on here will make it easier to bootstrap the SB3 one
@matteobettini I pushed the updated integration including a vectorized Gymnasium wrapper. I tested things via the provided `pytest` tests, made sure the BenchMARL integration still works and that shapes behave as anticipated.
Please let me know if there are any further changes you would like to see!
As a note, I slightly modified the `make_env` function to pass through the `return_numpy` and `render_mode` flags I introduced in the Gymnasium wrappers. The latter is to comply with the standard Gymnasium render mode handling, and the former is to allow for returning torch tensors instead of numpy (but the default is numpy to comply with the standard environment interface of other Gymnasium envs). Alternatively, I could also pass these arguments through as part of the kwargs, but then they would also be fed through to the VMAS environment, which might not be desirable. I considered this the cleanest solution but am happy to adjust if you think differently.
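The trade-off described here can be sketched as follows. Everything in this block is a hypothetical illustration: `make_env_sketch`, `ToySimulator`, `ToyWrapper`, the `use_wrapper` switch, and the `n_agents` keyword are invented for the example. Only the argument names `return_numpy` and `render_mode` come from the comment above, and the real `make_env` signature may differ.

```python
from typing import Any, Optional


class ToySimulator:
    """Hypothetical simulator: every extra keyword argument ends up here."""

    def __init__(self, scenario: str, **kwargs: Any) -> None:
        self.scenario = scenario
        self.kwargs = kwargs


class ToyWrapper:
    """Hypothetical wrapper that owns the Gymnasium-specific options."""

    def __init__(self, env: ToySimulator, return_numpy: bool = True,
                 render_mode: Optional[str] = None) -> None:
        self.env = env
        self.return_numpy = return_numpy
        self.render_mode = render_mode


def make_env_sketch(scenario: str, use_wrapper: bool = False,
                    return_numpy: bool = True,
                    render_mode: Optional[str] = None, **kwargs: Any):
    # Wrapper-only options are named parameters, so they never leak into
    # the simulator kwargs.
    env = ToySimulator(scenario, **kwargs)
    if not use_wrapper:
        return env
    return ToyWrapper(env, return_numpy=return_numpy, render_mode=render_mode)
```

Calling `make_env_sketch("balance", use_wrapper=True, return_numpy=False, n_agents=3)` leaves the simulator with `{"n_agents": 3}` only, which is the separation the comment argues for.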
@matteobettini Added a new base VMAS wrapper class from which the gym, gymnasium, and vectorized gymnasium wrappers inherit that implements a lot of shared functionality including type conversions before and after feeding data to the environment.
Also made other smaller modifications as per your suggestions (gymnasium/rllib import warnings, removing kwargs of wrappers and making wrapper kwargs optional to avoid a mutable `{}` default value).
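For readers unfamiliar with the mutable default pitfall mentioned here, a small self-contained illustration follows. The function names and the `"calls"` key are hypothetical and unrelated to the actual wrapper code; the behaviour shown is standard Python.

```python
from typing import Any, Dict, Optional


def configure_bad(wrapper_kwargs: Dict[str, Any] = {}) -> Dict[str, Any]:
    # The default dict is created once, at function definition time, and is
    # shared by every call that omits the argument.
    wrapper_kwargs["calls"] = wrapper_kwargs.get("calls", 0) + 1
    return wrapper_kwargs


def configure_good(wrapper_kwargs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # None as the default, plus a fresh copy per call, avoids shared state
    # and avoids mutating the caller's dict.
    wrapper_kwargs = {} if wrapper_kwargs is None else dict(wrapper_kwargs)
    wrapper_kwargs["calls"] = wrapper_kwargs.get("calls", 0) + 1
    return wrapper_kwargs
```

Two consecutive calls to `configure_bad()` see the counter grow across calls, whereas `configure_good()` starts from a clean dict each time.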
@matteobettini Updated the base VMAS wrapper class for gym-style wrappers with simplified shared functions, and added unit tests for all gym-style wrappers now :)
Also if you could pls follow this for pre-commit chores https://github.com/proroklab/VectorizedMultiAgentSimulator/blob/main/CONTRIBUTING.md
for the tests we need to add gymnasium and shimmy to install_dependencies.sh
I followed the pre-commit chore besides updating the sphinx documentation and updated according to your comments. I want to wrap this up soonish since I've already spent more time on it than I originally intended.
For the documentation, I am not familiar with sphinx. Should I just modify the `docs/source/usage/*.rst` files directly within the repo or how should I proceed?
Ok all good! I'll take it on from here and do a few commits to doc as well as solve an import problem
Sounds great, thanks for taking it over. And let me know if there is a bigger issue that I introduced and you'd want help with!
I think we should be good to go!
Last thing we need to sort out is that I added an mpe task to the tests and they are currently failing. Any idea of the cause?
Just had a look and found the issue. When checking shapes in the case of dict spaces, I compared them in order of a list but the dict -> list conversion I did does not guarantee that those spaces are then in the right order so sometimes it would fail. I'm just writing a solution and will push in a bit
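One way this kind of failure can arise is sketched below. It is an illustration under assumptions, not the actual test code: the keys `"pos"` and `"vel"`, the integer "shapes", and both helper functions are hypothetical. Comparing two dicts entry by entry after converting them to lists depends on both having the same key order, while looking entries up by key does not.

```python
from typing import Dict, List


def shapes_match_by_position(expected: Dict[str, int],
                             obs: Dict[str, List[float]]) -> bool:
    # Fragile: pairs the i-th expected size with the i-th observation,
    # which is only right if both dicts were built in the same key order.
    return all(size == len(value)
               for size, value in zip(expected.values(), obs.values()))


def shapes_match_by_key(expected: Dict[str, int],
                        obs: Dict[str, List[float]]) -> bool:
    # Robust: compares the entries that share a key, whatever the order.
    return expected.keys() == obs.keys() and all(
        len(obs[key]) == expected[key] for key in expected
    )


expected_sizes = {"pos": 2, "vel": 3}
observation = {"vel": [0.0, 0.0, 0.0], "pos": [0.0, 0.0]}  # same keys, other order
```

Here the positional check rejects a perfectly valid observation, while the keyed check accepts it.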
This commit seems to resolve the issue on my side. Please have a look but I think this should fix it!
Merged! Thanks a mil Lukas! I think that this will make a LOT of users happy. I owe you a beer when you come to Cambridge :)
I'm glad if it will turn out useful! I'll write a simple wrapper to integrate VMAS with its new gymnasium wrapper into EPyMARL, probably next week.
And I'll take you up on that offer once I'm properly moved!
Oh very cool! I'll make a release in the meantime then. Do you think we will be able to use the vector env in EPyMARL and keep the data on the torch device?
As I see it right now, it might be tricky to use that unfortunately since the parallel rollouts in EPyMARL use multithreading with a different interface than standard vectorised environments (see EPyMARL's parallel runners for more info). To be able to use the vectorised environment of VMAS as an alternative to this parallelisation/ vectorisation, I'd need to fundamentally change data collection in EPyMARL but I don't think that's a change I'd like to do as of right now.
While this will cost some performance, I think the loss won't be too bad when using CPUs anyway but might be more noticeable when using a GPU since we won't be able to keep all data constantly on the GPU (it will be moved back and forth). To get the most out of the latter case though, a JAX framework might be better suited in the first place judging by the current landscape but that would be a different discussion altogether.
@matteobettini Just wanted to ping you that I added a VMAS wrapper to EPyMARL now! See the docs here. It integrates all VMAS tasks with the `gymnasium` wrapper into EPyMARL. As discussed, I only integrated the singleton environment for now but it already trains reasonably fast.
I tried only MAPPO training in balance and transport and with just 4 CPU cores (no GPUs), it took 6 1/2h to train 10M timesteps in transport and 10h in balance which seems reasonable, even though I'm sure it could be notably faster when using and keeping all on the GPU.
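As a back-of-the-envelope reading of these numbers, the snippet below converts them to average throughput. It assumes the reported durations are wall-clock times for the full 10M timesteps and takes "6 1/2h" as 6.5 hours; the helper name is made up for the example.

```python
def steps_per_second(total_steps: int, hours: float) -> float:
    """Average environment timesteps per wall-clock second."""
    return total_steps / (hours * 3600)


transport_rate = steps_per_second(10_000_000, 6.5)
balance_rate = steps_per_second(10_000_000, 10.0)
print(round(transport_rate), round(balance_rate))  # prints "427 278"
```

So the reported runs average roughly 430 and 280 timesteps per second, including training time, on 4 CPU cores.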
Amazing! This will be so helpful as now users have the epymarl/benchmarl/rllib triad to triple check their results when in doubt. I love this so much.
PS If one day you will want to add a `VectorRunner` or smth like that that steps batched environments I would be happy to help.
Thanks for the offer! I'll ping you should I spend some time on this, even though I have to admit that it's unlikely I'll work on this (in the near future) given I start a new job soon.
The current VMAS implementation supports the OpenAI Gym interface but not the new and still maintained Gymnasium interface. This was already raised before in issue #61.
This PR adds both a wrapper that implements the Gymnasium interface for VMAS, and a native Gymnasium interface for the VMAS environment via the `legacy_gym=False` argument. By default, the previous Gym interface is maintained for backwards compatibility.
Small quality of life function to allow the `make_env` function to receive the wrapper name (`Gymnasium`, `Gym`, `RLLib`) as a string argument instead of a wrapper object only.
I have tested the interactive environment interface and ensured that (by default with `legacy_gym=True`) VMAS training of BenchMARL still runs as documented.
I'm happy to do any further changes as requested to make sure all works fine so let me know if you have any feedback!
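Accepting the wrapper either as a name or as an object could look roughly like the sketch below. `WrapperSketch` and `resolve_wrapper` are hypothetical names chosen for the example, and the case-insensitive lookup is an assumption; only the three wrapper names come from the description above.

```python
from enum import Enum
from typing import Optional, Union


class WrapperSketch(Enum):
    """Hypothetical enum of available wrappers (not the VMAS class)."""

    RLLIB = 0
    GYM = 1
    GYMNASIUM = 2


def resolve_wrapper(
    wrapper: Union[None, str, WrapperSketch]
) -> Optional[WrapperSketch]:
    # Pass through None and members that are already resolved.
    if wrapper is None or isinstance(wrapper, WrapperSketch):
        return wrapper
    if isinstance(wrapper, str):
        try:
            # Case-insensitive lookup by member name: "Gymnasium" -> GYMNASIUM.
            return WrapperSketch[wrapper.upper()]
        except KeyError:
            valid = ", ".join(member.name for member in WrapperSketch)
            raise ValueError(
                f"Unknown wrapper {wrapper!r}; expected one of: {valid}"
            ) from None
    raise TypeError(f"Unsupported wrapper type: {type(wrapper).__name__}")
```

Existing callers that pass a wrapper object keep working, and a misspelled name fails early with a message listing the valid options.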
fixes #61
bc-breaking changes:
`env.unwrapped()` -> `env.unwrapped` in gym wrapper
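To show what this change means for calling code, here is a minimal sketch with hypothetical classes (`InnerEnvSketch` and `GymWrapperSketch` are not the actual wrapper). It assumes `unwrapped` becomes a property, which matches how Gym and Gymnasium expose it.

```python
class InnerEnvSketch:
    """Hypothetical underlying environment."""


class GymWrapperSketch:
    """Hypothetical wrapper exposing the inner env as a property."""

    def __init__(self, env: InnerEnvSketch) -> None:
        self._env = env

    @property
    def unwrapped(self) -> InnerEnvSketch:
        # Attribute-style access: wrapper.unwrapped, without parentheses.
        return self._env


inner = InnerEnvSketch()
wrapper = GymWrapperSketch(inner)
same_env = wrapper.unwrapped  # new style
```

Code that still writes `wrapper.unwrapped()` would now try to call the returned environment object and raise a `TypeError`, so such call sites need the parentheses removed.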