openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

Update on Plans for the MuJoCo, Robotics and Box2d Environments and the Status of Brax and Hardware Accelerated Environments in Gym #2456

Closed jkterry1 closed 2 years ago

jkterry1 commented 3 years ago

Given DeepMind's acquisition of MuJoCo and past discussions about replacing MuJoCo environments in Gym, I would like to clarify plans going forward after meeting with the Brax/PyBullet/TDS team at Google and the MuJoCo team at DeepMind.

1) We are going to replace the documented MuJoCo environments of the "MuJoCo" class with Brax based environments in the "Phys3D" class, add a deprecation warning to the "MuJoCo" environments, and move them to a separate deprecated repo some months later. This raises several questions:

- "Why do the MuJoCo environments have to be replaced?" Despite MuJoCo being free, right now the Gym environments have numerous bugs in simulation configuration, and their code is in a state that we are not able to maintain. Moreover, they all depend on MuJoCo-Py, which is now fully deprecated and cannot be reasonably maintained. Given this, to use the environments with the more updated free versions of MuJoCo, to fix bugs, and to be able to continue to do basic maintenance like supporting new Python versions, the environments would have to be very nearly rewritten from scratch. This means that a serious discussion of a change of simulator is appropriate.
- "Of all the simulators available, why Brax?" First, let's list the widely used options: PyBullet, MuJoCo, TDS and Brax. PyBullet, which was originally the obvious choice, is no longer seriously maintained in favor of TDS and Brax. Each simulator has pros and cons. TDS has full differentiability, Brax has accelerator support (the environments run on GPUs or TPUs, allowing training to go orders of magnitude faster, e.g. full training in minutes), and PyBullet and MuJoCo are more physically accurate. For the "MuJoCo" environment class, this high level of physical accuracy is not necessary. Accordingly, picking a newer simulator with extra features of use to researchers (differentiability or hardware acceleration support) is likely the preferable option. I personally believe that hardware accelerator support is more important, hence choosing Brax.
- "How long will this take?" We hope to have a release with the Brax based Phys3D environments within the next 5 weeks, and a lot of progress has already been made, but a definite date is difficult to give. For the most recent updates, see https://github.com/google/brax/issues/49
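As a rough illustration of the deprecation step in point 1, a warning could be raised whenever one of the old environments is constructed. This is only a minimal sketch and not Gym's actual code: the class name `LegacyMuJoCoEnv` and the message text are hypothetical.

```python
import warnings


class LegacyMuJoCoEnv:
    """Hypothetical stand-in for an old mujoco-py based environment."""

    def __init__(self, name):
        self.name = name
        # Warn at construction time; stacklevel=2 attributes the warning
        # to the code that created the environment rather than to this line.
        warnings.warn(
            f"{name} is deprecated and is planned to move to a separate repository; "
            "consider the Brax based Phys3D replacement.",
            DeprecationWarning,
            stacklevel=2,
        )


# Existing user code keeps working, but now emits a DeprecationWarning.
env = LegacyMuJoCoEnv("HalfCheetah-v2")
```

Python hides `DeprecationWarning` by default outside of `__main__` and test runners, so a real implementation might prefer a more visible warning category.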

2) The "Robotics" environments are being moved out of Gym. This in turn raises several questions:

- "Why can't they be maintained as is?" These environments are unmaintainable and have serious bugs, the same problems as the others in the "MuJoCo" class (hopper and so on).
- "Why can't these be rewritten in Brax like the others?" Brax is not physically accurate enough to support such complex simulations well, and while they hope to support this in the future, it will take a very long time.
- "I use the Robotics environments, where are they going?" ~Into a repo maintained by @Rohan138, unless someone who is capable of maintaining them to a higher level, and wants to, reaches out to me. They will still be maintained as best as is reasonably possible in their state, be installable, and be listed as third party environments in Gym.~ https://github.com/Farama-Foundation/gym-robotics
- "Shouldn't Gym have robotics environments like this though? Why not rewrite them in a manner that's suitable?" Because I don't think Gym inherently should have them, and because we can't. My goal is to make all the environments inside Gym itself good general purpose benchmarks that someone new to the field of reinforcement learning can look at and say "okay, these are the things everyone uses that I should play with and understand." After many years, the robotics environments have never filled this role and have become niche environments specifically for HER and similar research. While I cannot speak personally to this matter, the robotics researchers I've spoken to say that these environments are no longer widely used in this space and that forks of them are used instead, which further means they should not live in Gym proper. Regarding why we can't: these would literally have to be rewritten in the new version of MuJoCo (as PyBullet is no longer extensively maintained) and its upcoming Python bindings (the new version will not be released publicly for many months, with the Python bindings likely following later). That's not something anyone I'm aware of, including the MuJoCo team at DeepMind, is willing to do, due to the utterly extraordinary amount of work required.
- "When will this happen?" Whenever the next release of Gym comes out.

3) The Box2D environments will be rewritten in Brax in a new Phys2D environment class, and the Box2D environments will be deprecated and then removed, similar to the MuJoCo environments. In this process, the duplicate versions of lunar lander and bipedal walker will be consolidated into one environment each, with the different modes as arguments on creation. To answer the natural questions about this here as well:

- "Why do they need to be rewritten?" This is discussed in https://github.com/openai/gym/issues/2358, but in a nutshell the physics engine they're using (Box2D) has Python bindings that have not been maintained for years, meaning that they'll stop supporting new Python versions, architectures, and other basic maintenance things. After many discussions over months, I cannot get these bindings maintained by basically anyone. Additionally, using pyglet for rendering has been a source of continual problems for Gym, and it does not reasonably support headless rendering (an essential feature).
- "Why Brax?" Originally I was planning to use the other major 2D physics library (chipmunk, which has well maintained Python bindings), but Brax is orders of magnitude faster as it can run on accelerators, and the Brax team is kind enough to be willing to do the replacements for us.
- "When will this happen?" Probably a month after the Phys3D environments are merged at the current rate, but that's not a timeline people have committed to or anything.
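To make "different modes as arguments on creation" concrete, here is a minimal sketch. `LunarLanderSketch`, the `make` factory, and the `continuous` keyword are illustrative assumptions, not necessarily the API the rewritten environments will expose.

```python
from dataclasses import dataclass


@dataclass
class LunarLanderSketch:
    """Hypothetical single class replacing separate discrete/continuous variants."""

    continuous: bool = False  # assumed flag; selects the action space at creation

    @property
    def action_space(self):
        # Discrete mode: 4 actions (no-op plus three engines).
        # Continuous mode: 2 real-valued engine throttles.
        return ("box", 2) if self.continuous else ("discrete", 4)


# Hypothetical registry: one id per environment, not one id per mode.
_REGISTRY = {"LunarLander": LunarLanderSketch}


def make(env_id, **kwargs):
    """Hypothetical factory: keyword arguments pick the mode."""
    return _REGISTRY[env_id](**kwargs)


discrete_env = make("LunarLander")
continuous_env = make("LunarLander", continuous=True)
```

The design point is that the two variants share one implementation and one registration, so a bug fix or version bump applies to both modes at once.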

4) General questions:

- "These Brax environments can still run on my CPU too, right?" Yep!
- "Can Brax environments run on AMD GPUs?" With some effort, yes. Brax uses JAX, which uses XLA, which has solid beta support for most AMD GPUs.
- "Why are you having Gym so heavily depend on Brax?" Because I think that it's the best option for environments that already need to be rewritten, and because I think that letting the benchmark environments run orders of magnitude faster via accelerators is of profound value to the community and to beginners in the field.
- "Is Brax going to be maintained for the long term?" As long as we can realistically expect, yes. All software runs the risk of deprecation, e.g. PyBullet, the Box2D Python bindings (and arguably Box2D itself), PIL (what came before pillow), and so on. Given what I've seen that Google is using it for internally, I'm very confident it will be maintained for at least 5 years or so if not longer, which I think is the best we can reasonably plan for.
- "Are you going to make other environments hardware accelerated so they can similarly run orders of magnitude faster?" Hopefully! This could be done with the toy text environments and the classic control environments pretty easily through JAX. I have no concrete plans or timeline for this.
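To give a sense of what a hardware accelerated classic control environment could look like in JAX, here is a minimal sketch of a batched pendulum step. It loosely follows the familiar pendulum dynamics; the constants, function names, and batch size are assumptions for illustration, and this is not code from Gym or Brax.

```python
import jax
import jax.numpy as jnp

# Pendulum constants (values chosen for illustration).
G, LENGTH, MASS, DT, MAX_SPEED = 10.0, 1.0, 1.0, 0.05, 8.0


def pendulum_step(state, torque):
    """One Euler step for a single pendulum; state is (angle, angular velocity)."""
    theta, theta_dot = state
    theta_ddot = (3.0 * G / (2.0 * LENGTH)) * jnp.sin(theta) + (
        3.0 / (MASS * LENGTH**2)
    ) * torque
    theta_dot = jnp.clip(theta_dot + theta_ddot * DT, -MAX_SPEED, MAX_SPEED)
    theta = theta + theta_dot * DT
    return jnp.stack([theta, theta_dot])


# vmap turns the single-environment step into a batched one; jit compiles it
# with XLA so the same code can run on CPU, GPU, or TPU.
batched_step = jax.jit(jax.vmap(pendulum_step))

key = jax.random.PRNGKey(0)
states = jax.random.uniform(key, (4096, 2), minval=-1.0, maxval=1.0)
torques = jnp.zeros(4096)
next_states = batched_step(states, torques)  # shape (4096, 2)
```

Because the step is a pure function of its inputs, thousands of environments advance in one call, which is where the speedup over stepping Python environments one at a time would come from.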

Please let me know if anyone has additional questions about the transition here.

araffin commented 3 years ago

Reposting a comment from TyPh00nCdrCool on reddit which perfectly expresses my view of this plan:

" Some thoughts: Imo this is quite a leap of faith you're taking here. You're rejecting the stable options (PyBullet, MuJoCo) in favor of newer and "fancier" simulators (which obviously will receive more commits as they're less stable and easier to work on). It's quite a stretch to call that state "more maintained". There are no guarantees Brax will ever reach the maturity of PyBullet or MuJoCo. And given Google's track record it might as well be abandoned and deprecated in the not so distant future before it reaches a stable state. Especially given they're essentially working on their own competition (TDS, also a Google project) simultaneously.

It feels as though the future is kind of in limbo right now: without PyBullet's or MuJoCo's stability and physical accuracy, banking big time on Brax being maintained long-term while still being rather new and inaccurate, and lastly completely disregarding physically accurate simulations (moving them away from the project and handing them over to be maintained by a single student).

I do, however, see the benefits you're hoping for with this change! I just don't see them quite as set in stone as you make them out to be. Which is why I don't want to be overly critical but am still calling it at least a leap of faith. "

PS: you should probably say that TDS stands for tiny-differentiable-simulator as it is not obvious...

jkterry1 commented 3 years ago

I have a few responses to some of those points for those following this thread:

"You're rejecting the stable options (PyBullet, MuJoCo) in favor of newer and "fancier" simulators." -I disagree. Bullet is actively not being maintained for basic bug fix PRs, new Python versions, etc. It's pretty much abandoned; I've spoken to the former maintainers. While we could use MuJoCo, it would require a complete rewrite of the environments against a version of MuJoCo that does not exist and does not yet have a well defined release date. This all puts us in a no-win situation regarding stable options.

"And given Google's track record it might as well be abandoned and deprecated in the not so distant future before it reaches a stable state. Especially given they're essentially working on their own competition (TDS, also a Google project) simultaneously." -Like I said, I've spoken over Zoom to every team at issue here (TDS/PyBullet/MuJoCo/Brax). For better or worse, every single one of these is a Google product, and carries those risks. My impression after detailed discussions with the teams is that Brax is good for another 5+ years, like I said. If I were to personally bet, having met with the teams here, I'd actually bet that Brax has the lowest probability of being abandoned of the four.

I'll also add that, for those who have commented with concerns about the physical accuracy of Brax in various places, the MuJoCo team (who is constantly meeting with the Brax team right now) and former Bullet team also thought that Brax was a perfectly adequate replacement for the MuJoCo class environments (not the robotics class ones).

tristandeleu commented 3 years ago

You say that Brax support will come with the next release of Gym, but does that mean that the environments will live in the Gym repository? If so, then why not rely on the plugin system, and leave the maintenance to the more competent hands of the Brax team (similar to ALE-py)?

jkterry1 commented 3 years ago

The current plan to have the Brax environments in Gym itself is ultimately because the Brax maintainers want the envs to live in Gym, with them maintaining them inside of Gym. I see no major problem doing this personally. This is arguably a fairly different scenario from the ALE, because the Brax environments are completely modular from the underlying library, while this is not at all the case with the ALE.

tristandeleu commented 3 years ago

Sorry to ask (since these discussions were private, and there is no mention of the plugin system in the OP), but was the Brax team given the option of providing their environments as a plugin at all?

Beyond the advantage of leaving full control over the development and maintenance of the environments to the Brax team, I can also see additional advantages of doing it via the plugin system:

ericjang commented 3 years ago

Drive-by question: my suspicion is that existing RL algorithms will perform differently on Brax vs. Mujoco, and possibly lead to contradictory results (e.g. algorithm A performs better than algorithm B on Brax, but the difference vanishes on Mujoco). How does the community plan to compare learning results on Brax-rewritten control envs vs. prior published work on Mujoco?

(same goes for Pybullet).

araffin commented 3 years ago

Drive-by question: my suspicion is that existing RL algorithms will perform differently on Brax vs. Mujoco, and possibly lead to contradictory results (e.g. algorithm A performs better than algorithm B on Brax, but the difference vanishes on Mujoco). How does the community plan to compare learning results on Brax-rewritten control envs vs. prior published work on Mujoco?

(same goes for Pybullet).

For PyBullet, there is actually a benchmark of A2C, PPO, SAC and TD3 in that paper (with tuned hyperparameters): https://arxiv.org/abs/2005.05719 (table was moved to appendix in v2) and results can be easily reproduced in the rl zoo: https://github.com/DLR-RM/rl-baselines3-zoo

https://paperswithcode.com/paper/generalized-state-dependent-exploration-for

tkelestemur commented 3 years ago

-"Why do the MuJoCo environments have to be replaced?" Despite MuJoCo being free, right now, the Gym environments have numerous bugs in simulation configuration and have code in a state that we are not able to maintain them. Moreover, they all depend on MuJoCo-Py, which is now fully deprecated and cannot be reasonably maintained. Given this, to use the environments with the more updated free versions of MuJoCo, to fix bugs and to be able to continue to do basic maintenance like using new Python versions, the environments would have to be very nearly rewritten from scratch. This means that a serious discussion of a change of simulator is appropriate.

Regarding this, have you considered using DeepMind's MuJoCo wrapper? Given that DeepMind acquired MuJoCo and the history of the dm_control library, I'm assuming this wrapper will be maintained actively going forward. It'll be a lot easier to change the MuJoCo wrapper than to switch to an entirely new simulator.

araffin commented 3 years ago

"You're rejecting the stable options (PyBullet, MuJoCo) in favor of newer and "fancier" simulators." -I disagree. Bullet is actively not being maintained for basic bug fix PRs, new Python versions, etc. It's pretty much abandoned; I've spoken to the former maintainers. While we could use MuJoCo, it would require a complete rewrite

I think you are confusing stable with maintained. Stable means that it has been there for long enough (and tested by many) that only minor bugs may be present. Maintained means that new features may be added, questions answered and bugs fixed. PyBullet should work with Python 3.9 (and maybe 3.10), which ensures that everything will work for at least 4 years (which is more or less the target you set for Brax). And although PyBullet development has slowed down, it is still active (https://github.com/bulletphysics/bullet3/commits/master/examples/pybullet), and I'm pretty sure @erwincoumans would merge PRs in the future if they fix bugs.

Regarding MuJoCo, even if mujoco-py is no longer maintained, nothing prevents people from using @nimrod-gileadi's fork, which is compatible with the latest and free MuJoCo version (https://github.com/openai/mujoco-py/pull/640) and allows people to continue using existing environments and to compare to previous work.

AGPX commented 3 years ago

Let me add just one consideration. For months now, I have been using a piece of software (DeepMimic) to try to train a neural network to perform certain behaviors. DeepMimic is based on TensorFlow and Bullet and runs entirely on the CPU (simulation and training). Despite the parallelization (MPI), the training times are biblical (tested on a Ryzen 9 5950X, 16 cores, with tf compiled with AVX2). Given the huge amount of tuning (not only hyperparameters, but also the reward functions) that needs to be done to get a decent result, having the ability to use a physics engine running on the GPU is IMHO absolutely crucial for this type of application.

Rohan138 commented 3 years ago

Drive-by question: my suspicion is that existing RL algorithms will perform differently on Brax vs. Mujoco, and possibly lead to contradictory results (e.g. algorithm A performs better than algorithm B on Brax, but the difference vanishes on Mujoco). How does the community plan to compare learning results on Brax-rewritten control envs vs. prior published work on Mujoco?

I think you are confusing stable with maintained. Stable means that it has been there for long enough (and tested by many) that only minor bugs may be present. Maintained means that new features may be added, questions answered and bugs fixed. PyBullet should work with Python 3.9 (and maybe 3.10), which ensures that everything will work for at least 4 years (which is more or less the target you set for Brax). And although PyBullet development has slowed down, it is still active (https://github.com/bulletphysics/bullet3/commits/master/examples/pybullet), and I'm pretty sure @erwincoumans would merge PRs in the future if they fix bugs.

While this is true, the PyBullet environments, as well as reimplementations of the gym[mujoco] environments in PyBullet, also have significant differences in the observations from MuJoCo. For example, see the discussions here and here. Per conversations with @erwincoumans and @benelot, these differences are major enough that PyBullet results aren't comparable to prior work using MuJoCo either. Per @benelot, the creator of pybullet-gym, these differences were significant enough that he was unable to resolve all of them.

On the other hand, the Brax observations have been/are being modified to reproduce those of mujoco as closely as possible, which means previous results on mujoco might actually be reproducible on Brax. Of course, Brax is not as physically accurate as mujoco or pybullet, but as per conversations with the pybullet/brax/TDS team, this should not be a major factor in either the choice of simulator or the reproducibility (for the mujoco environments only, not the robotics ones).

Regarding MuJoCo, even if mujoco-py is no longer maintained, nothing prevents people from using @nimrod-gileadi's fork, which is compatible with the latest and free MuJoCo version (openai/mujoco-py#640) and allows people to continue using existing environments and to compare to previous work.

True, but this would also be true even if the MuJoCo environments aren't located in Gym itself. mujoco-py currently has 288 open issues and 17 pull requests, and accounts for 67 of the 93 open Gym issues. Roughly half of these are installation issues alone. Compare this with Brax or PyBullet, which require a single pip install. The other half are nontrivial issues related to the physics and rendering (#1541, #2255, #1851). Coupled with the lack of both maintenance and stability (mujoco-py is deprecated and may be yanked from PyPI), keeping the mujoco+mujoco-py environments in Gym is simply unsustainable.

jkterry1 commented 3 years ago

For readers' reference, Rohan has been in most (but not all) of the private meetings.

@tkelestemur "It'll be a lot easier to change the MuJoCo wrapper than to switch to an entirely new simulator."

Given the amount of work required in the MuJoCo environments per the Gym bugs, the discussion of specific environment issues we had to fix in the Brax thread linked above, and DeepMind's planned changes to MuJoCo that they expressed, I very strongly disagree with this. Additionally, while it was not explicitly stated in the meeting, the other plans the DeepMind MuJoCo team conveyed to me strongly lead me to believe that there will be many very large breaking changes to their existing MuJoCo wrapper.

@tristandeleu "but was the Brax team given the option of providing their environments as a plugin at all?" Yes. Their given reason was that they want the Brax repo to just contain the physics library itself and not environments.

@araffin If nothing else, the Brax environments will be a /ton/ closer to the MuJoCo ones than the PyBullet ones already are, like Rohan mentioned. Additionally, Erwin Coumans (the creator/maintainer of PyBullet, for others reading) very specifically requested in our meetings that we don't depend on PyBullet for new work at the scale of Gym, given current maintenance plans.

erwincoumans commented 3 years ago

Just to chime in (as Bullet/PyBullet creator): I do keep maintaining PyBullet, including its PyBullet Gym environments, so it is not abandoned. I didn't want to deprecate the pybullet_envs Gym envs for Benelot's alternative versions, so PyBullet became a no-go (using pybullet_envs was never brought up as an option).

Tiny-Differentiable-Simulator also runs on both CPU and GPU and has a reduced-coordinate simulator, for high quality simulations, but we didn't finish all Gym envs yet (humanoid, half-cheetah; it likely takes a month or so to do those).

So Brax is a suggested option, with support for GPU and TPU accelerators, but a bit lower quality (due to simulation of joints as constraints, instead of reduced coordinates).

araffin commented 3 years ago

While this is true, the PyBullet environments, as well as reimplementations of the gym[mujoco] environments in PyBullet, also have significant differences in the observations from MuJoCo.

If nothing else, the Brax environments will be a /ton/ closer to the MuJoCo ones than the PyBullet ones already are, like Rohan mentioned.

@Rohan138 yes, I'm aware of that. That's also why I did tune and benchmark the most common algorithms on it (see my comment above): " For PyBullet, there is actually a benchmark of A2C, PPO, SAC and TD3 in that paper (with tuned hyperparameters): https://arxiv.org/abs/2005.05719 (table was moved to appendix in v2) and results can be easily reproduced in the rl zoo: https://github.com/DLR-RM/rl-baselines3-zoo

https://paperswithcode.com/paper/generalized-state-dependent-exploration-for " and TQC was benchmarked for SB3-Contrib: https://sb3-contrib.readthedocs.io/en/master/modules/tqc.html#results

benelot commented 3 years ago

I didn't want to deprecate the pybullet_envs Gym envs for Benelots alternative versions, so PyBullet became no-go (using pybullet_envs was never brought up as an option).

As far as I remember, the envs @erwincoumans keeps maintaining are the ones that I ported from roboschool. They are similar in their observations to MuJoCo, but not the same, and they work pretty well (as per Erwin's experience they seem very good for different projects). The envs I still have in my pybullet-gym repository are the ones that are meant to be the same as MuJoCo in terms of observations. Unfortunately, they have missing observations, as I could not identify all MuJoCo observations from the documentation and therefore could not find the correspondences in PyBullet. That is why my envs are trainable as well, but some observations are just not available to the agent, with some of them potentially crucial to learning success.

Given that we no longer want to exactly have the same observations as mujoco, the pybullet_envs from the pybullet repository are the ones most tested to suit the needs of researchers (right after the original mujoco envs of course) while being close to the original mujoco envs.

vwxyzjn commented 3 years ago

UPDATE: here is a colab from @araffin : https://colab.research.google.com/drive/1KGMZdRq6AemfcNscKjgpRzXqfhUtCf-V?usp=sharing

To help with the transition and offer alternatives, I have made a temporary mujoco-py release named free-mujoco-py on PyPI using @nimrod-gileadi's PR (see https://github.com/openai/mujoco-py/pull/640) that leverages the new free MuJoCo binaries. Try executing:

python -m venv venv
source venv/bin/activate
pip install gym
pip install free-mujoco-py
python -c "import mujoco_py;import gym;env=gym.make('HalfCheetah-v2')"


The source code is at https://github.com/nimrod-gileadi/mujoco-py/pull/1. For convenience, the release automatically includes the MuJoCo binaries, so there's no need to download them. At this point, it should only work on Linux, but it could work out-of-the-box on Mac and maybe Windows; I just don't have the machines to test it.

IsaacSheidlower commented 3 years ago

Hello, thank you very much for this update. I have a question concerning Brax and the rendering of environments. Currently, throughout all of OpenAI Gym, including the Box2D environments, rendering the environment (including in RGB-only mode) requires access to a display of some sort (see https://stackoverflow.com/questions/40195740/how-to-run-openai-gym-render-over-a-server for a discussion of this issue). The primary problem is that rendering or getting the RGB observation is extremely difficult on a headless server and becomes impossible when simulating multiple environments on the same server. Are there plans to resolve this with the Brax implementation? I think there is general community agreement that one should be able to get a pixel based observation (i.e. without rendering it to a screen) without needing a display. Thank you very much.

erwincoumans commented 3 years ago

Are there plans to resolve this with the Brax implementation?

Yes, we just added a basic CPU renderer to Brax. It is based on pytinyrenderer, which is also included with PyBullet. See the Brax Environments colab for how it works, or see this colab for just pytinyrenderer. It can render boxes, spheres, capsules and textured meshes and has basic shadows.

IsaacSheidlower commented 2 years ago

Hello again. Thank you very much for Brax again! I don't know if this is a bug or if there is a way to change this, but currently the rendering of the "reacher" environment is extremely zoomed out and hard to see. For the application I am using Brax for, I need the image render function to produce a human-visible image of the reacher and goal. Is there a way to zoom in on the reacher with the function call? I checked the source code but couldn't tell. Thank you very much.

ikostrikov commented 2 years ago

If anyone is interested, I replaced mujoco_py with dm_control here. It seems straightforward and requires replacing only 5ish lines of code in mujoco_env (+a lot of code for the viewer). Please see the corresponding commit.

jkterry1 commented 2 years ago

@ikostrikov these would require a version bump, correct?

ikostrikov commented 2 years ago

@jkterry1 do you mean -v2/-v3 => v4?

jkterry1 commented 2 years ago

@ikostrikov yes

jkterry1 commented 2 years ago

(e.g. that would change physics functionality, right?)

ikostrikov commented 2 years ago

Let me check this.

ikostrikov commented 2 years ago

@jkterry1 it doesn't change physics (it's just a different wrapper for the mujoco library). This latest commit in this repository and in my fork produce identical results.

You can find the tests I ran here.
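A check of this kind can be sketched as follows: seed both implementations identically, feed them the same action sequence, and compare every observation and reward. This is not the linked test code. `rollouts_match`, `ToyEnv`, and the `seed`/`reset`/`step` signatures (modelled on the older Gym API) are assumptions for illustration.

```python
import random


def _close(xs, ys, tol):
    """Element-wise comparison of two flat observation lists."""
    return len(xs) == len(ys) and all(abs(x - y) <= tol for x, y in zip(xs, ys))


def rollouts_match(make_env_a, make_env_b, actions, seed=0, tol=0.0):
    """Return True if both envs yield the same observations, rewards and done flags."""
    env_a, env_b = make_env_a(), make_env_b()
    env_a.seed(seed)
    env_b.seed(seed)
    if not _close(env_a.reset(), env_b.reset(), tol):
        return False
    for action in actions:
        obs_a, rew_a, done_a, _ = env_a.step(action)
        obs_b, rew_b, done_b, _ = env_b.step(action)
        if not _close(obs_a, obs_b, tol) or abs(rew_a - rew_b) > tol or done_a != done_b:
            return False
        if done_a:
            break
    return True


class ToyEnv:
    """Hypothetical stand-in environment: a 1-D point with a random start."""

    def __init__(self, gain=1.0):
        self.gain = gain
        self.rng = random.Random()
        self.pos = 0.0

    def seed(self, seed):
        self.rng.seed(seed)

    def reset(self):
        self.pos = self.rng.uniform(-1.0, 1.0)
        return [self.pos]

    def step(self, action):
        self.pos += self.gain * action
        return [self.pos], -abs(self.pos), abs(self.pos) > 10.0, {}


actions = [0.1, -0.2, 0.3]
same = rollouts_match(ToyEnv, ToyEnv, actions)                         # identical dynamics
changed = rollouts_match(ToyEnv, lambda: ToyEnv(gain=1.1), actions)    # altered dynamics
```

With `tol=0.0` the comparison demands bit-identical results, which is the bar for swapping bindings without a version bump; a nonzero tolerance would only show the two are close.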

lucaslingle commented 2 years ago

@jkterry1, I'm curious if the situation has changed since you wrote "Moreover, they all depend on MuJoCo-Py, which is now fully deprecated and cannot be reasonably maintained." in the initial post on Oct 21, 2021.

As a passer-by, I noticed that the mujoco-py GitHub repo was updated on Nov 18, 2021, and that its status is "Maintenance (expect bug fixes and minor updates)". In particular, the maintainers of that repo seem to have updated it to support MuJoCo version 2.1, and the readme says it is "maintained by the OpenAI robotics team."

In light of these efforts, I'm wondering how the characterization of mujoco-py as fully deprecated relates to these recent events. Is it still accurate? If you feel the situation has changed, might Gym support for MuJoCo be maintained going forward, perhaps on a collaborative basis with the maintainers of mujoco-py?

You also wrote "Given this, to use the environments with the more updated free versions of MuJoCo, to fix bugs and to be able to continue do basic maintenance like using new Python versions, the environments would have to be very nearly rewritten from scratch." I noticed that MuJoCo actually offers free licenses now for the older versions as well, not just the open-sourced version. This is mentioned on the page https://roboti.us/license.html, which the MuJoCo website sends you to if you seek to download an older version. I'm not sure if this changes your mind, but it suggests another possible path forward: you could just leave support for MuJoCo in place, and potentially add support for the newer versions at a later date.

It would be kind of disappointing to see Mujoco get off-roaded like this, simply because of the state of the gym codebase. Given OpenAI gym's widespread use in research, I am also curious to know what exactly are the bugs you were referring to.

Thanks, and hope to hear from you!

jkterry1 commented 2 years ago

@lucaslingle

ikostrikov commented 2 years ago

@jkterry1 why does replacing mujoco_py with dm_control, which is just an alternative binding for the MuJoCo library from DeepMind, not solve the problem? It guarantees reproducibility, and DeepMind maintains it.

lucaslingle commented 2 years ago

@jkterry1 Ok, thanks for the reply; very surprising news to me. I look forward to seeing Brax integration!

jkterry1 commented 2 years ago

@ikostrikov

danbri commented 2 years ago

The Box2D environments will be rewritten in Brax in a new Phys2D environment class, and the Box2D environments will be deprecated and then removed

is this still happening? I didn’t find much searching on [ BipedalWalker brax ] .

jkterry1 commented 2 years ago

@danbri yes, probably in a few more months

jkterry1 commented 2 years ago

Hey, I'm going to lock this thread as the initial discussion is resolved and most of the comments should be created/discussed elsewhere