Updated Roadmap for Gym 1.0

jkterry1 commented 2 years ago

This is a very loose roadmap for what/when major breaking changes should be expected in Gym and in what order (last updated September 6, 2022):

October:

[ ] Wrapper overhaul
[ ] Official Conda packaging

December:

[ ] Vector API overhaul
[ ] Native make_vec method
[ ] Extensive envpool integration
[ ] Functional API

February:

[ ] Brax based Phys3D environments to replace MuJoCo environments that people can test out
[ ] Brax based Phys2D environments to replace Box2D environments that people can test out
[ ] Hardware acceleration for classic control and toy_text envs

March:

[ ] Brax environments bug fixes as needed

April:

[ ] Brax environments become permanent, box2d and mujoco environments moved to separate repo like robotics environments were

Future:

[ ] Full type hints
[ ] Alternate language support

araffin commented 2 years ago

Hello, with all those breaking changes, it would maybe make sense to update the package name too? (as we did with SB3) That would avoid many bad surprises when upgrading the package.

RedTachyon commented 2 years ago

So overall my opinion is that before 1.0, we should keep things as backwards-compatible as possible, then make a complete, sane design for everything mentioned here, and do breaking changes at the 1.0 release. Maybe providing some wrappers that will expose something similar to the old API if someone really needs it, but it shouldn't be encouraged.

I don't think changing to a different repo/package name is necessary (even though I might be in favor of that for unrelated reasons), since we'd be following regular semantic versioning. When the "big" version number is changed, breaking changes are to be expected.

As for some of the specifics:

After robotics envs are removed, is the intention to replace (some/all) of them with Brax envs? Or should it rather be just a few "proof of concept" brax envs?
New render API isn't finalized yet, right? I feel like it might get messy, we should definitely have something that we're sure "makes sense" before 1.0
I'm not that sure about redoing classic control and toy text in jax. As much as I like it, its popularity is like two orders of magnitude below numpy, so we're adding an extra dependency. There is some advantage in having simple environments written basically in pure python (because with any kind of numerics, numpy is basically in the python standard library). I'm not sure if the performance difference will be worth it, since a single step of these envs is going to be super fast anyways. It'd be different with a vectorized environment, but that's a whole other discussion
And yea, vector API. I honestly wouldn't mind just dropping it from gym and letting downstream users do vectorization however they want. Other than that, I'm a fan of dict-based vecenvs, but that only really becomes necessary for multiagent stuff, which isn't really supported by gym. There doesn't seem to be much code on github that relies on gym.vector.make, I'm also not aware of any large scale applications that use it.
For imports, I think I'd like something similar to PZ. It's difficult, because gym.make(...) feels like an antipattern, but it's also pretty convenient. Maybe keeping both would be good, but exposing the object-oriented API a bit more in the docs? (e.g. from gym.envs.classic_control import CartPoleEnv, which already can be done)
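To make the trade-off in the last point concrete, here is a minimal sketch of the two construction styles side by side. It does not use gym: `ToyCartPoleEnv`, `register`, `make` and the ID `"ToyCartPole-v0"` are all hypothetical stand-ins written for illustration, not gym's actual registry.

```python
from typing import Callable, Dict


class ToyCartPoleEnv:
    """Hypothetical stand-in for an environment class (not gym's CartPoleEnv)."""

    def __init__(self, max_steps: int = 200):
        self.max_steps = max_steps


# Hypothetical registry mapping string IDs to constructors.
_REGISTRY: Dict[str, Callable[..., object]] = {}


def register(env_id: str, entry_point: Callable[..., object]) -> None:
    _REGISTRY[env_id] = entry_point


def make(env_id: str, **kwargs):
    """String-based style: look up a constructor by ID and call it."""
    if env_id not in _REGISTRY:
        raise KeyError(f"unknown environment id: {env_id!r}")
    return _REGISTRY[env_id](**kwargs)


register("ToyCartPole-v0", ToyCartPoleEnv)

# Convenient, but a typo in the ID or a keyword only fails at runtime.
env_a = make("ToyCartPole-v0", max_steps=500)

# Explicit import and constructor call, visible to IDEs and type checkers.
env_b = ToyCartPoleEnv(max_steps=500)
```

Both calls build the same object; the difference is only in how discoverable the class and its arguments are, which is why keeping both and documenting the second more prominently seems plausible.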

tristandeleu commented 2 years ago

> And yea, vector API. I honestly wouldn't mind just dropping it from gym and letting downstream users do vectorization however they want. Other than that, I'm a fan of dict-based vecenvs, but that only really becomes necessary for multiagent stuff, which isn't really supported by gym. There doesn't seem to be much code on github that relies on gym.vector.make, I'm also not aware of any large scale applications that use it.

There is an ongoing discussion in #2279 regarding changes to the Vectorized Environment API, I invite you to read through this issue and contribute there.

About the adoption of vectorized environments, the main problem is the absence of official documentation for them; this is hardly an issue with the API itself. Fairly exhaustive documentation about vectorized environments was added in #2327, but as mentioned here the effort on documentation has been particularly slow, so it has not been properly integrated yet. The goal of VectorEnv inside gym is to offer an out-of-the-box solution for easy vectorization, to avoid the common pattern of copy/pasting SubprocVecEnv (see #1513) when starting a new project. Of course, it's not meant to handle large scale applications, and other specialized solutions exist for that (e.g. launchpad, Ray/RLlib).
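For readers unfamiliar with the idea, "easy vectorization" can be sketched in a few lines of plain Python. This is not gym's VectorEnv or SubprocVecEnv: `CoinFlipEnv` and `SyncVectorSketch` are hypothetical names, everything runs sequentially in one process, and resetting a finished sub-environment automatically is an assumption made here for illustration.

```python
import random
from typing import Callable, List, Tuple


class CoinFlipEnv:
    """Hypothetical toy environment: the episode ends after `horizon` steps."""

    def __init__(self, horizon: int = 3, seed: int = 0):
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.rng.randint(0, 1)

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        self.t += 1
        obs = self.rng.randint(0, 1)
        reward = 1.0 if action == obs else 0.0  # reward for guessing the flip
        done = self.t >= self.horizon
        return obs, reward, done, {}


class SyncVectorSketch:
    """Steps several environments one after another in the same process."""

    def __init__(self, env_fns: List[Callable[[], CoinFlipEnv]]):
        self.envs = [fn() for fn in env_fns]

    def reset(self) -> List[int]:
        return [env.reset() for env in self.envs]

    def step(self, actions: List[int]):
        observations, rewards, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):
            obs, reward, done, info = env.step(action)
            if done:
                # Assumption: auto-reset so the batch always holds a valid observation.
                obs = env.reset()
            observations.append(obs)
            rewards.append(reward)
            dones.append(done)
            infos.append(info)
        return observations, rewards, dones, infos


# Four independent copies, each built from its own factory function.
vec = SyncVectorSketch([lambda i=i: CoinFlipEnv(horizon=3, seed=i) for i in range(4)])
obs = vec.reset()
for _ in range(3):
    obs, rewards, dones, infos = vec.step([0, 0, 0, 0])
```

A subprocess-based variant would replace the loop in `step` with messages to worker processes, which is the part people tend to copy/paste between projects.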

jkterry1 commented 2 years ago

"After robotics envs are removed, is the intention to replace (some/all) of them with Brax envs" It'd require a huge amount of work for the Brax team and AFAIK they don't want to do it, at least not right now. They also aren't huge universal benchmarks the way MuJoCo is, they're exclusively used for HER type work
The new render API will be brought in before 1.0, yes. I'll create a post with the options for this when I'm able to

ZhiqingXiao commented 2 years ago

On "Have reset always return info":

I agree that the current return of reset(), which only consists of observation, does not suffice.

I have noticed the proposal to add info to the return of reset(). I suggest adding done and reward, too. For example, an episode can start with a negative reward, such as a random initial negative cash flow for setting something up. An episode can also be terminal immediately after it is initialized. (For example, the initial state is bad and the game is over immediately after it starts.)

Another possible alternative is to provide a member function last() or observe() to return observation, reward, and done, just like the APIs of openai/gym3. This approach can provide reward, done, info after reset(), without breaking existing code that uses the return of reset(). Additionally, the current API requires this information to be saved immediately after the action is applied, which is sometimes inconvenient. With the member function last(), users can get this information when it is actually needed. The implementation is not too difficult either: just save this information before the return of reset() and step(), and return it when last() is called.
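The caching idea described above could look roughly like the following sketch. It is written as a wrapper around a hypothetical toy environment (`CountdownEnv`) rather than as a change to gym itself, and `LastWrapper` is an illustrative name. The values stored after reset() (zero reward, not done, empty info) are assumed defaults; an environment that really starts with a non-zero reward or in a terminal state would have to supply its own.

```python
from typing import Any, Dict, Tuple


class CountdownEnv:
    """Hypothetical toy env with the old-style API (reset returns only obs)."""

    def __init__(self, start: int = 2):
        self.start = start
        self.state = start

    def reset(self) -> int:
        self.state = self.start
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        self.state -= 1
        done = self.state <= 0
        return self.state, -1.0, done, {"remaining": self.state}


class LastWrapper:
    """Caches the most recent transition so it can be read back via last()."""

    def __init__(self, env):
        self.env = env
        self._last = None

    def reset(self):
        obs = self.env.reset()
        # Assumed defaults after a reset: zero reward, not done, empty info.
        self._last = (obs, 0.0, False, {})
        return obs  # the return value of reset() is unchanged

    def step(self, action):
        result = self.env.step(action)
        self._last = result
        return result

    def last(self):
        if self._last is None:
            raise RuntimeError("call reset() before last()")
        return self._last


env = LastWrapper(CountdownEnv(start=2))
first_obs = env.reset()
after_reset = env.last()   # available without changing reset()'s return value
env.step(0)
env.step(0)
after_steps = env.last()   # same tuple the final step() call returned
```

Existing code that only uses the observation returned by reset() keeps working, which is the backwards-compatibility argument made above.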

tgolsson commented 2 years ago

Hello!

I'm wondering about the state/progress for Box2D. I'm working on multiple projects currently on the 3.8 -> 3.9 transition path, and we're consistently running into issues with OpenAI's fork of Box2D. For 3.8 we could often use the pybox2d variant while disabling the box2d feature in Gym. However, with 3.9 that door also closes, leaving compilation as the only install option.

As a maintainer of other packages this is highly undesirable as a burden on end-users, especially for Windows. While I know that Windows might not be a primary target for Gym, not everyone has a choice to use Linux. While longterm solutions like replacing Box2D wholly might be desirable, I'm curious whether OpenAI could update its fork with wheels for 3.8, 3.9, like was done for atari-py? This'd solve the immediate problem, with hopefully not too much work.

Furthermore, I'm curious about roadmaps for replacing Box2D -- are there specific steps being taken, opportunities for contribution, etc.? While time is always in short supply, I'd much rather sink a few days into an actionable target for gym that'll solve this permanently than push the issue in front of me and solve it time and time again.

jkterry1 commented 2 years ago

I don't believe that developments on box2d are likely at this point, admittedly due to factors beyond my control. However, in one of the threads here that discusses box2d (I forget which, I apologize), someone linked to new maintained python bindings for box2d that they were working on.

foxik commented 1 year ago

Just to let you know, I teach a reinforcement learning course, where I use both the LunarLander and the CarRacing environments. For that course, I compiled binary wheels for Box2D -- they are available as the ufal.pybox2d PyPI package. We have been using them for the past year (with gym 0.20.0) without an issue; I just rebuilt them a few days ago, resulting in CPython 3.6-3.11 wheels for Linux 32/64bit, Windows 32/64bit, macOS 64bit/ARM64. They seem to work fine with gym 0.26.0 (but the students will test them on their computers around November).

I am not volunteering to add functionality to Box2D itself, but I plan to keep the wheels around for some time. If you wanted, I could provide some "better" name than ufal.pybox2d (ÚFAL is the name of our department).

pseudo-rnd-thoughts commented 1 year ago

Hi, thanks for the suggestion. I know there are issues with box2d-py on Windows, and I think the general suggestion was to use box2d, so maybe changing to your package would be better.

In the longer term (in the next 6 months), our plan was to move the box2d environments into a legacy repo (still installable) and replace them with jax-based versions of the environments, which should be 100 to 1000x faster, possibly more with CarRacing (it is really, really slow currently).

foxik commented 1 year ago

Thanks for sharing the plans; ufal.pybox2d is actually just a (virtually unchanged) fork of the box2d PyPI package, but providing newer Python wheels (box2d offers only Python 3.5-3.8 packages, and no ARM64).

Moving the environments to a legacy repo seems like a reasonable move.

(Personally I am not convinced about the speedup -- I have been using Brax and it seems the large speedup can be achieved when the environment steps and training can be "joined" together to run on GPU; if you go "back and forth" between "small" environment steps in Brax and "independent" agent training, the improvements are smaller.)

(With CarRacing, I believe the problem is actually the speed of rendering. Even for state_pixels mode, you currently render to 600x400 and only at the end downscale to 96x96; furthermore, two heuristics for avoiding rendering were removed some time ago from the current version of CarRacing. For the past 3 years, we have been using a fork that renders directly to 96x96 and keeps the heuristics, and the speed difference is noticeable -- CarRacing from master gives ~100 steps per second on my notebook, our version (which uses hopefully unchanged dynamics from the Box2D point of view) gives ~350 steps per second, and completely disabling rendering gives ~6.5k steps per second. So Brax will only accelerate the physics part of the computation, and the gain will not be noticeable by the users of the environment, because rendering is needed to get the current observation.)

(But I am not against the move to Brax -- it is a well-maintained engine, compared to Box2D/Bullet, which are not, so I am definitely in favor of it; it just will not make existing programs run dramatically faster.)
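The step rates quoted in this comment can be turned into a rough bound with Amdahl's law. The sketch below assumes that the time per step without rendering (~6.5k steps per second) is entirely physics and that physics and rendering times simply add up; both are simplifications, and the numbers are the approximate ones reported above.

```python
# Step rates quoted in the comment above (steps per second, approximate).
rate_master = 100.0       # CarRacing from master, with rendering
rate_no_render = 6500.0   # rendering completely disabled

time_total = 1.0 / rate_master        # seconds per step with rendering
time_physics = 1.0 / rate_no_render   # seconds per step, assumed to be all physics
physics_fraction = time_physics / time_total  # share of a step spent on physics


def overall_speedup(physics_speedup: float) -> float:
    """Amdahl's law: only the physics share of each step gets faster."""
    return 1.0 / ((1.0 - physics_fraction) + physics_fraction / physics_speedup)


speedup_1000x = overall_speedup(1000.0)       # physics made 1000x faster
upper_bound = 1.0 / (1.0 - physics_fraction)  # limit for infinitely fast physics

print(f"physics share of a step: {physics_fraction:.1%}")
print(f"overall speedup with 1000x faster physics: {speedup_1000x:.4f}x")
print(f"upper bound on overall speedup: {upper_bound:.4f}x")
```

Under these assumptions physics accounts for roughly 1.5% of each step, so even arbitrarily fast physics leaves the overall step rate within about 2% of the current one as long as rendering is unchanged.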

pseudo-rnd-thoughts commented 1 year ago

Thanks for the information, that makes a lot of sense why car racing is so slow. I wasn't involved when the heuristics you talk about were removed, I wonder why they were removed.

You are right about Brax only being a bit faster if particular optimisations are not used, i.e. parallelisation, GPU, etc. I should have noted that we plan to use a jax-based rendering system as well, so the slowdown from pygame is limited.

Do you know which PRs removed the optimisations you note? Was there a reason? Would you want to make a PR with your fork so we can see the environments?

foxik commented 1 year ago

Wow, bringing in jax-based rendering could provide a large speedup -- there is at least a potential for something like 50-times faster CarRacing :-)

Regarding CarRacing, I will be updating our version in a few weeks (to give it to students) -- so I will polish it and create an issue here describing the changes, and we will see what people think about it. It will not be pixel-perfect, because one of the optimizations is not to render to a 1000x800 surface (sorry for the incorrect numbers 600x400 above) if you only require the 96x96 state, but directly to 96x96; that will however never be exactly the same -- so maybe it would require increasing the CarRacing version. I made a quick measurement and the current pygame version with our rendering changes incorporated achieves ~700 steps per second, so a 7-times speedup over the current version [the reason why I reported only 3.5 above is that in the version discussed above we use Python-only software rendering to numpy arrays -- because we wanted it to run on headless servers without OpenGL libraries, and the previous code required GL for rendering] -- so the potential is there.

jkterry1 commented 1 year ago

Hey, we just launched gymnasium, a fork of Gym by the maintainers of Gym for the past 18 months where all maintenance and improvements will happen moving forward, including this 1.0 roadmap.

We have an announcement post here: https://farama.org/Announcing-The-Farama-Foundation

openai / gym

Updated Roadmap for Gym 1.0 #2524