vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Update to support Gymnasium #277

Closed arjun-kg closed 1 year ago

arjun-kg commented 1 year ago

Description

A draft PR updating CleanRL to support Gymnasium. Closes #263

This mostly involves updating the step and seed APIs. It tries to use the gymnasium branches of the dependent packages (SB3, etc.). After those are updated, I will verify the changes, check the tests, and get the PR ready for review.
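For readers unfamiliar with the migration, the step and seed API changes can be sketched with a hypothetical stub class (no real environment; it only mirrors the new Gymnasium signatures, with the old gym ones in comments):

```python
import random

class StubEnv:
    """Hypothetical stand-in showing the Gymnasium API shapes this PR migrates to."""

    def reset(self, seed=None):
        # old gym: env.seed(seed); obs = env.reset()
        # gymnasium: seeding moved into reset, which returns (obs, info)
        if seed is not None:
            random.seed(seed)
        return 0.0, {}

    def step(self, action):
        # old gym: return obs, reward, done, info
        # gymnasium: the single `done` flag is split into terminated/truncated
        obs, reward = 0.0, 1.0
        terminated, truncated = False, False
        return obs, reward, terminated, truncated, {}

env = StubEnv()
obs, info = env.reset(seed=42)
obs, reward, terminated, truncated, info = env.step(action=0)
done = terminated or truncated
```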

Costa's comment:

Thanks @arjun-kg for the PR. We look forward to supporting the next generation of gym.

It's important to identify the performance-impacting changes and non-performance-impacting changes:

In this PR for initial support for v0.26.1, let's aim to make only non-performance-impacting changes. With that said, here is a todo list:

Checklist:

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.

vercel[bot] commented 1 year ago

The latest updates on your projects.

cleanrl: ✅ Ready (preview deployed Mar 27, 2023 at 6:37AM (UTC))
vwxyzjn commented 1 year ago

Related to #271

vwxyzjn commented 1 year ago

Thanks @arjun-kg for the PR. We look forward to supporting the next generation of gym.

It's important to identify the performance-impacting changes and non-performance-impacting changes:

In this PR for initial support for v0.26.1, let's aim to make only non-performance-impacting changes. With that said, I have added a todo list in the PR description.

vwxyzjn commented 1 year ago

@arjun-kg I made a first editing pass to get ppo.py and dqn.py passing CI. Could you try looking at ddpg_continuous_action.py and TD3?

Btw the plan is to have an announcement like the following on the main page, since I expect to encounter more issues.

[image: screenshot of the planned announcement]

GaetanLepage commented 1 year ago

Hi! While reading through the proposed changes, there is something I am not sure I understand: why do you replace done with terminated rather than terminated or truncated? I don't see why the truncated return value is ignored.

vwxyzjn commented 1 year ago

@GaetanLepage yeah, we should do terminated or truncated for the moment until we properly deal with #198.
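Concretely, with hypothetical per-env flags from a vectorized step, replacing done with terminated alone drops time-limit episode ends:

```python
# Hypothetical flags for a 4-env vectorized step: env 0 reached a terminal
# state, env 1 hit its time limit (truncated), envs 2-3 are still running.
terminateds = [True, False, False, False]
truncateds = [False, True, False, False]

# Using `terminateds` alone would miss env 1's episode end; combining both
# recovers the old gym `done` semantics:
dones = [term or trunc for term, trunc in zip(terminateds, truncateds)]
```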

@arjun-kg I added some changes to ppo_continuous_action.py to make it work with DM Control, powered by https://github.com/Farama-Foundation/Shimmy/blob/main/tests/test_dm_control.py

vwxyzjn commented 1 year ago

@arjun-kg we are thinking of probably supporting both gymnasium and gym simultaneously. See https://github.com/vwxyzjn/cleanrl/pull/318#issuecomment-1314666112 as an example. This will give us a much smoother transition.

arjun-kg commented 1 year ago

@vwxyzjn sounds good, will check it out. I'm a bit tied up this week. I'll continue work on this from next week if it's okay.

pseudo-rnd-thoughts commented 1 year ago

We have just released Gymnasium v0.27.0, which should be backward compatible. Would it be possible to update this PR to v0.27 and check that nothing new breaks?

arjun-kg commented 1 year ago

@vwxyzjn SB3 recently added gymnasium support on a branch, but I'm not sure whether some parallel work is going on to update CleanRL to gymnasium. Would you like me to update this PR to gymnasium with SB3 on the gymnasium branch?

kvrban commented 1 year ago

Just tried the PR with:

diff --git a/cleanrl/dqn.py b/cleanrl/dqn.py
 import time
 from distutils.util import strtobool

-import gym
+import gymnasium as gym
 import numpy as np
 import torch
 import torch.nn as nn

with stable-baselines3==2.0.0a5 and gymnasium==0.28.1

When I run

python3 cleanrl/cleanrl/dqn.py

execution always stops after global_step=10009 with this error:

Traceback (most recent call last):
  File "/home/kris/dev/cleanRL-Gymnasium/cleanrl/cleanrl/dqn.py", line 195, in <module>
    data = rb.sample(args.batch_size)
  File "/home/kris/.local/lib/python3.9/site-packages/stable_baselines3/common/buffers.py", line 285, in sample
    return super().sample(batch_size=batch_size, env=env)
  File "/home/kris/.local/lib/python3.9/site-packages/stable_baselines3/common/buffers.py", line 110, in sample
    batch_inds = np.random.randint(0, upper_bound, size=batch_size)
  File "mtrand.pyx", line 765, in numpy.random.mtrand.RandomState.randint
  File "_bounded_integers.pyx", line 1247, in numpy.random._bounded_integers._rand_int64
ValueError: high <= 0
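A minimal repro of what this traceback suggests (an assumption about the cause, consistent with the follow-up comment below): if nothing is ever added to the replay buffer, SB3's upper bound is 0 and numpy rejects the empty sampling range:

```python
import numpy as np

# The buffer stayed empty because rb.add(...) was never called, so sampling
# asks for indices in randint(0, 0), an empty range, and numpy raises.
upper_bound = 0
try:
    batch_inds = np.random.randint(0, upper_bound, size=128)
    sampling_failed = False
except ValueError:
    sampling_failed = True
```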
kvrban commented 1 year ago

I think it was not intended to remove the following line:

rb.add(obs, real_next_obs, actions, rewards, terminateds, infos)

This could be the fix (fixed it for me):

diff --git a/cleanrl/dqn.py b/cleanrl/dqn.py
index 14864e7..4e73a6e 100644
--- a/cleanrl/dqn.py
+++ b/cleanrl/dqn.py
@@ -156,7 +156,7 @@ if __name__ == "__main__":
     start_time = time.time()

     # TRY NOT TO MODIFY: start the game
-    obs = envs.reset(seed=args.seed)
+    obs, _ = envs.reset(seed=args.seed)
     for global_step in range(args.total_timesteps):
         # ALGO LOGIC: put action logic here
         epsilon = linear_schedule(args.start_e, args.end_e, args.exploration_fraction * args.total_timesteps, global_step)
@@ -185,6 +185,7 @@ if __name__ == "__main__":
             for idx, d in enumerate(infos["_final_observation"]):
                 if d:
                     real_next_obs[idx] = infos["final_observation"][idx]
+        rb.add(obs, real_next_obs, actions, rewards, terminateds, infos)

         # TRY NOT TO MODIFY: CRUCIAL step easy to overlook
         obs = next_obs
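For context on the loop in the diff, here is a self-contained sketch (illustrative values, not CleanRL code) of the final_observation pattern it preserves: when a Gymnasium vector env auto-resets, next_obs already holds the new episode's first observation, and the true terminal observation is tucked into infos["final_observation"]:

```python
import numpy as np

# Two sub-envs; env 1 just finished an episode and auto-reset.
next_obs = np.array([[0.1], [9.9]])  # env 1's entry is the *new* episode's obs
infos = {
    "_final_observation": np.array([False, True]),
    "final_observation": np.array([None, np.array([0.7])], dtype=object),
}

# Patch in the real terminal observation before storing the transition, so
# the replay buffer sees the state that actually ended the episode.
real_next_obs = next_obs.copy()
for idx, ended in enumerate(infos["_final_observation"]):
    if ended:
        real_next_obs[idx] = infos["final_observation"][idx]
```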
pseudo-rnd-thoughts commented 1 year ago

@kvrban Thanks for the comment but I think the plan is to complete this PR as several smaller PRs, see #370 and #371.

@arjun-kg or @vwxyzjn Should this PR be closed to avoid confusion?

vwxyzjn commented 1 year ago

@pseudo-rnd-thoughts absolutely. Closing this PR now.