vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Update to support Gymnasium #277

Closed arjun-kg closed 1 year ago

arjun-kg commented 1 year ago

Description

A draft PR updating CleanRL to support Gymnasium. Closes #263

This mostly involves updating the step and seed APIs. It tries to use the gymnasium branches of the dependent packages (SB3, etc.). After those are updated, I will verify the changes, check the tests, and get the PR ready for review.
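For readers unfamiliar with the migration, the step and seed API changes can be sketched with a hypothetical stub class (no real environment; it only mirrors the new Gymnasium signatures, with the old gym ones in comments):

```python
import random

class StubEnv:
    """Hypothetical stand-in showing the Gymnasium API shapes this PR migrates to."""

    def reset(self, seed=None):
        # old gym: env.seed(seed); obs = env.reset()
        # gymnasium: seeding moved into reset, which returns (obs, info)
        if seed is not None:
            random.seed(seed)
        return 0.0, {}

    def step(self, action):
        # old gym: return obs, reward, done, info
        # gymnasium: the single `done` flag is split into terminated/truncated
        obs, reward = 0.0, 1.0
        terminated, truncated = False, False
        return obs, reward, terminated, truncated, {}

env = StubEnv()
obs, info = env.reset(seed=42)
obs, reward, terminated, truncated, info = env.step(action=0)
done = terminated or truncated
```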

Costa's comment:

Thanks @arjun-kg for the PR. We look forward to supporting the next generation of gym.

It's important to identify the performance-impacting changes and non-performance-impacting changes:

In this PR for initial support for v0.26.1, let's aim to make only non-performance-impacting changes. With that said, here is a todo list:

Checklist:

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.

vercel[bot] commented 1 year ago

The latest updates on your projects.

cleanrl: ✅ Ready (preview deployed Mar 27, 2023 at 6:37AM (UTC))
vwxyzjn commented 1 year ago

Related to #271

vwxyzjn commented 1 year ago

Thanks @arjun-kg for the PR. We look forward to supporting the next generation of gym.

It's important to identify the performance-impacting changes and non-performance-impacting changes:

In this PR for initial support for v0.26.1, let's aim to make only non-performance-impacting changes. With that said, I have added a todo list in the PR description.

vwxyzjn commented 1 year ago

@arjun-kg I made a first editing pass to get ppo.py and dqn.py passing CI. Could you try looking at ddpg_continuous_action.py and TD3?

Btw the plan is to have an announcement like the following on the main page, since I expect to encounter more issues.

[image: screenshot of the planned announcement]

GaetanLepage commented 1 year ago

Hi! While reading through the proposed changes, there is something I am not sure I understand: why do you replace done with terminated rather than terminated or truncated? I don't see why the truncated return value is ignored.

vwxyzjn commented 1 year ago

@GaetanLepage yeah, we should do terminated or truncated for the moment until we properly deal with #198.
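Concretely, with hypothetical per-env flags from a vectorized step, replacing done with terminated alone drops time-limit episode ends:

```python
# Hypothetical flags for a 4-env vectorized step: env 0 reached a terminal
# state, env 1 hit its time limit (truncated), envs 2-3 are still running.
terminateds = [True, False, False, False]
truncateds = [False, True, False, False]

# Using `terminateds` alone would miss env 1's episode end; combining both
# recovers the old gym `done` semantics:
dones = [term or trunc for term, trunc in zip(terminateds, truncateds)]
```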

@arjun-kg I added some changes to ppo_continuous_action.py to make it work with DM Control, powered by https://github.com/Farama-Foundation/Shimmy/blob/main/tests/test_dm_control.py

vwxyzjn commented 1 year ago

@arjun-kg we are thinking of probably supporting both gymnasium and gym simultaneously. See https://github.com/vwxyzjn/cleanrl/pull/318#issuecomment-1314666112 as an example. This will give us a much smoother transition.

arjun-kg commented 1 year ago

@vwxyzjn sounds good, will check it out. I'm a bit tied up this week. I'll continue work on this from next week if it's okay.

pseudo-rnd-thoughts commented 1 year ago

We have just released Gymnasium v0.27.0, which should be backward compatible. Would it be possible to update this PR to v0.27 and check that nothing new breaks?

arjun-kg commented 1 year ago

@vwxyzjn SB3 recently added gymnasium support on a branch, but I'm not sure whether some parallel work is going on to update CleanRL to gymnasium. Would you like me to update this PR to gymnasium with SB3 on the gymnasium branch?

kvrban commented 1 year ago

Just tried the PR with:

diff --git a/cleanrl/dqn.py b/cleanrl/dqn.py
 import time
 from distutils.util import strtobool

-import gym
+import gymnasium as gym
 import numpy as np
 import torch
 import torch.nn as nn

with stable-baselines3==2.0.0a5 and gymnasium==0.28.1

When I run

python3 cleanrl/cleanrl/dqn.py

execution always stops after global_step=10009 with this error:

Traceback (most recent call last):
  File "/home/kris/dev/cleanRL-Gymnasium/cleanrl/cleanrl/dqn.py", line 195, in <module>
    data = rb.sample(args.batch_size)
  File "/home/kris/.local/lib/python3.9/site-packages/stable_baselines3/common/buffers.py", line 285, in sample
    return super().sample(batch_size=batch_size, env=env)
  File "/home/kris/.local/lib/python3.9/site-packages/stable_baselines3/common/buffers.py", line 110, in sample
    batch_inds = np.random.randint(0, upper_bound, size=batch_size)
  File "mtrand.pyx", line 765, in numpy.random.mtrand.RandomState.randint
  File "_bounded_integers.pyx", line 1247, in numpy.random._bounded_integers._rand_int64
ValueError: high <= 0
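A minimal repro of what this traceback suggests (an assumption about the cause, consistent with the follow-up comment below): if nothing is ever added to the replay buffer, SB3's upper bound is 0 and numpy rejects the empty sampling range:

```python
import numpy as np

# The buffer stayed empty because rb.add(...) was never called, so sampling
# asks for indices in randint(0, 0), an empty range, and numpy raises.
upper_bound = 0
try:
    batch_inds = np.random.randint(0, upper_bound, size=128)
    sampling_failed = False
except ValueError:
    sampling_failed = True
```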
kvrban commented 1 year ago

I think it was not intended to remove the following line:

rb.add(obs, real_next_obs, actions, rewards, terminateds, infos)

This could be the fix (fixed it for me):

diff --git a/cleanrl/dqn.py b/cleanrl/dqn.py
index 14864e7..4e73a6e 100644
--- a/cleanrl/dqn.py
+++ b/cleanrl/dqn.py
@@ -156,7 +156,7 @@ if __name__ == "__main__":
     start_time = time.time()

     # TRY NOT TO MODIFY: start the game
-    obs = envs.reset(seed=args.seed)
+    obs, _ = envs.reset(seed=args.seed)
     for global_step in range(args.total_timesteps):
         # ALGO LOGIC: put action logic here
         epsilon = linear_schedule(args.start_e, args.end_e, args.exploration_fraction * args.total_timesteps, global_step)
@@ -185,6 +185,7 @@ if __name__ == "__main__":
             for idx, d in enumerate(infos["_final_observation"]):
                 if d:
                     real_next_obs[idx] = infos["final_observation"][idx]
+        rb.add(obs, real_next_obs, actions, rewards, terminateds, infos)

         # TRY NOT TO MODIFY: CRUCIAL step easy to overlook
         obs = next_obs
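For context on the loop in the diff, here is a self-contained sketch (illustrative values, not CleanRL code) of the final_observation pattern it preserves: when a Gymnasium vector env auto-resets, next_obs already holds the new episode's first observation, and the true terminal observation is tucked into infos["final_observation"]:

```python
import numpy as np

# Two sub-envs; env 1 just finished an episode and auto-reset.
next_obs = np.array([[0.1], [9.9]])  # env 1's entry is the *new* episode's obs
infos = {
    "_final_observation": np.array([False, True]),
    "final_observation": np.array([None, np.array([0.7])], dtype=object),
}

# Patch in the real terminal observation before storing the transition, so
# the replay buffer sees the state that actually ended the episode.
real_next_obs = next_obs.copy()
for idx, ended in enumerate(infos["_final_observation"]):
    if ended:
        real_next_obs[idx] = infos["final_observation"][idx]
```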
pseudo-rnd-thoughts commented 1 year ago

@kvrban Thanks for the comment but I think the plan is to complete this PR as several smaller PRs, see #370 and #371.

@arjun-kg or @vwxyzjn Should this PR be closed to avoid confusion?

vwxyzjn commented 1 year ago

@pseudo-rnd-thoughts absolutely. Closing this PR now.