vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev
Other
4.91k stars, 566 forks

Add gymnasium support to SAC #378

Closed by pseudo-rnd-thoughts 9 months ago

pseudo-rnd-thoughts commented 1 year ago

Description

Types of changes

Checklist:

If you need to run benchmark experiments for performance-impacting changes:

timoklein commented 1 year ago

Hey there, did you also run into this issue while testing sac_continuous_action.py?

qiuruiyu commented 9 months ago

Hello! Are you still working on migrating SAC from gym to gymnasium? Did you run into any problems?

pseudo-rnd-thoughts commented 9 months ago

Hey, I think the only thing I never completed was the training runs. Another project needed the compute time more, so I had to cancel them and sadly never got back to this. Otherwise, I believe the code is updated, but I might be wrong.

sdpkjc commented 9 months ago

Because the macOS MuJoCo test environment does not have Mesa installed, I have commented out that part of the test for now.

sdpkjc commented 9 months ago

After this PR is merged, all of our algorithms for the MuJoCo environments will have been migrated. Therefore, I've organized the test files and CI files in this PR.

vwxyzjn commented 9 months ago

> After this PR is merged, all of our algorithms for the MuJoCo environments will have been migrated. Therefore, I've organized the test files and CI files in this PR.

Sounds great! Thanks @sdpkjc

sdpkjc commented 9 months ago

[Figures: pr-378, pr-378-time]

sdpkjc commented 9 months ago

[Figures: pr-378-atari, pr-378-time]

Since sac_atari had no experiment with the rlops-pilot tag to compare against, I used the previous experiment with the pr-270 tag as the baseline.