Add gymnasium support to SAC

pseudo-rnd-thoughts commented 1 year ago

Description

Types of changes

[ ] Bug fix
[x] New feature
[ ] New algorithm
[ ] Documentation

Checklist:

[x] I've read the CONTRIBUTION guide (required).
[x] I have ensured pre-commit run --all-files passes (required).
[x] I have updated the tests accordingly (if applicable).
[ ] I have updated the documentation and previewed the changes via mkdocs serve.
- [ ] I have explained note-worthy implementation details.
- [ ] I have explained the logged metrics.
- [ ] I have added links to the original paper and related papers.

If you need to run benchmark experiments for performance-impacting changes:

[ ] I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
[ ] I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture-video.
[ ] I have performed RLops with python -m openrlbenchmark.rlops.
- For new feature or bug fix:
  - [ ] I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
- For new algorithm:
  - [ ] I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- [ ] I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
- [ ] I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.

vercel[bot] commented 1 year ago

The latest updates on your projects.

Name	Status	Updated (UTC)
cleanrl	✅ Ready	Oct 13, 2023 1:42pm

timoklein commented 1 year ago

Hey there, did you also run into this issue while testing sac_continuous_action.py?

qiuruiyu commented 9 months ago

Hello! Are you still working on migrating SAC from gym to Gymnasium? Did you run into any problems?

pseudo-rnd-thoughts commented 9 months ago

Hey, I think the only thing I never completed was the training runs. I had another project that needed the compute time more, so I had to cancel them and sadly never got back to this. Otherwise, I believe the code is updated, but I might be wrong.
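For context on what the gym-to-Gymnasium migration touches in SAC: Gymnasium's `step` returns `(obs, reward, terminated, truncated, info)` instead of older gym's `(obs, reward, done, info)`, and `reset` returns `(obs, info)`. The split matters for the soft Q target, because only a true termination should stop bootstrapping. The following is a minimal sketch and is not this PR's code. It assumes plain Python lists in place of tensors, and `soft_q_targets` is a hypothetical helper name.

```python
# Hypothetical sketch (not the PR's implementation): how Gymnasium's split of
# `done` into `terminated` / `truncated` enters SAC's soft Q-learning target.
from typing import List


def soft_q_targets(
    rewards: List[float],
    terminated: List[bool],
    next_min_q: List[float],
    next_log_pi: List[float],
    gamma: float = 0.99,
    alpha: float = 0.2,
) -> List[float]:
    """Compute y = r + gamma * (1 - terminated) * (min_j Q'_j(s', a') - alpha * log pi(a' | s')).

    Only `terminated` zeroes the bootstrap term. A time-limit truncation still
    bootstraps from the next state, which is why `truncated` is not an input.
    """
    targets = []
    for r, term, q, logp in zip(rewards, terminated, next_min_q, next_log_pi):
        soft_value = q - alpha * logp  # entropy-regularized value of s'
        mask = 0.0 if term else 1.0  # stop bootstrapping only on true termination
        targets.append(r + gamma * mask * soft_value)
    return targets
```

For example, `soft_q_targets([1.0, 1.0], [False, True], [10.0, 10.0], [-1.0, -1.0], gamma=0.5, alpha=0.5)` gives `[6.25, 1.0]`: the first transition bootstraps, and the terminated one keeps only its reward. A truncated transition would be passed with `terminated=False` and therefore still bootstraps.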

sdpkjc commented 9 months ago

Because the macOS MuJoCo test environment does not have Mesa installed, I have commented out that part of the test for now.

sdpkjc commented 9 months ago

After this PR is merged, all of our algorithms for the MuJoCo environments will have been migrated. Therefore, I've organized the test files and CI files in this PR.

vwxyzjn commented 9 months ago

Sounds great! Thanks @sdpkjc

sdpkjc commented 9 months ago

[Benchmark plots: pr-378, pr-378-time]

sdpkjc commented 9 months ago

[Benchmark plots: pr-378-atari, pr-378-time]

Since sac_atari had no experiments tagged rlops-pilot to compare against, I used the earlier experiments tagged pr-270 as the baseline.

vwxyzjn / cleanrl