Update ppo_pettingzoo_ma_atari.py

elliottower commented 1 year ago

This PR updates the PettingZoo multi-agent Atari example to use gymnasium rather than gym and to follow the current PettingZoo API (separate termination and truncation signals, as in gymnasium and gym v0.26). Several people have asked me for more in-depth CleanRL resources for PettingZoo, so updating this example seems like a good start.
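
For readers unfamiliar with the API change, here is a minimal sketch of how the two per-environment signals can be folded back into the single "done" flag that older training loops expect. This is an illustration only, not the code in this PR; the helper name `merge_done_flags` is hypothetical.

```python
from typing import List, Sequence


def merge_done_flags(
    terminations: Sequence[bool], truncations: Sequence[bool]
) -> List[bool]:
    """Combine per-environment termination and truncation flags.

    An episode counts as finished if it either terminated (reached a
    terminal state) or was truncated (for example, hit a time limit).
    """
    if len(terminations) != len(truncations):
        raise ValueError("terminations and truncations must have equal length")
    return [bool(term) or bool(trunc) for term, trunc in zip(terminations, truncations)]


# Example: three parallel environments (hypothetical values).
terminations = [True, False, False]
truncations = [False, False, True]
next_done = merge_done_flags(terminations, truncations)
```

Treating both signals as "done" is the simplest choice; algorithms that bootstrap values differently on truncation would need to keep the two flags separate.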

Unfortunately, it seems the RecordEpisodeStatistics wrapper won't work here, because it expects info to be a dict, whereas SuperSuit's concat_vec_env returns info as a list of dicts. A custom wrapper could convert between the two, but it's not clear to me that's the best approach. We have been looking into mirroring the gymnasium vector API in PettingZoo, which as far as I can tell would allow the recorder class and most other gymnasium functionality to work, but that will take some time.
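
To make the mismatch concrete, here is a rough sketch of the kind of conversion such a custom wrapper would have to do: collating a list of per-environment info dicts into a single dict, with a `_key` mask marking which environments supplied each key (loosely modeled on the gymnasium vector info layout). The function name `collate_infos` is hypothetical, and I have not verified that the recorder wrapper would accept this output.

```python
from typing import Any, Dict, List


def collate_infos(infos: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Turn a list of per-env info dicts into one dict of per-env lists.

    For every key seen in any env, the result holds:
      key      -> list of values (None where that env had no such key)
      "_" + key -> list of bools marking which envs provided the key
    """
    # Collect keys in first-seen order so the output is deterministic.
    keys: List[str] = []
    for info in infos:
        for key in info:
            if key not in keys:
                keys.append(key)

    collated: Dict[str, List[Any]] = {}
    for key in keys:
        collated[key] = [info.get(key) for info in infos]
        collated["_" + key] = [key in info for info in infos]
    return collated


# Example with three environments (hypothetical info contents).
infos = [{"episode": {"r": 1.0}}, {}, {"lives": 3}]
collated = collate_infos(infos)
```

A real wrapper would likely use NumPy arrays rather than lists and would need to handle the per-agent ordering that the concatenated env produces.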

Description

Types of changes

[ ] Bug fix
[ ] New feature
[ ] New algorithm
[ ] Documentation

Checklist:

[ ] I've read the CONTRIBUTION guide (required).
[x] I have ensured pre-commit run --all-files passes (required).
[ ] I have updated the tests accordingly (if applicable).
[ ] I have updated the documentation and previewed the changes via mkdocs serve.
- [ ] I have explained note-worthy implementation details.
- [ ] I have explained the logged metrics.
- [ ] I have added links to the original paper and related papers.

If you need to run benchmark experiments for performance-impacting changes:

[ ] I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
[ ] I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture-video.
[ ] I have performed RLops with python -m openrlbenchmark.rlops.
- For new feature or bug fix:
  - [ ] I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
- For new algorithm:
  - [ ] I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- [ ] I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
- [ ] I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.

vercel[bot] commented 1 year ago

Deployment status for this PR:

Name	Status	Updated (UTC)
cleanrl	✅ Ready	Jan 18, 2024 7:27pm

elliottower commented 12 months ago

I'm not sure how big a problem it is that episode statistics can't be recorded. It may be better to keep things as they are so the example doesn't lose functionality, and I can put this updated version in the PettingZoo docs instead.

There may be a way to get the episode statistics to work, but it seems difficult, as I explained above.

elliottower commented 12 months ago

@vwxyzjn I'm imagining this would also require running the benchmarks? Let me know if you have any thoughts on this. I've also been considering writing an action masking example for other PZ envs; would that be something you're interested in having here?

ezhang7423 commented 5 months ago

Are there any updates on this?

elliottower commented 5 months ago

> Are there any updates on this?

Looks like https://github.com/vwxyzjn/cleanrl/pull/424 was merged, but it didn't update the PettingZoo example beyond a minor change to the CLI arguments. Re-reading Costa's messages, I think I misread him as intending to update this himself (or maybe he didn't have time).

Anyways, I have some time today and will try to resolve these conflicts and integrate @KaleabTessera's suggestions, so we can at least have an updated version of this script. I won't have time for benchmarking in the near future but could eventually get to it.

vwxyzjn commented 5 months ago

Yeah sorry @elliottower things have gotten busy. Feel free to submit a PR. As long as you can reproduce the existing benchmark experiments we can merge :)

elliottower commented 5 months ago

No worries, sounds good. I'll just use this same PR for simplicity's sake.

elliottower commented 5 months ago

Btw, just FYI, there's a bunch of already-merged branches in this repo that could probably be deleted (I'm pulling the most recent master branch and see a huge list).
