Support PettingZoo Multi-agent Atari envs with PPO

vwxyzjn commented 2 years ago

Description

Follow up to #144.

Types of changes

[x] New feature

Checklist:

[x] I've read the CONTRIBUTION guide (required).
[x] I have ensured pre-commit run --all-files passes (required).
[x] I have updated the documentation and previewed the changes via mkdocs serve.
[x] I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.

[x] I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
[x] I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
[x] I have added additional documentation and previewed the changes via mkdocs serve.
- [x] I have explained note-worthy implementation details.
- [x] I have explained the logged metrics.
- [x] I have added links to the original paper and related papers (if applicable).
- [x] I have added links to the PR related to the algorithm.
- [ ] I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- [ ] I have added the learning curves (in PNG format with width=500 and height=300).
- [ ] I have added links to the tracked experiments.
[ ] I have updated the tests accordingly (if applicable).

vercel[bot] commented 2 years ago

Name	Status	Preview	Updated
cleanrl	✅ Ready (Inspect)	Visit Preview	Jun 1, 2022 at 10:10PM (UTC)

vwxyzjn commented 2 years ago

@benblack769 @araffin @Miffyli @jkterry1 @kcorder would you mind helping review this PR? In particular, could you help review the following:

This file diff (https://www.diffchecker.com/WQ3yzb1Y) highlights the changed lines of code.
The documentation (https://cleanrl-git-pettingzoo-docs-vwxyzjn.vercel.app/rl-algorithms/ppo/#ppo_pettingzoo_ma_ataripy) specifies implementation details and results.

Thanks!

kcorder commented 2 years ago

This all looks good to me!

Just some things I think we should try out:

- We have a NoopReset wrapper for PZ envs.
- Jordan/Ben previously found that using the InvertColor agent indicator was better than the normal agent indicator.

vwxyzjn commented 2 years ago

Thank you @kcorder, I'd be happy to try out the no-op reset wrapper. Is the InvertColor agent indicator in SuperSuit? Also see https://wandb.ai/costa-huang/cleanRL/reports/MA-ALE--VmlldzoxNzAzMDQx#invert-color-indicator, which shows the performance of the InvertColor indicator: at least in Pong, it does not perform as well as the naive indicator.
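
For readers unfamiliar with the two indicator styles being compared, here is a rough NumPy sketch. It is not taken from SuperSuit or MA-ALE: the function names are hypothetical, and the invert-color behavior (flipping pixel intensities for every agent except the first) is an assumption about what such an indicator does.

```python
import numpy as np


def append_agent_indicator(obs: np.ndarray, agent_idx: int, num_agents: int) -> np.ndarray:
    """Naive indicator (hypothetical helper): append one constant channel per
    agent and set the channel of the acting agent to full intensity."""
    height, width, _ = obs.shape
    indicator = np.zeros((height, width, num_agents), dtype=obs.dtype)
    indicator[:, :, agent_idx] = 255
    return np.concatenate([obs, indicator], axis=-1)


def invert_color_indicator(obs: np.ndarray, agent_idx: int) -> np.ndarray:
    """Invert-color indicator (assumed behavior): leave the first agent's
    frame untouched and flip pixel intensities for every other agent."""
    if agent_idx == 0:
        return obs
    return 255 - obs


# A dummy 84x84 stack of 4 grayscale frames, all pixels set to 10.
frame = np.full((84, 84, 4), 10, dtype=np.uint8)
naive = append_agent_indicator(frame, agent_idx=1, num_agents=2)
inverted = invert_color_indicator(frame, agent_idx=1)
```

The naive variant changes the observation shape (extra channels), while the invert-color variant keeps the shape and encodes the agent identity in the pixel values themselves.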

kcorder commented 2 years ago

Oh interesting, good to know about the agent indicator - I hadn't tried it myself.

The NoopReset is here: https://github.com/jkterry1/MA-ALE2/blob/74f562d088c795e7fa4fdeba494f2573ac9c6c7e/env_utils.py#L324-L345
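
As a rough idea of what a no-op reset does (the linked implementation may differ in its details), here is a minimal sketch against a toy two-agent environment. `ToyParallelEnv` and `NoopResetSketch` are hypothetical names, the toy `reset`/`step` signatures are not the PettingZoo API, and `noop_max=30` is assumed from the common single-agent Atari convention.

```python
import random


class ToyParallelEnv:
    """Hypothetical stand-in for a two-agent env (NOT the PettingZoo API)."""

    agents = ["first_0", "second_0"]

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return {agent: self.t for agent in self.agents}

    def step(self, actions):
        # The toy observation is just the step counter; episodes end at t=100.
        self.t += 1
        obs = {agent: self.t for agent in self.agents}
        return obs, self.t >= 100


class NoopResetSketch:
    """On reset, have every agent take a random number of no-op actions so
    that episodes do not all start from the same initial state."""

    NOOP_ACTION = 0  # assumption: action 0 is the no-op, as in ALE

    def __init__(self, env, noop_max=30, seed=None):
        self.env = env
        self.noop_max = noop_max
        self.rng = random.Random(seed)

    def reset(self):
        obs = self.env.reset()
        num_noops = self.rng.randint(1, self.noop_max)
        for _ in range(num_noops):
            noops = {agent: self.NOOP_ACTION for agent in self.env.agents}
            obs, done = self.env.step(noops)
            if done:
                # Restart if the episode happened to end during the no-ops.
                obs = self.env.reset()
        return obs

    def step(self, actions):
        return self.env.step(actions)


wrapped = NoopResetSketch(ToyParallelEnv(), noop_max=30, seed=0)
first_obs = wrapped.reset()
```

After `reset`, the toy observation equals the number of no-op steps taken, so it lands somewhere between 1 and `noop_max`.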

We've been using this InvertColorAgentIndicator - there was a bug fix there since the original code actually

vwxyzjn commented 2 years ago

@kcorder thanks for the helpful pointers. While it would be interesting to try this preprocessing, I would like to defer this as future work since we are aiming for a 1.0.0 release soon.
