vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Support PettingZoo Multi-agent Atari envs with PPO #188

Closed: vwxyzjn closed this 2 years ago

vwxyzjn commented 2 years ago

Description

Follow-up to #144.
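A minimal sketch of the parameter-sharing idea that this kind of setup relies on, assuming each agent of a PettingZoo-style parallel env is treated as one slot of a vectorized env so that a single PPO policy acts for every player. `stack_agent_obs` and `unstack_actions` are hypothetical helpers, not CleanRL or SuperSuit functions, and the agent names `first_0`/`second_0` are assumed from PettingZoo's Atari naming convention.

```python
import numpy as np

# Hypothetical helpers: treat each agent of a parallel env as one batch slot,
# so one shared policy sees a (num_agents, ...) batch of observations.
# A fixed agent ordering is assumed so actions map back to the right agent.

def stack_agent_obs(obs_dict, agents):
    """Stack per-agent observations into a (num_agents, ...) batch in a fixed order."""
    return np.stack([obs_dict[a] for a in agents], axis=0)

def unstack_actions(actions, agents):
    """Map a (num_agents,) batch of policy actions back to an {agent: action} dict."""
    return {a: int(actions[i]) for i, a in enumerate(agents)}

if __name__ == "__main__":
    agents = ["first_0", "second_0"]  # assumed two-player Atari agent names
    obs = {
        "first_0": np.zeros((84, 84, 4), dtype=np.uint8),
        "second_0": np.ones((84, 84, 4), dtype=np.uint8),
    }
    batch = stack_agent_obs(obs, agents)
    print(batch.shape)  # (2, 84, 84, 4)
    print(unstack_actions(np.array([3, 1]), agents))  # {'first_0': 3, 'second_0': 1}
```

Because both players share one set of weights, the policy needs some way to know which player it is controlling, which is what the agent indicators discussed later in this thread are for.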

Types of changes

Checklist:

If you are adding new algorithms or your change could result in a performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.

vercel[bot] commented 2 years ago

The latest updates on your projects:

Name: cleanrl | Status: ✅ Ready | Updated: Jun 1, 2022 at 10:10 PM (UTC)

vwxyzjn commented 2 years ago

@benblack769 @araffin @Miffyli @jkterry1 @kcorder would you mind helping review this PR? In particular, could you help review the following:

Thanks!

kcorder commented 2 years ago

This all looks good to me!

Just some things I think we should try out:

vwxyzjn commented 2 years ago

Thank you @kcorder, I’d be happy to try out the no-op reset wrapper. Is the InvertColor agent indicator in SuperSuit? Also, see https://wandb.ai/costa-huang/cleanRL/reports/MA-ALE--VmlldzoxNzAzMDQx#invert-color-indicator, which shows the performance of the InvertColor indicator: at least in Pong, it does not perform as well as the naive indicator.
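For context, a "naive" agent indicator can be pictured as extra constant channels appended to each observation, one per agent, with only the current agent's channel switched on. The helper below is a hypothetical sketch of that idea under this assumption; it is not SuperSuit's `agent_indicator_v0` implementation, and `on_value` is an illustrative parameter.

```python
import numpy as np

def add_onehot_indicator(obs, agent_idx, num_agents, on_value=1):
    """Append `num_agents` constant channels to an (H, W, C) observation.

    Channel `agent_idx` of the appended block is filled with `on_value` and the
    rest with 0, so a shared policy can tell which agent it is acting for.
    Hypothetical helper for illustration; not SuperSuit's implementation.
    """
    h, w, _ = obs.shape
    indicator = np.zeros((h, w, num_agents), dtype=obs.dtype)
    indicator[..., agent_idx] = on_value
    return np.concatenate([obs, indicator], axis=-1)

frame = np.zeros((84, 84, 4), dtype=np.uint8)  # e.g. 4 stacked grayscale frames
print(add_onehot_indicator(frame, agent_idx=1, num_agents=2).shape)  # (84, 84, 6)
```

If observations are later divided by 255 before reaching the network, an `on_value` of 255 would keep the indicator channels on the same scale as the pixels; that choice is an assumption here, not something tested in this thread.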

kcorder commented 2 years ago

Oh, interesting; good to know about the agent indicator. I hadn't tried it myself.

The NoopReset is here: https://github.com/jkterry1/MA-ALE2/blob/74f562d088c795e7fa4fdeba494f2573ac9c6c7e/env_utils.py#L324-L345
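The general no-op reset idea (familiar from single-agent Atari preprocessing) is to take a random number of no-op steps after each reset so episodes start from varied states. Below is a minimal sketch adapted to a parallel, dict-based env; it is not the linked MA-ALE2 code. It assumes a simplified API where `reset()` returns `{agent: obs}` and `step(actions)` returns `(obs, rewards, dones, infos)` as dicts, and that action 0 is NOOP as in ALE. `NoopResetParallel` is a hypothetical name.

```python
import random

class NoopResetParallel:
    """Hypothetical wrapper: after reset, step every agent with the no-op action
    a random number of times (1..noop_max) so episodes start from varied states.

    Assumes a simplified parallel API: reset() -> {agent: obs} and
    step(actions) -> (obs, rewards, dones, infos), all dicts keyed by agent.
    """

    def __init__(self, env, noop_max=30, noop_action=0, seed=None):
        self.env = env
        self.noop_max = noop_max
        self.noop_action = noop_action  # action 0 is assumed to be NOOP
        self.rng = random.Random(seed)

    def reset(self):
        obs = self.env.reset()
        for _ in range(self.rng.randint(1, self.noop_max)):
            # Every agent takes the no-op action simultaneously.
            actions = {agent: self.noop_action for agent in obs}
            obs, _, dones, _ = self.env.step(actions)
            if all(dones.values()):
                # Episode ended during the no-ops: start over.
                obs = self.env.reset()
        return obs

    def step(self, actions):
        return self.env.step(actions)
```

Newer PettingZoo releases split `dones` into separate terminations and truncations, so a real wrapper would need to match whichever API version is installed.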

We've been using this InvertColorAgentIndicator; there has actually been a bug fix there since the original code.
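As an illustration only, an invert-color indicator can be read as leaving one player's frames unchanged and inverting pixel values for the other, so a shared policy can infer which side it controls without extra channels. This sketch is based on that reading of the name, not on the linked InvertColorAgentIndicator code; `invert_color_indicator` is a hypothetical helper and assumes two players and uint8 frames.

```python
import numpy as np

def invert_color_indicator(obs, agent_idx):
    """Hypothetical sketch: keep agent 0's uint8 frame unchanged and invert the
    pixel values (255 - x) for the other agent, so a shared policy can tell the
    two players apart without appending indicator channels."""
    if agent_idx == 0:
        return obs
    return 255 - obs  # stays uint8 because 255 fits in uint8

frame = np.array([[0, 10, 255]], dtype=np.uint8)
print(invert_color_indicator(frame, 1))  # [[255 245   0]]
```

Compared with appending indicator channels (the "naive" indicator mentioned above), this keeps the observation shape unchanged, but as written it can only distinguish two players.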

vwxyzjn commented 2 years ago

@kcorder thanks for the helpful pointers. While it would be interesting to try this preprocessing, I would like to defer it to future work since we are aiming for a 1.0.0 release soon.