sjtu-marl / malib

A parallel framework for population-based multi-agent reinforcement learning.
https://malib.io
MIT License

Fix PSRO examples for Leduc Hold'em #38

Closed XuehaiPan closed 2 years ago

XuehaiPan commented 2 years ago

Hi, thanks for creating such an amazing framework for MARL.

I'm trying to run the example code shown in README.md and on https://malib.io, and I found several bugs that will trip up beginners with this repo. For example:

https://github.com/sjtu-marl/malib/blob/53d64ba47d47b57cc8706588a788c5ccadf93619/malib/algorithm/mappo/policy.py#L20

In VectorEnv.reset:

https://github.com/sjtu-marl/malib/blob/5be07ac00761a34fb095adb2b3018a798ceea256/malib/envs/vector_env.py#L191-L193

env.reset() is called with arguments (max_step and custom_reset_config). However, PokerEnv.reset and the wrappers provided by PettingZoo do not accept any arguments.

https://github.com/sjtu-marl/malib/blob/5be07ac00761a34fb095adb2b3018a798ceea256/malib/envs/poker/poker_aec_env.py#L131-L135

https://github.com/sjtu-marl/malib/blob/5be07ac00761a34fb095adb2b3018a798ceea256/malib/envs/poker/poker_aec_env.py#L168-L178

https://github.com/Farama-Foundation/PettingZoo/blob/b839259e961798cfc23b6f82c6ba0898b55cda60/pettingzoo/utils/wrappers/base.py#L77-L85
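One way to work around this kind of signature mismatch is to only forward the keyword arguments that the wrapped env's `reset` actually accepts. This is a hypothetical compatibility shim sketched for illustration, not malib's actual fix; the helper name `safe_reset` is made up here:

```python
import inspect


def safe_reset(env, **kwargs):
    """Call env.reset(), forwarding only the keyword arguments
    that its signature accepts (hypothetical helper).

    Envs whose reset() takes no arguments (e.g. PettingZoo
    wrappers) are called bare; envs that accept **kwargs get
    everything.
    """
    params = inspect.signature(env.reset).parameters
    # If reset() declares **kwargs, it can take anything we pass.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return env.reset(**kwargs)
    # Otherwise, drop any keyword the signature does not name.
    accepted = {k: v for k, v in kwargs.items() if k in params}
    return env.reset(**accepted)
```

With a shim like this, `VectorEnv.reset` could pass `max_step` and `custom_reset_config` uniformly and still work with envs whose `reset` takes no arguments.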

KornbergFresnel commented 2 years ago

@XuehaiPan Thanks for your contribution. Since we've reconstructed the PSRO training pipeline, this PR is out of date. You can check the new changes in the master and test-cases branches for functionality validation.