jidiai / GRF_MARL

Google Research Football MARL Benchmark and Research Toolkit
https://grf-marl.readthedocs.io/

PSRO: can BR policy be inherited from previous one? #2

Closed zkengz closed 4 months ago

zkengz commented 4 months ago

Many thanks to your hard work! And I have a question.

You mention in competitive.rst that:

In practice, since each BR policy is hard to learn from scratch (random initialization), we inherit from the previous BR policy to largely speed up the process.

In contrast, in Xu Z, Liang Y, Yu C, et al., Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games, the authors claim that:

PSRO requires finding a joint best response in each iteration. In order to promote exploration and avoid being trapped in a local sub-optimum, the BR policy needs to be trained from scratch in every iteration.

I wonder whether you have experimented with both approaches and come to this conclusion. I appreciate your assistance and look forward to your reply.

YanSong97 commented 4 months ago

Hi zkengz,

We agree that inheriting previous policies can limit exploration. However, on large-scale tasks such as multi-agent GRF, learning from scratch is extremely hard and inefficient, as the team needs to learn basic ball-passing skills before it can come up with an effective team strategy. So in our case, we resort to the inherited version.

Indeed, we find that PSRO with inheritance does lack exploration, and we also find that the initial policy's style of play plays a significant role in each PSRO trial. So we run multiple PSRO trials with different initial policies and quickly build up a pool of strategies that can be used in later phases (e.g. League Training).
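
To make the distinction concrete, here is a toy sketch of a generic PSRO loop showing where the two schemes differ. This is not our actual implementation; `Policy`, `meta_solver`, and `train_best_response` are placeholders. The only difference between the two schemes is how the new BR policy is initialized:

```python
import copy
import random


class Policy:
    """Toy policy: a scalar 'skill' standing in for network weights."""
    def __init__(self, skill=0.0):
        self.skill = skill

    @classmethod
    def random_init(cls):
        return cls(skill=random.random())

    def clone(self):
        return copy.deepcopy(self)


def meta_solver(population):
    """Placeholder meta-solver: uniform mixture over the population."""
    return [1.0 / len(population)] * len(population)


def train_best_response(br, population, meta_strategy):
    """Placeholder BR training against the meta-strategy mixture."""
    br.skill += 1.0
    return br


def psro(num_iterations, inherit_br=True):
    population = [Policy.random_init()]
    prev_br = None
    for _ in range(num_iterations):
        meta_strategy = meta_solver(population)      # 1. solve the restricted meta-game
        if inherit_br and prev_br is not None:
            br = prev_br.clone()                     # warm start: inherit the previous BR
        else:
            br = Policy.random_init()                # cold start: train from scratch
        br = train_best_response(br, population, meta_strategy)  # 2. approximate BR training
        population.append(br)                        # 3. expand the population
        prev_br = br
    return population
```

The warm-start branch is what we use on GRF; the cold-start branch corresponds to the Fictitious Cross-Play argument you quoted.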

Feel free to have a look at our paper; in Section 6 we detail our PBT attempts.

zkengz commented 4 months ago

Thanks for your detailed response! It really clears up my confusion.