Closed by zkengz 4 months ago
Hi zkengz,
We agree that inheriting previous policies can limit the exploration. However, on large-scale tasks such as multi-agent GRF, learning from scratch is extremely hard and inefficient, as the team needs to learn basic ball-passing skills before it can come up with an effective team strategy. So in our case, we resort to the inherited version.
Indeed, we find that PSRO with inheritance does lack exploration, and we also find that the initial policy's style-of-play plays a significant role in each PSRO trial. So we run multiple PSRO trials with various initial policies and quickly build up a pool of strategies that can be used in later phases (say, League Training).
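For clarity, here is a minimal sketch of that scheme: each trial warm-starts every new best response from the latest policy (the "inherit" version), and several trials with different initial styles-of-play are merged into one pool. All names here (`train_best_response`, the style labels, the dict-based policy representation) are illustrative placeholders, not from our actual codebase.

```python
import copy

def train_best_response(init_policy, opponent_pool):
    # Stub: real code would run RL against a mixture over opponent_pool.
    new_policy = copy.deepcopy(init_policy)
    new_policy["generation"] += 1
    return new_policy

def psro_trial(initial_policy, iterations):
    pool = [initial_policy]
    current = initial_policy
    for _ in range(iterations):
        # Inherit: warm-start the next best response from the latest
        # policy instead of training from scratch, so basic skills
        # (e.g. ball passing) are retained across iterations.
        current = train_best_response(current, pool)
        pool.append(current)
    return pool

# Multiple trials seeded with different initial styles-of-play,
# merged into one shared pool for later phases (e.g. League Training).
initial_styles = [{"style": s, "generation": 0}
                  for s in ("possession", "counter", "long-ball")]
league_pool = []
for init in initial_styles:
    league_pool.extend(psro_trial(init, iterations=3))

print(len(league_pool))  # 3 trials x (1 initial + 3 iterations) = 12
```

The point of the multi-trial loop is that each trial's exploration is biased by its seed policy, so varying the seeds is what restores diversity in the final pool.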
Feel free to have a look at our paper; Section 6 details our PBT attempts.
Thanks for your detailed response! It really clears up my confusion.
Many thanks for your hard work! I have a question.
You mention in competitive.rst that:
In contrast, in Xu Z, Liang Y, Yu C, et al., Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games, the authors claim that:
I wonder whether you have experimented with both ways and come to that conclusion. I appreciate your assistance and look forward to your reply.