Hi, thanks for providing such a powerful open-source framework. This is an awesome repo for MARL training, and it's exactly what we needed to implement distributed training algorithms without building a framework from scratch. While I appreciate how easy it is to train agents with the PSRO algorithm using this repo, I would like to see more training algorithms in the future. I was reading your paper and found the following sentence:
"In the initial implementation, we provided three PB-MARL algorithms support, they are Policy Space Response Oracle [27] (PSRO), Fictitious Self-play [8] (FSP), Self-play [9] (SP) and Population-based Training [14] (PBT)" (Doesn't that make four rather than three algorithms?)
If my understanding is right, you guys have already integrated training algorithms such as Fictitious Self-play into this repo, right? However, I could not find any documentation on how to use the FSP or PBT algorithm to train my agents. Would you guys mind providing more examples of using other training algorithms in the future? Thanks very much.
Hi, thank you so much for your suggestions. I'm very glad to answer your questions.
PSRO, FSP, and SP differ mainly in how the meta-game is solved. Generally speaking, the current framework can support different algorithms as long as they share roughly the same execution steps:
run interactions between every policy in the population, which amounts to maintaining a meta-game;
solve the meta-game based on the results of the interactions from the first step;
extend the population with more powerful/diverse policies, e.g. by running a best-response (BR) algorithm against the current solution of the meta-game.
We've already implemented this loop with good scalability, and you can customize each part of it to create new algorithms; a rough sketch of the loop is shown below.
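For illustration only, here is a minimal Python sketch of that loop, assuming hypothetical `simulate_payoffs` and `train_best_response` helpers (these names are placeholders, not our actual API). The point is that swapping the meta-solver is what distinguishes PSRO, FSP, and SP:

```python
# Minimal sketch of the population-based training loop described above.
# simulate_payoffs and train_best_response are caller-supplied placeholders,
# not functions from this repo.
import numpy as np

def uniform_solver(payoff_matrix):
    """FSP-style meta-solver: weight every policy in the population equally."""
    n = payoff_matrix.shape[0]
    return np.ones(n) / n

def latest_only_solver(payoff_matrix):
    """SP-style meta-solver: put all weight on the most recent policy."""
    dist = np.zeros(payoff_matrix.shape[0])
    dist[-1] = 1.0
    return dist

def population_loop(initial_policy, meta_solver, n_generations,
                    simulate_payoffs, train_best_response):
    population = [initial_policy]
    for _ in range(n_generations):
        # 1. interactions: evaluate policies against each other to fill the meta-game
        payoffs = simulate_payoffs(population)          # shape: (k, k)
        # 2. solve the meta-game; swapping this solver switches PSRO / FSP / SP
        meta_distribution = meta_solver(payoffs)
        # 3. extend the population with a best response to the current solution
        br_policy = train_best_response(population, meta_distribution)
        population.append(br_policy)
    return population
```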
More examples and detailed documentation are on our current roadmap. Algorithms like FSP and SP are already implemented and tested in our dev branch, and we'll provide them in upcoming updates.