zxqfl / mcts

Generic, parallel Monte Carlo tree search library
MIT License

Multi-Agent (Players) #3

Closed · xtagon closed this issue 5 years ago

xtagon commented 5 years ago

Hi,

Thank you so much for publishing this implementation of MCTS. Using it has helped me greatly in learning how MCTS works and experimenting 👍

Do you have any examples or suggestions for how to handle a multi-player environment? The only example in the repo is a simple counting game, which is single-player. I see that there are player methods, but the example does not show how to use them.

Also, are there any considerations for whether the players' moves are simultaneous or serial within a turn? In the game I'm working on there may be 2 to 8 players: one is the player whose score I'm trying to maximize, and the others are opponents I'm trying to minimize. All players simultaneously make one of their available moves once per turn. In other words, my player does not know which move each other player will make until the next child node has been simulated (assuming I choose randomly what the other players are likely to do).

Thanks in advance!

zxqfl commented 5 years ago

Hi,

You should implement current_player in GameState and interpret_evaluation_for_player in Evaluator. If you have specific questions about how to do this, feel free to ask.

The code can't easily accommodate imperfect information. I'm not too familiar with the literature so I'm not sure whether MCTS is typically used for imperfect information games.

xtagon commented 5 years ago

Thank you! So if I understand correctly, I would need current_player to return whose turn it is in a given state, and interpret_evaluation_for_player would (for example) negate the evaluation score when it's interpreting for an opponent?

zxqfl commented 5 years ago

Yep. It's written in this generic way to support >= 3 players.
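For two players it's essentially a sign flip. Here's a standalone sketch of the logic (not the real trait signatures, which belong in your GameState and Evaluator impls; the Player type and the i64 scale are placeholders):

```rust
// Evaluations here are stored from Me's point of view, so interpreting
// them for an opponent flips the sign.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Player {
    Me,
    Opponent, // with >= 3 players, something like Opponent(u8)
}

fn interpret_evaluation_for_player(evaln: i64, player: Player) -> i64 {
    match player {
        Player::Me => evaln,
        Player::Opponent => -evaln, // what's good for Me is bad for them
    }
}

fn main() {
    assert_eq!(interpret_evaluation_for_player(7, Player::Me), 7);
    assert_eq!(interpret_evaluation_for_player(7, Player::Opponent), -7);
}
```

With three or more players, you'd instead return the evaluation from the perspective of whichever player is passed in.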

xtagon commented 5 years ago

For a game like Chess or Go, each player moves in turn, so this feature makes sense in that case. In the use case I'm working on, all players report their chosen move each tick (they take their turns simultaneously). The way I'm handling that imperfect information right now is to just randomly pick a move for each opponent during make_move, after applying my own move. This has some drawbacks, but in general it kind of works, because MCTS evens out once it has enough samples.
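Roughly, my make_move currently looks like this (everything here is a simplified stand-in for my real types; only the shape matters):

```rust
use rand::seq::SliceRandom;

#[derive(Clone)]
struct Board {
    tick: u32,
    // ... per-player positions, health, etc. in my real state
}

type Move = u8; // stand-in for my real move type

impl Board {
    fn opponents(&self) -> Vec<usize> { vec![1, 2] }                      // stub
    fn legal_moves(&self, _player: usize) -> Vec<Move> { vec![0, 1, 2] }  // stub
    fn apply(&mut self, _player: usize, _mov: Move) { /* queue the move */ }
    fn resolve_tick(&mut self) { self.tick += 1; /* advance the world */ }
}

// The body of GameState::make_move for my state, roughly:
fn make_move(state: &mut Board, my_move: Move) {
    state.apply(0, my_move); // player 0 is me
    // Determinize the hidden information: sample one legal move for
    // every opponent, then resolve the whole tick at once.
    let mut rng = rand::thread_rng();
    for opp in state.opponents() {
        if let Some(&m) = state.legal_moves(opp).choose(&mut rng) {
            state.apply(opp, m);
        }
    }
    state.resolve_tick();
}

fn main() {
    let mut board = Board { tick: 0 };
    make_move(&mut board, 1);
    assert_eq!(board.tick, 1);
}
```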

The problem is that random moves are rarely what the opponents will actually do, so the evaluation misses a lot of the more likely outcomes. What I'd like to do is incorporate the available opponent moves into the search tree instead of just picking randomly for them, so that interpret_evaluation_for_player can minimize the score for moves that are good for an opponent and maximize it for moves that are good for me. But I'm not sure the current_player workflow fits this, because the moves happen simultaneously: the state does not have a single "current player", and the move made in one turn of a playout is always my move plus one or more unknown opponent moves.

If I use current_player anyway, and have my state advance the turns as if they happened one by one instead of simultaneously, would that work, and what should I do in make_move? Will make_move run once per current player? And if it does, what callback should I use to finish updating the state after all players have made their moves?
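To make that concrete, here is a sketch of what I mean by serializing the turns (again, everything is a simplified stand-in for my real types):

```rust
type Move = u8; // stand-in for my real move type

#[derive(Clone)]
struct Board {
    num_players: usize,
    to_move: usize,     // the "virtual" current player within a tick
    pending: Vec<Move>, // moves committed so far this tick
    tick: u32,
}

impl Board {
    // What GameState::current_player would return.
    fn current_player(&self) -> usize {
        self.to_move
    }

    // What GameState::make_move would do: buffer the move, and only
    // resolve the world once every player has committed one.
    fn make_move(&mut self, mov: Move) {
        self.pending.push(mov);
        self.to_move += 1;
        if self.to_move == self.num_players {
            self.resolve_tick(); // apply all pending moves at once
            self.pending.clear();
            self.to_move = 0;
            self.tick += 1;
        }
    }

    fn resolve_tick(&mut self) { /* apply self.pending simultaneously */ }
}

fn main() {
    let mut b = Board { num_players: 3, to_move: 0, pending: vec![], tick: 0 };
    for mv in [0, 1, 2] {
        b.make_move(mv);
    }
    assert_eq!((b.current_player(), b.tick), (0, 1));
}
```

One drawback I can already see: in the tree, players later in the ordering effectively get to "see" the moves committed earlier in the same tick, which isn't true in the real game.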

Thanks again for your help :)

zxqfl commented 5 years ago

The sole purpose of current_player is to be passed to interpret_evaluation_for_player.

It sounds like you want to model it as a single-player game. This could work, but it's not very principled. A more principled way to handle simultaneous moves is to use an algorithm that converges to a Nash equilibrium.

In general, this library assumes that at each node of the tree there is a best move, which isn't true when there's imperfect information.