suragnair / alpha-zero-general

A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
MIT License

Four Player Game #133

Closed · JoelNiklaus closed this issue 4 years ago

JoelNiklaus commented 5 years ago

Hi,

I am working on an AI for a four-player card game and would like to apply this repo to it. Since you only support two-player games so far, I would like to extend it to the four-player case. Do you think this is possible within a reasonable amount of time? If so, could you please point me to the main places where the changes would have to happen? The getSymmetries() method in the Game class would probably have to go.

Cheers, Joel

rubencart commented 5 years ago

Second this, generalizing this to N-player games would be nice!

51616 commented 5 years ago

This repo is a fairly generic board-game reinforcement learning environment, so I think the main part to change is the game environment, and probably the tree search as well, since, as I understand it, most card games have no need for it. From my experience playing with the code, I think you could get it working within a month or so.
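To make that concrete, here is a minimal sketch of how the two-player assumptions in the Game interface might generalize. getNextState and getGameEnded are real method names in this repo, but the N-player signatures below are hypothetical:

```python
# Hypothetical N-player variant of this repo's Game interface (a sketch,
# not the actual API). The two-player MCTS backs values up with v = -v,
# which only works for alternating two-player zero-sum games; with N
# players you cycle the player index and return one reward per player.

class NPlayerGame:
    def __init__(self, num_players):
        self.num_players = num_players

    def getNextState(self, state, player, action):
        """Apply `action` for `player`; return (next_state, next_player)."""
        next_state = self._apply(state, player, action)   # game-specific
        next_player = (player + 1) % self.num_players     # simple turn order
        return next_state, next_player

    def getGameEnded(self, state):
        """Return None while the game is running, otherwise a length-N
        list of final rewards (replacing the two-player scalar +1/-1)."""
        raise NotImplementedError

    def _apply(self, state, player, action):
        raise NotImplementedError  # placeholder for the game's move logic
```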

rubencart commented 5 years ago

I do think you could use a search tree for a lot of card games, like games where players hold cards in their hands and choose which one to play each turn. I'm not sure how you would fit the incomplete-knowledge part (not knowing your opponents' hands, and therefore their options) into building the search tree, however...

51616 commented 5 years ago

@rubencart That's my point: since you don't have complete knowledge of the opponents' cards, you can't use MCTS unless you change the look-ahead algorithm, which I don't think could be done within a reasonable time.

henry-p commented 5 years ago

One hint: instead of building a "canonical form" of the board position, you could represent the players as different channels of the position (if you think of the position as an image). Each channel would be a complete board that marks only the pieces of one player, with the rest of the entries in that channel set to 0.
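A minimal sketch of that encoding, assuming a board stored as a 2D array of player ids (1..N, with 0 for empty); the helper name here is made up:

```python
import numpy as np

def to_channels(board, num_players):
    """board: (H, W) int array of player ids (0 = empty).
    Returns a (num_players, H, W) array with one binary plane per player,
    so no player-dependent "canonical form" flip is needed."""
    return np.stack([board == p + 1 for p in range(num_players)]).astype(np.float32)

board = np.zeros((8, 8), dtype=np.int64)
board[3, 3], board[4, 4] = 1, 2          # one piece each for players 1 and 2
x = to_channels(board, num_players=4)    # shape (4, 8, 8), ready for a conv net
```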

rubencart commented 5 years ago

@51616 Mmm okay yes I see, it isn't so straightforward.

But what if you extend this to non-deterministic games, e.g. Monopoly, where you have to roll dice? That should be possible with only a slightly modified version of the current tree search, right? So if you model an opponent's turn in a card game as a stochastic sample from the cards left in the game, or something like that, it should be possible with a version of tree search relatively close to the current one, right? Or, at every TS iteration, before you start at the root node, you deal all the cards left in the game (minus the ones in your own hand) to the other players and then start the search, now with complete knowledge (complete, but probably incorrect).

I'm just thinking about how you could still leverage some of the advantage TS offers on top of just using the probabilities coming from the neural net. It probably won't be as powerful as with complete knowledge, but it might be worth trying. Let me know what you think :)
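A rough sketch of that last idea (often called determinization) for the four-player case; run_mcts stands in for a complete-information search like this repo's MCTS.getActionProb, and all the names here are placeholders:

```python
import random
import numpy as np

def determinized_policy(my_hand, unseen_cards, public_state,
                        run_mcts, num_samples=10):
    """Deal the unseen cards to the three opponents at random, search the
    resulting complete-information state, and average the policies."""
    pi_sum = None
    for _ in range(num_samples):
        deck = unseen_cards[:]
        random.shuffle(deck)
        k = len(deck) // 3                           # assumes equal hand sizes
        hands = [deck[i * k:(i + 1) * k] for i in range(3)]
        full_state = (public_state, my_hand, hands)  # complete (sampled) knowledge
        pi = np.asarray(run_mcts(full_state))
        pi_sum = pi if pi_sum is None else pi_sum + pi
    return pi_sum / num_samples
```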

davidson16807 commented 5 years ago

> @rubencart That's my point: since you don't have complete knowledge of the opponents' cards, you can't use MCTS unless you change the look-ahead algorithm, which I don't think could be done within a reasonable time.

I think MCTS is still applicable to games with imperfect information. MCTS only needs to provide action likelihoods that indicate ideal play. If you're playing poker, MCTS would have complete knowledge of the game state, and the action likelihoods it outputs would indicate how an agent would play if it were cheating. The agent could then be trained to map subtle cues from its observations to that ideal play. It could learn to "call a bluff", so to speak.
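In code, the split might look something like this; mcts_policy and observe are hypothetical helpers, not part of this repo:

```python
def self_play_step(full_state, player, mcts_policy, observe, examples):
    """One self-play step of the "cheating search" idea: the tree search
    sees everything, but the training example pairs its policy with only
    what `player` could legally observe."""
    pi = mcts_policy(full_state)        # computed with complete knowledge
    obs = observe(full_state, player)   # hide the opponents' hands, etc.
    examples.append((obs, pi))          # the net learns ideal play from cues
    # ...then sample an action from pi and advance full_state as usual
```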

rubencart commented 5 years ago

@davidson16807 Mmm, I think you're right... for training, anyway. We'd have to try it and see :). And at inference time you'd use a version of TS like I described? Or just the policy network output? Because if you want to actually use the algorithm to play fair games, you can't use TS with complete knowledge, of course :).

davidson16807 commented 5 years ago

> at inference time you'd use a version of TS like I described?

By "inference time" you mean playing without seeing the opponent's cards, right? Personally, I would just use the policy network. If it's trained well enough, it should be just as good as the TS. If anything it should be better, since it can presumably call bluffs. But that's just theory right now.

I like the idea of using MCTS to model uncertainty, though. I'm trying to use RL for a single-player game that features imperfect knowledge of the weather. Right now I'm handling it like the poker example, but I've been thinking about modeling it as an asymmetric two-player game where the weather features as an agent.

I guess that means if you really want to think big, you could generalize the algorithm to work with asymmetric N-player games. :D

51616 commented 5 years ago

My thought now is that you could use the same MCTS algorithm but make the getNextState function stochastic, so that the result can vary each time you take an action.
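For illustration, a stochastic transition could look like this; apply_move is a hypothetical game-specific helper, and this is not the repo's actual code:

```python
import random

def getNextState(board, player, action, apply_move):
    """Stochastic variant of Game.getNextState: sample the chance event
    (here a die roll) inside the transition, so repeated calls with the
    same (board, action) can return different successor states."""
    roll = random.randint(1, 6)
    return apply_move(board, player, action, roll), -player
```

One caveat: as far as I can tell, the repo's MCTS caches statistics by the state's string representation and assumes deterministic transitions, so a stochastic getNextState would also need chance nodes (or per-visit re-sampling) in the tree.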