suragnair / alpha-zero-general

A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
MIT License
3.86k stars · 1.03k forks

MCTS max recursion error possibly due to every state visited for custom game #136

Closed SurferZergy closed 5 years ago

SurferZergy commented 5 years ago

I'm implementing a custom game which is similar to chess, but you attack the piece one square over instead of moving on top of the enemy piece. The game is supposed to have attack range, firepower, movement range, etc. as attributes, but to simplify it for the first iteration I'm testing a version where pieces can only move 1 and attack 1 (the square next to you). I initialize the board as:

[1,1,1,1,1]
[0,0,0,0,0]
[0,0,0,0,0]
[0,0,0,0,0]
[-1,-1,-1,-1,-1]

or

[1,1,0,0,0]
[1,0,0,0,0]
[0,0,0,0,0]
[0,0,0,0,-1]
[0,0,0,-1,-1]

I'm running into an MCTS max recursion error during training, I believe because the search can't find a leaf node: every state reachable from the current board has already been visited at some point, and the game hasn't ended yet, so s is never not-in self.Ps. I notice that a higher Cpuct or a lower numMCTSsims delays the max recursion error.
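To see why the recursion never bottoms out, here is a minimal hypothetical sketch (not the repo's actual MCTS.search, just the same recursion pattern): once every reachable state already has an entry in Ps and no state is terminal, the search can never take the "leaf" branch and recurses until Python's limit.

```python
# Toy illustration (hypothetical, not the repo's MCTS): two positions that
# toggle into each other forever, with no terminal state. Once both are in
# Ps, search() can never reach the "leaf" branch and recurses until it
# hits Python's recursion limit (~1000 frames by default).
def search(state, Ps):
    if state not in Ps:                 # leaf: expand and return a value estimate
        Ps[state] = True
        return 0.5
    return -search(1 - state, Ps)       # the only move: toggle to the other state

Ps = {0: True, 1: True}                 # every reachable state already expanded
try:
    search(0, Ps)
    recursed_forever = False
except RecursionError:
    recursed_forever = True
```

With an empty Ps the first call expands immediately and returns; with both states pre-expanded it recurses forever, which mirrors the situation described above.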

Possible solutions in my head:

1) Is there a way to set a max tree depth? If so, would we just do another self.Ps[s], v = self.nnet.predict(canonicalBoard) at the depth limit and return -v? Would this still train the agent in a meaningful way?

2) Or can we declare the game ended after a max tree depth? Another way to think about this: how would chess handle a similar situation where each side has only a king left and the game can never end because one king keeps running from the other forever?

P.S. I assume we can't just limit the number of game moves and call it a game end after 1000 moves, since the board doesn't represent "game end" in its np.array form? For example, how would the AI know that state [0,1,0,0,0] [0,0,0,0,0] [-1,0,0,0,0] [0,0,0,0,0] [0,0,0,0,0] is a terminal node and not just another state?

Would this still train the agent in a meaningful way?
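A sketch of option (1), under the assumption that hitting the depth cap is treated exactly like expanding a fresh leaf (evaluate with the network, return -v). All class and function names here are hypothetical stand-ins for the repo's classes, not its actual API:

```python
import numpy as np

class ToyNet:
    """Stand-in for self.nnet: uniform policy, constant value estimate."""
    def predict(self, board):
        return np.full(2, 0.5), 0.1

class ToyCyclicGame:
    """Two mirror positions that toggle forever and are never terminal."""
    def stringRepresentation(self, board):
        return board.tobytes()
    def getNextState(self, board):
        return -board                    # the only move: flip to the mirror

def search(game, nnet, Ps, board, depth=0, max_depth=100):
    s = game.stringRepresentation(board)
    if depth >= max_depth:
        _, v = nnet.predict(board)       # at the cap, evaluate instead of recursing
        return -v
    if s not in Ps:                      # genuine leaf: expand as usual
        Ps[s], v = nnet.predict(board)
        return -v
    return -search(game, nnet, Ps, game.getNextState(board), depth + 1, max_depth)

game, nnet, Ps = ToyCyclicGame(), ToyNet(), {}
board = np.array([1, -1])
values = [search(game, nnet, Ps, board) for _ in range(10)]  # no RecursionError
```

Without the depth cap, the third and later simulations would recurse indefinitely because both positions are already in Ps; with it, each simulation returns a bounded value estimate.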

evg-tyurin commented 5 years ago

A max recursion error very likely means a problem in the game rules. In that case the engine can't detect the game end and the game continues infinitely. This problem has already been discussed in a couple of issues.

suragnair commented 5 years ago

It may be worthwhile trying to detect such states and fold them into terminal states with the outcome as a draw. Would that be possible?
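One way to implement that suggestion, sketched with an illustrative repetition counter in the spirit of chess's threefold-repetition rule. The class, the canonical key, and the limit of 3 are all assumptions for illustration, not the repo's API:

```python
import numpy as np
from collections import Counter

class RepetitionTracker:
    """Count occurrences of each canonical position within one episode and
    declare a draw once a position repeats `limit` times (illustrative)."""
    def __init__(self, limit=3):
        self.counts = Counter()
        self.limit = limit

    def record(self, board, player):
        # Key on the canonical position so both players' views match up.
        key = (board * player).tobytes()
        self.counts[key] += 1
        return self.counts[key]

    def is_draw(self, board, player):
        return self.counts[(board * player).tobytes()] >= self.limit

tracker = RepetitionTracker(limit=3)
board = np.array([[0, 1], [-1, 0]])
for _ in range(3):
    tracker.record(board, 1)
# After three occurrences the position is folded into a terminal draw.
```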

SurferZergy commented 5 years ago

Thanks for the comments and help!

So my questions about the draw solution:

1) Maybe set a max number of steps before calling it a draw. Would it still train the MCTS in a meaningful way if the end-game state is drastically different from game to game? For example, Go ends when all spaces are taken and chess ends when a king is taken, but here the game could end with any arbitrary number of pieces left.

2) If we look at a game of chess, how would we handle a game state where only the 2 kings are left and the game could theoretically go on forever without ending?

3) The way the repo is set up, the reward is 1 for a player 1 win, -1 for a player -1 win, and 0 for a game that hasn't ended. What would be a good design for a drawn game? I noticed the rts game gives a reward of 0.001 on time-out, but its board uses a pieces representation as opposed to a board representation.
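For question 3, one common design (an illustrative sketch, not the repo's code) is to return a tiny nonzero value for a draw, so a drawn terminal state stays distinguishable from the 0 that means "not ended":

```python
# Hypothetical getGameEnded-style helper. The repo's games return 1/-1 for
# a win/loss and 0 for "not ended", so a draw needs a distinct small
# nonzero value (1e-4 here, echoing the small time-out reward mentioned
# above) so that "draw" and "still playing" don't collide.
def get_game_ended(winner, game_over, draw_eps=1e-4):
    if not game_over:
        return 0          # game still in progress
    if winner == 0:
        return draw_eps   # drawn game: tiny nonzero value marks it terminal
    return winner         # +1 or -1 from player 1's perspective
```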

SurferZergy commented 5 years ago

Perhaps a better way to ask this question: suppose the MCTS tree looked like:

          S0
         /  \
       S1    S2
      /  \  /  \
    S3   S4 S5   S1   <--- (terminal states)

So you can get to S1 from S0 as well as from S2. If you have already visited S1 from S0, S1 is already stored in self.Ps, and now line 76, "if s not in self.Ps", wouldn't recognize S0 -> S2 -> S1 as a leaf node and would try to go on to S3/S4, causing the infinite recursion.
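That transposition can be checked directly: since states are keyed by their contents (e.g. the array's byte string), the S1 reached via S2 hashes to the same key as the S1 reached from S0, so the leaf check fails on the second path. A minimal illustration with made-up positions:

```python
import numpy as np

Ps = {}   # stand-in for self.Ps, keyed by the board's byte representation

s1_via_s0 = np.array([0, 1, 0, 0, 0])     # S0 -> S1: first visit expands S1
Ps[s1_via_s0.tobytes()] = "expanded"

s1_via_s2 = np.array([0, 1, 0, 0, 0])     # S0 -> S2 -> S1: same position
# Same contents, same key: "if s not in Ps" is False on the second path,
# so the search does not treat this node as a leaf and keeps descending.
already_seen = s1_via_s2.tobytes() in Ps
```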

SurferZergy commented 5 years ago

Another data point suggesting the game-ending logic isn't the problem: I can pit a random player against another random player 10,000 times, and every game ends with a winner.
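That sanity check can be scripted; below is an illustrative harness around a toy game that always terminates (the game and its scoring are made up, the structure of the check is the point):

```python
import random

def play_random_game(rng, n_squares=5, max_moves=1000):
    """Two random players alternately claim squares worth 0..n-1 points."""
    free = list(range(n_squares))
    score = {1: 0, -1: 0}
    player = 1
    for _ in range(max_moves):
        if not free:                       # all squares claimed: game over
            break
        score[player] += free.pop(rng.randrange(len(free)))
        player = -player
    if free:
        return None                        # hit the move cap: game never ended
    if score[1] == score[-1]:
        return 0
    return 1 if score[1] > score[-1] else -1

# Pit random vs. random many times; every game should reach a real outcome.
results = [play_random_game(random.Random(i)) for i in range(1000)]
unfinished = sum(r is None for r in results)
```

If `unfinished` is ever nonzero for a real game implementation, the rules (or the move cap) are failing to terminate some lines of play.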

SurferZergy commented 5 years ago

Hi all, thanks for the help. It turns out I was wrong and you were right. I added a step counter to the board for game draws (a bit tricky, since it had to be part of the board as an np array), and it looks like I can train now. Thanks, I'll close the ticket for now.
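For anyone hitting the same issue, the fix described above can be sketched like this. The sizes, the extra counter row, and the draw threshold are illustrative, and canonicalization of the counter row is left out:

```python
import numpy as np

# Append one extra row to the 5x5 board to hold a move counter, so the
# counter travels with the board through numpy-based state hashing and
# makes otherwise-identical positions distinguishable by move number.
BOARD_ROWS, COLS, MAX_MOVES = 5, 5, 200

def init_board():
    b = np.zeros((BOARD_ROWS + 1, COLS), dtype=np.int64)
    b[0, :] = 1                     # player 1's pieces on the first row
    b[BOARD_ROWS - 1, :] = -1       # player -1's pieces on the last playing row
    return b                        # the extra row b[-1] stores the counter

def advance(board):
    nxt = board.copy()
    nxt[-1, 0] += 1                 # bump the move counter each ply
    return nxt

def is_draw(board):
    return board[-1, 0] >= MAX_MOVES   # counter reached the cap: call it a draw

b = init_board()
for _ in range(MAX_MOVES):
    b = advance(b)
```

Because the counter lives inside the array, the same piece arrangement at move 10 and at move 200 hashes to different keys, which is exactly what lets getGameEnded recognize the timed-out state as terminal.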