suragnair / alpha-zero-general

A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
MIT License

All valid moves were masked, do workaround. #168

Closed mikhail closed 4 years ago

mikhail commented 4 years ago

I have a fresh clone of the repo, and I have just read the entire #23 ticket. I'm constantly getting the warning All valid moves were masked, do workaround.

After a few loops I receive an action of -1 and the game breaks (I have an assert statement that forces the failure). If I don't hit that failure, my simulation instead fails with RecursionError: maximum recursion depth exceeded.

My main file has these settings:

    'numIters': 20,
    'numEps': 30,              # Number of complete self-play games to simulate during a new iteration.
    'tempThreshold': 4,
    'updateThreshold': 0.5,    # During arena playoff, new neural net will be accepted if threshold or more of games are won.
    'maxlenOfQueue': 3,        # Number of game examples to train the neural networks.
    'numMCTSSims': 10,         # Number of game moves for MCTS to simulate.
    'arenaCompare': 2,         # Number of games to play during arena play to determine if new net will be accepted.
    'cpuct': 1,

    'checkpoint': './temp/',
    'load_model': False,
    'load_folder_file': ('./temp/','checkpoint_9.pth.tar'),
    'numItersForTrainExamplesHistory': 1,

I don't really understand the error message, nor the comments in the MCTS file:

# if all valid moves were masked make all valid moves equally probable
# NB! All valid moves may be masked if either your NNet architecture is insufficient or you've got overfitting or something else.
# If you have got dozens or hundreds of these messages you should pay attention to your NNet and/or training process. 

I'm confused about why anything could be "masked" at all. In my game, all moves for a player are valid at all times:

def getValidMoves(self, board, player):
    player_turn = getNextPlayer(board)
    if player_turn == player:  # I don't know if this check is necessary
        return [1] * self.getActionSize()  # all moves are valid

    return [0] * self.getActionSize()

Please help me understand this issue.

mikhail commented 4 years ago

I found one potential cause for this. I was using a value of inf to mark a board value as invalid, and I think that was causing some numpy issues. I've changed it to a finite value (5000) that is far higher than anything possible. The code no longer breaks, but it now resembles #27, failing with a RecursionError: maximum recursion depth exceeded error:

  File "/...path.../alpha-zero-general/MCTS.py", line 121, in search
    v = self.search(next_s)
  [Previous line repeated 968 more times]

Adding some debugging, I saw that getNextState was called with the same (board, player, action) combination over and over again.

rlronan commented 4 years ago

This means that all of the values returned by predict that corresponded to valid moves were 0 or NaN. All valid moves being masked usually means your neural net architecture is messed up, or the training is going catastrophically. Specifically, it may mean your gradients are either exploding to infinity or NaN, or vanishing to 0. You should check the values your network returns in predict before they are multiplied by valids. You should also double-check that getValidMoves is returning an array of 1's, not an array of 0's.
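For example, a small debug helper along these lines (hypothetical code, not part of the repo) could be called right after nnet.predict in MCTS.search, before the policy is multiplied by the valid-move mask:

import numpy as np

def check_policy(pi, valids):
    """Flag the conditions that trigger the "All valid moves were masked" warning.
    pi is the raw policy vector from nnet.predict; valids is the 0/1 mask
    returned by getValidMoves."""
    pi = np.asarray(pi, dtype=float)
    valids = np.asarray(valids, dtype=float)
    if not np.all(np.isfinite(pi)):
        print("policy contains NaN or inf values:", pi)
    if np.sum(pi * valids) <= 0:
        print("no probability mass on any valid move")
        print("raw policy:", pi)
        print("valid mask:", valids)

If the second check fires on every call, the problem is in the network output or in the mask itself, not in MCTS.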

What do you mean by marking a board value as invalid? What game are you trying to implement?

evg-tyurin commented 4 years ago

The code no longer breaks, but now resembles #27 with RecursionError: maximum recursion depth exceeded error

The game implementation possibly doesn't detect the end of the game. I'd recommend checking the game logic and the game-end conditions.

mikhail commented 4 years ago

@rlronan, @evg-tyurin I'm trying to simulate shuffleboard/curling/lawn-bowling games, where the board involves analog values rather than a grid of squares.

I think I'm misunderstanding the getCanonicalForm(self, board, player) function. Its description seems to contradict itself:

The canonical form should be independent of player.

To me this means that the returned value should be the same regardless of whose turn it is. But the example later says: When the player is black, we can invert the colors and return the board. How is that possible? Does this mean (in the chess example) that the first move was taken by the black side? Maybe that works, but I don't understand how it works for games with a limited number of turns. Shuffleboard has 4 pucks per team, so the canonical form would show the last puck, the 8th, as thrown before the 7th. I'm pretty confused about this.
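From the Othello example it looks like "inverting the colors" just means re-labeling the pieces so that the current player is always +1; the position itself is not rewound in time. A standalone sketch of the idea (not the repo's exact code):

import numpy as np

def canonical_form(board, player):
    # For player +1 the board is unchanged; for player -1 every piece flips
    # sign, so the network always sees the position from its own perspective.
    return player * np.asarray(board)

board = np.array([[1, 0],
                  [0, -1]])
print(canonical_form(board, 1))   # unchanged for player +1
print(canonical_form(board, -1))  # signs flipped for player -1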

Edit: I modified getCanonicalForm to return the flipped board for player=-1, but it did not change anything. I added a debug print line in getGameEnded, and it reveals something confusing:

Checking if game ended...
getNextState (player, action) = (1, 0)
Checking if game ended...
getNextState (player, action) = (1, 3)
Checking if game ended...
getNextState (player, action) = (1, 2)
getNextState (player, action) = (1, 0)
getNextState (player, action) = (1, 2)
getNextState (player, action) = (1, 0)
...[repeated forever]...

1) The "Checking if game ended" calls stop after a certain point.
2) The player/action combinations start looping infinitely. Sometimes it alternates between (1, 2) and (1, 0); other times it's another combination, or just (1, 1) forever.

What could cause the system to stop invoking getGameEnded?

evg-tyurin commented 4 years ago

What could cause the system to stop invoking getGameEnded?

I suppose your implementation doesn't distinguish between different game states, and MCTS checks for game end only once per state/position: https://github.com/suragnair/alpha-zero-general/blob/master/MCTS.py#L71
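A self-contained toy (using a hypothetical ShuffleGame class, not code from this repo) showing how that caching interacts with a stringRepresentation that doesn't capture the full state:

class ShuffleGame:
    """Hypothetical game whose string representation forgets part of the state."""

    def stringRepresentation(self, board):
        # Bug (for illustration): only the puck positions are encoded, not how
        # many pucks have been thrown, so distinct turns collapse to one key.
        return str(sorted(board["pucks"]))

    def getGameEnded(self, board, player):
        return 1 if board["thrown"] >= 8 else 0

game = ShuffleGame()
Es = {}  # mirrors the MCTS cache keyed on the state string
for thrown in (7, 8):  # two genuinely different game states
    board = {"pucks": [1.5, 2.0], "thrown": thrown}
    s = game.stringRepresentation(board)
    if s not in Es:  # getGameEnded is only ever called once per string
        Es[s] = game.getGameEnded(board, 1)

print(Es)  # one cached entry; the terminal state at thrown=8 is never detected

With a key like that, MCTS believes it has already seen the state and never asks about game end again, which matches the debug output above.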

The AlphaZero approach is not intended for the games you named above; it works for games with an enumerable set of positions and moves.

mikhail commented 4 years ago

Thanks, @evg-tyurin. I'm not ready to give up yet, but I'll close this ticket.

I started reading the MCTS file and found the caching you referenced as well. I had always thought the limitation was that the actions had to be enumerable, but that the game state didn't have to be.

jamesbraza commented 1 year ago

One other possible cause (due to the open-ended <= 0 else condition) is the policy network returning negative probabilities.

This can happen if one forgets to exp after a log softmax layer.
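For illustration (PyTorch, not code from this repo): the raw output of a log-softmax layer is non-positive, so handing it to MCTS as a probability vector leaves zero or negative mass on every move after masking.

import torch
import torch.nn.functional as F

logits = torch.randn(1, 65)            # e.g. an Othello-sized action space
log_pi = F.log_softmax(logits, dim=1)  # log-probabilities: every entry is <= 0
pi = torch.exp(log_pi)                 # the easily forgotten exp

print(log_pi.sum().item())             # negative: would trip the <= 0 branch
print(pi.sum().item())                 # ~1.0: a proper probability distribution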