suragnair / alpha-zero-general

A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
MIT License
3.83k stars 1.03k forks source link

Tied is treated as win for opponent #197

Open Ettrig opened 4 years ago

Ettrig commented 4 years ago

This method in OthelloGame

def getGameEnded(self, board, player):
    # return 0 if not ended, 1 if player 1 won, -1 if player 1 lost
    # player = 1
    b = Board(self.n)
    b.pieces = np.copy(board)
    if b.has_legal_moves(player):
        return 0
    if b.has_legal_moves(-player):
        return 0
    if b.countDiff(player) > 0:
        return 1
    return -1

... returns -1 (opponent won) if there is no legal move and countDiff == 0. But the number of markers can end up being equally distributed between black and white. Wikipedia says that in tournaments this is counted as 1/2 a point for each contestant. In MCTS, this should reasonably be represented as 0. But that value is used for representing non-terminated games.

As a result of this defect, MCTS records v==-1 in the cases where v should be 0. This causes incorrect and therefore inefficient search. The incorrect value is also used as training data, which causes neural net output to be skewed.

MCTS: self.Es[s] = self.game.getGameEnded(canonicalBoard, 1)

Coach: r = self.game.getGameEnded(board, self.curPlayer) if r!=0: return [(x[0],x[2],r*((-1)**(x[1]!=self.curPlayer))) for x in trainExamples]

mikhail commented 4 years ago

Feel free to submit a PR. It should look something like this:

def getGameEnded(self, board, player):
    # return 0 if not ended, 1 if player 1 won, -1 if player 1 lost
    # player = 1
    b = Board(self.n)
    b.pieces = np.copy(board)
    if b.has_legal_moves(player):
        return 0
    if b.has_legal_moves(-player):
        return 0
    if not b.has_legal_moves(player) and not b.has_legal_moves(-player):
        return 0.00001
    if b.countDiff(player) > 0:
        return 1
    return -1

The Coach will then treat it as a draw. It won't add half a point to each but that's mostly ok.

Ettrig commented 4 years ago

Hmm. This project has great pedagogical value. That is, it is written in a way that is easy to read. I feel that this proposal steps away from that style. It assumes (correctly) that the surrounding code will interpret anything not-zero as True. But Python has a way to state boolean values explicitly (and more understandably), True/False literals. I chose to change the code so that it returns a tuple: True/False for ended and 1/0/-1 for result. This requires changes at a couple of other places. I don't expect you to want that in the project.

mikhail commented 4 years ago

Actually I'm referring to the parent Game class:

https://github.com/suragnair/alpha-zero-general/blob/master/Game.py#L62-L72

    def getGameEnded(self, board, player):
        """
        Input:
            board: current board
            player: current player (1 or -1)
        Returns:
            r: 0 if game has not ended. 1 if player won, -1 if player lost,
               small non-zero value for draw.

        """

However python can easily return multiple values by quietly packaging them as tuples. Returning a combination values such as (done, score) could be leveraged here.

suragnair commented 4 years ago

I’ll add that the repo was not initially designed with ties in mind, and it was more of an afterthought. See #4 and #61.

Also agree the non-zero value for ties steps away from the otherwise intuitive style. Open to better solutions.

mikhail commented 4 years ago

I'd like to see some flexibility around scoring in general. I'm happy to submit a PR to allow:

  1. Scoring values other than win 1 / lose 1
  2. Scoring exact 0 -- draw
  3. Optional cumulative score across all games
  4. Both players scoring something in the same round

I think it would be a reasonably small PR. If you like this approach - I can get it done quickly

goshawk22 commented 4 years ago

This would also help to solve the problem in #158