suragnair / alpha-zero-general

A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
MIT License
3.74k stars 1.01k forks

Possible bug #263

Open pavolkacej opened 2 years ago

pavolkacej commented 2 years ago

I believe this commit causes a regression: https://github.com/suragnair/alpha-zero-general/commit/28331bbc48d96c2fecd0683266f76d92ca33c62d

It does not make sense to multiply the result of getGameEnded by the current player. After some debugging, I believe this should be reverted to the original. Please see the print a few lines above: https://github.com/suragnair/alpha-zero-general/blob/master/Arena.py#L63

If you print this line after every game, you will find that the sums of won and lost games do not match the printed results.

I am talking about Othello, which is currently used in Main. Some games may need it the way it is, but logically this produces wrong win counts when pitting against the previous version. We want to always evaluate the result from the perspective of the white player (player 1), so that it is uniform. Note that board is used here (which, unlike canonicalBoard, is never mirrored), so it makes sense to pass 1 as a fixed input to getGameEnded().
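To make the mismatch concrete, here is a minimal sketch of the check described above. It is not the repository's code: `toy_get_game_ended`, `count_mismatches`, and the list-of-pieces board are hypothetical, and it assumes the verbose print reports `getGameEnded(board, 1)` while the returned value is the current player multiplied by `getGameEnded(board, curPlayer)`.

```python
def toy_get_game_ended(board, player):
    """Hypothetical stand-in for a game whose result ignores `player`.

    Assumption: the board is a list of +1/-1 pieces and every board passed
    in is terminal. The result is always from player 1's point of view.
    """
    diff = sum(board)  # pieces of player 1 minus pieces of player -1
    return 1 if diff > 0 else -1


def count_mismatches(get_game_ended, finals):
    """Count final states where the printed and the returned result differ."""
    mismatches = 0
    for board, cur_player in finals:
        printed = get_game_ended(board, 1)  # value assumed to be printed
        returned = cur_player * get_game_ended(board, cur_player)  # value tallied
        if printed != returned:
            mismatches += 1
    return mismatches


# Each entry is (final board, player to move when the game ended).
finals = [
    ([1, 1, -1], 1),
    ([1, 1, -1], -1),
    ([-1, -1, 1], 1),
    ([-1, -1, 1], -1),
]
print(count_mismatches(toy_get_game_ended, finals))
```

Under these assumptions, the two values disagree exactly in the games that end with player -1 to move, which is the kind of discrepancy between the tally and the prints described above.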

rlronan commented 2 years ago

I believe this issue comes from the Othello implementation. Please see the getGameEnded definition.

In game.py the specification for getGameEnded is:

    def getGameEnded(self, board, player):
        """
        Input:
            board: current board
            player: current player (1 or -1)
        Returns:
            r: 0 if game has not ended. 1 if player won, -1 if player lost,
               small non-zero value for draw.
        """

However, Othello is implemented against a different specification: "return 0 if not ended, 1 if player 1 won, -1 if player 1 lost".
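The difference between the two specifications can be sketched as follows. This is an illustration only: `absolute_result`, `as_spec`, and the list-of-pieces board are hypothetical names and simplifications, not code from the repository, and draws are left out.

```python
def absolute_result(board, player):
    """Quoted convention: +1 if player 1 won, -1 if player 1 lost.

    Assumption: the board is a list of +1/-1 pieces and is already terminal.
    The `player` argument is ignored.
    """
    return 1 if sum(board) > 0 else -1


def as_spec(absolute_fn):
    """Wrap an absolute result so it follows the game.py docstring instead:
    +1 if `player` won, -1 if `player` lost."""
    def relative(board, player):
        return player * absolute_fn(board, player)
    return relative


spec_result = as_spec(absolute_result)
board = [1, 1, -1]  # player 1 has more pieces

# With a spec-compliant function, multiplying by the current player yields
# player 1's result no matter who is to move at the end of the game.
arena_values = [p * spec_result(board, p) for p in (1, -1)]
print(arena_values)
```

With a function that follows the docstring, multiplying by the current player is therefore consistent; the multiplication only flips the sign when the underlying function already reports the result for player 1.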

That said, the print result you reference may itself be incorrect, but I would need to think about that more.