Open Ettrig opened 4 years ago
Feel free to submit a PR. It should look something like this:
def getGameEnded(self, board, player):
# return 0 if not ended, 1 if player 1 won, -1 if player 1 lost
# player = 1
b = Board(self.n)
b.pieces = np.copy(board)
if b.has_legal_moves(player):
return 0
if b.has_legal_moves(-player):
return 0
if not b.has_legal_moves(player) and not b.has_legal_moves(-player):
return 0.00001
if b.countDiff(player) > 0:
return 1
return -1
The Coach will then treat it as a draw. It won't add half a point to each but that's mostly ok.
Hmm. This project has great pedagogical value. That is, it is written in a way that is easy to read. I feel that this proposal steps away from that style. It assumes (correctly) that the surrounding code will interpret anything not-zero as True. But Python has a way to state boolean values explicitly (and more understandably), True/False literals. I chose to change the code so that it returns a tuple: True/False for ended and 1/0/-1 for result. This requires changes at a couple of other places. I don't expect you to want that in the project.
Actually I'm referring to the parent Game class:
https://github.com/suragnair/alpha-zero-general/blob/master/Game.py#L62-L72
def getGameEnded(self, board, player):
"""
Input:
board: current board
player: current player (1 or -1)
Returns:
r: 0 if game has not ended. 1 if player won, -1 if player lost,
small non-zero value for draw.
"""
However python can easily return multiple values by quietly packaging them as tuples. Returning a combination values such as (done, score)
could be leveraged here.
I’ll add that the repo was not initially designed with ties in mind, and it was more of an afterthought. See #4 and #61.
Also agree the non-zero value for ties steps away from the otherwise intuitive style. Open to better solutions.
I'd like to see some flexibility around scoring in general. I'm happy to submit a PR to allow:
I think it would be a reasonably small PR. If you like this approach - I can get it done quickly
This would also help to solve the problem in #158
This method in OthelloGame
... returns -1 (opponent won) if there is no legal move and countDiff == 0. But the number of markers can end up being equally distributed between black and white. Wikipedia says that in tournaments this is counted as 1/2 a point for each contestant. In MCTS, this should reasonably be represented as 0. But that value is used for representing non-terminated games.
As a result of this defect, MCTS records v==-1 in the cases where v should be 0. This causes incorrect and therefore inefficient search. The incorrect value is also used as training data, which causes neural net output to be skewed.
MCTS: self.Es[s] = self.game.getGameEnded(canonicalBoard, 1)
Coach: r = self.game.getGameEnded(board, self.curPlayer) if r!=0: return [(x[0],x[2],r*((-1)**(x[1]!=self.curPlayer))) for x in trainExamples]