mdm opened this issue 4 years ago (status: Open)
I think I've more or less found the problem: when a rollout arrives at an existing node where the game is already over, there are no legal moves left (not even passing or resigning), but the code still attempts to play a move in order to create a new child node. The code does not handle this situation. I also wonder how the algorithm is supposed to work here: should the "newly discovered but not really new" node be recorded by traversing the tree upwards (which could skew the results), or should this rollout be ignored (which could lead to selecting the same node over and over again)?
```python
while node.has_child(next_move):
    node = node.get_child(next_move)
    next_move = self.select_branch(node)
```
The above code does not account for the possibility that there are zero legal moves (which is the case when the game is already over at that point).
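One way to handle this is to guard the descent loop so it stops as soon as it reaches a node whose game is already over, instead of trying to expand a child there. This is only a sketch with simplified stand-in classes (`Node`, `select_branch`, `descend` are hypothetical names, not the book's actual code):

```python
# Minimal sketch (not the book's real classes): an MCTS descent loop
# with a terminal-state guard. All names here are hypothetical.

class Node:
    def __init__(self, is_terminal=False):
        self.children = {}           # move -> Node
        self.is_terminal = is_terminal

    def has_child(self, move):
        return move in self.children

    def get_child(self, move):
        return self.children[move]

def select_branch(node):
    # Stand-in for the real prior/UCT-based selection: pick any existing
    # child move, or a dummy unexpanded move if there are no children.
    return next(iter(node.children), "new-move")

def descend(root):
    """Walk down existing children, stopping early at terminal nodes."""
    node = root
    next_move = select_branch(node)
    while node.has_child(next_move):
        node = node.get_child(next_move)
        if node.is_terminal:
            # Game already over at this node: there is no legal move to
            # expand, so return the terminal node and let the caller back
            # up its game result instead of creating a child.
            return node, None
        next_move = select_branch(node)
    return node, next_move
```

With this guard, a rollout that hits a finished game simply backs up the terminal value rather than crashing while trying to play a move.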
@maxpumperla @macfergus I get the same problem and am trying to fix it. Please review the code in zero_demo.py. Thanks!
@HaodongLi1029 can you please stop spamming various issues here? In the other issue @macfergus advised you to check out the chapter_14 branch; how about you start with that?
Hello all, @DrVecctor was correct. The simplest fix is to make passing a legal move even after the game is over. This means the MCTS could continue to read out a branch after the end of the game, which is inefficient but shouldn't really affect its decisions (once the value network has trained a bit).
Pull in this diff https://github.com/maxpumperla/deep_learning_and_the_game_of_go/commit/6148f57eb98e4c75b102d096401efe780e911442 to get the fix. This diff also makes goboard_fast consistent with goboard.py.
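The idea behind the fix can be sketched roughly as follows. This is a simplified stand-in, not the actual `GameState` from the book's goboard modules; the class and method names here are hypothetical:

```python
# Hedged sketch of the fix described above: keep pass as a legal move
# even after the game is over, so the MCTS always has at least one move
# to play when it reaches (or reads past) a finished game.

class Move:
    PASS = "pass"  # stand-in for the real Move.pass_turn()

class GameState:
    def __init__(self, over=False):
        self._over = over

    def is_over(self):
        return self._over

    def legal_moves(self):
        if self.is_over():
            # Before the fix this effectively returned an empty list,
            # which crashed the search when it tried to expand a child
            # of a finished game. Returning pass keeps the tree walkable.
            return [Move.PASS]
        return ["some-board-move", Move.PASS]
```

Reading out pass moves past the end of the game wastes a little search, as noted above, but it avoids the zero-legal-moves crash entirely.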
After applying the above fix, the error still appears. Should we make it return a pass if there are no more legal moves?

P.S. It seems that after adding the pass turn as a legal move, the error disappeared.
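The fallback suggested above could also be added defensively on the move-selection side. A minimal sketch, assuming a hypothetical `select_move` helper that receives scored candidate moves (this is not the repo's actual API):

```python
# Defensive fallback sketch: if the search produces no legal candidate
# moves at all, return a pass instead of crashing. Names are hypothetical.

def select_move(candidates):
    """Return the highest-scored (move, score) candidate, or pass if empty."""
    if not candidates:
        # No legal moves found (game already over): fall back to passing.
        return "pass"
    return max(candidates, key=lambda m: m[1])[0]
```

This belt-and-suspenders check is redundant once pass is always legal, but it makes the failure mode harmless either way.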
Running zero_demo.py crashes with the following error:

My guess is the agent runs out of legal moves before it realizes the game is over.