lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

How to efficiently end the game in self-play? #709

Open CGLemon opened 1 year ago

CGLemon commented 1 year ago

This is one of my bot's self-play games. The rule set is Tromp-Taylor (TT). In this position, KataGo would end the game with a double pass because there are no dead stones.

But my bot keeps playing until Benson's algorithm can settle the position.

Why does KataGo play the pass move so precisely?

HackYardo commented 1 year ago

From my superficial view, a game is a game and a rule is a rule. Your bot follows a rule set to play a game, but it should not apply the rules at every move; it also needs a game judge. If it has more points than its opponent, it could simply make no move and pass its turn. Sabaki has a simple game judge that might help your bot: https://github.com/SabakiHQ/deadstones

CGLemon commented 1 year ago

@HackYardo

Yes, adding some heuristics may improve the passing behavior. But this is not safe in self-play if the method is too simple, since it may cause weird results. I think MCTS with simple random playouts, which you propose, is an unsafe way.

lightvector commented 1 year ago

KataGo currently adds a very small utility bonus or penalty (e.g. equivalent to 0.5 points or 0.25 points of score) for behaving nicely in the endgame like this. This tiny bonus never affects the actual winner (e.g. if the game was an exact draw, the bonus is added but the game outcome is still considered a draw); it only affects the score utility component, where KataGo cares about maximizing score. The bonus is also not learned or trained on by the value head or score head in the neural net; it is only used during search to affect the root playout distribution for policy training.

The idea is that, in principle, a tiny bonus like this should not affect theoretically optimal play. If there is any line of play that gains even 1 point more, then that line will achieve higher utility than behaving nicely. So in the limit of infinite training and optimal play, a tiny bonus for behaving nicely should, in theory, only affect the moves chosen when there is otherwise no difference between them.
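
To make the tie-breaking argument concrete, here is a minimal sketch, not KataGo's actual code. The function names, the `tanh` shape of the score utility, and the `scale` and `weight` values are all assumptions chosen for illustration; only the idea of a sub-point bonus that touches score utility but never the win/loss result comes from the explanation above.

```python
import math

def winloss_utility(score_diff: float) -> float:
    """+1 for a win, -1 for a loss, 0 for a draw, using the real score only."""
    if score_diff > 0:
        return 1.0
    if score_diff < 0:
        return -1.0
    return 0.0

def score_utility(score_diff: float, scale: float = 20.0, weight: float = 0.3) -> float:
    """Hypothetical smooth, strictly increasing score term (assumed shape)."""
    return weight * math.tanh(score_diff / scale)

def total_utility(score_diff: float, nice_bonus: float = 0.0) -> float:
    """The bonus (in points) is added to the score term only, never to win/loss."""
    return winloss_utility(score_diff) + score_utility(score_diff + nice_bonus)

# Two lines with the same real score: the "nice" one is preferred.
nice = total_utility(0.0, nice_bonus=0.5)
plain = total_utility(0.0)

# A line that really gains one more point still beats the "nice" line.
greedy = total_utility(1.0)
```

Because the score term is strictly increasing and the bonus is smaller than one point, `nice > plain` while `greedy > nice`, and the drawn game stays a draw in the win/loss term.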

CGLemon commented 1 year ago

Thanks! Following this method, may I simply add the tiny bonus for the root player when it plays the pass move? Under the TT rules, the pass move never gains points. The bot would then play the pass move if the final scores of the candidate moves are equal.

lightvector commented 1 year ago

You could do that, and it will probably work okay. KataGo does something a little more complex, to try to ensure that dame are filled and protective plays are made first, before passing, while still not spending time filling in territory uselessly.
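
The simpler variant discussed in this thread, a small bonus for the root player's pass, could be sketched as follows. This is only an illustration under stated assumptions: `Candidate`, `choose_move`, and the `pass_bonus` value of 0.5 points are hypothetical names and numbers, and the sketch does not attempt KataGo's more complex handling of dame and protective plays.

```python
from dataclasses import dataclass
from typing import List

PASS = "pass"

@dataclass
class Candidate:
    move: str               # e.g. "D4" or "pass"
    expected_score: float   # search estimate of the final score for the root player

def choose_move(candidates: List[Candidate], pass_bonus: float = 0.5) -> Candidate:
    """Pick the best root move, giving the pass a bonus smaller than one point."""
    def adjusted(c: Candidate) -> float:
        return c.expected_score + (pass_bonus if c.move == PASS else 0.0)
    return max(candidates, key=adjusted)

# All candidates lead to the same score: the bonus makes the bot pass.
settled = [Candidate("A1", 3.0), Candidate(PASS, 3.0), Candidate("T19", 3.0)]

# One move still gains a full point: the bonus is too small to prefer passing.
unsettled = [Candidate("A1", 4.0), Candidate(PASS, 3.0)]
```

Here `choose_move(settled)` selects the pass and `choose_move(unsettled)` selects `A1`. In a real search the score estimates are noisy rather than exact integers, so the bonus can also override estimated differences smaller than its size.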