lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

Remaining game length as auxiliary prediction target #38

Open TFiFiE opened 5 years ago

TFiFiE commented 5 years ago

Did you consider, or even already experiment with, having the network predict the number of moves remaining in a game? Game length is admittedly a somewhat dubious concept in Go (depending on the ruleset), but it is otherwise universal enough that it would be nice to find out whether predicting it proves (significantly) beneficial. As a bonus, the information would be valuable for time management, especially if the predictions are further broken down by outcome (win/loss/draw).

isty2e commented 5 years ago

I don't think this is a well-defined and sound target, since the game can progress indefinitely without loss of score, and its length depends largely on the number of ko remaining. It might make sense under the Japanese ruleset, but it is still dubious in terms of strength gain. Remaining game length seems like a more valid target for chess than for Go.

poptangtwe commented 5 years ago

Maybe that could be used in AlphaZero-like engines for chess, xiangqi and shogi.

lightvector commented 5 years ago

Thanks for the suggestion! Yes, I've considered such a target, but I don't think it would overall be that interesting in Go. It would definitely be more interesting in games that behave more "sudden-death"-like.

I don't think you would even want this prediction for time management. The endgame can vary quite a bit in length, but unless the game is within a point or two, you can play the endgame fairly quickly regardless of how long it is. So a decent fraction of the time, much of the variation in this metric is just noise as far as how you should spend your time.

A more useful value for time management might be the following:

[equation image: CodeCogsEqn]

where V(t) is the MCTS winrate on turn t, and t_0 is the current turn number. That is, you would be predicting the time over which the variance in the game outcome will arrive, summing across all of the game's remaining variance. A low value would indicate either that the game is already almost decided or that the game is about to be decided in the next few moves, such as in an uncertain but huge fight that will resolve shortly. A high value would indicate that there is a long game remaining before the outcome is likely to become certain.
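The equation above was posted as an image that is not preserved in this copy. Going only by the prose description, one form consistent with it weights each squared winrate change by how many turns ahead it occurs. This is a reconstruction under that assumption, not necessarily the exact expression originally posted; in particular the squared-difference form and the (t - t_0) weight are guesses.

```latex
\[
  \mathbb{E}\!\left[\,\sum_{t \ge t_0} (t - t_0)\,\bigl(V(t+1) - V(t)\bigr)^2\right]
\]
```

Here the squared change between consecutive turns stands in for the variance that "arrives" on that turn, and the sum runs until the game ends.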

Actually, this seems like a pretty neat value to predict! Whereas the number of turns is not likely to be useful for Go, maybe this value would have some value as an auxiliary target (and is also completely general across games). I might try this out at some point.
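As a rough illustration of how such a target could be computed from a finished game, here is a minimal sketch. It assumes the target is the sum over future turns of (t - t_0) times the squared winrate change, which is a reading of the prose description rather than a confirmed formula. The function name and its inputs are hypothetical, and this is not how KataGo computes any of its training targets.

```python
from typing import List, Sequence


def variance_arrival_targets(winrates: Sequence[float], outcome: float) -> List[float]:
    """Return, for every turn t0, sum over t >= t0 of (t - t0) * (V(t+1) - V(t))**2.

    winrates[t] is the searched winrate V(t) on turn t, always from the same
    player's perspective (assumed input). outcome is the final result from
    that perspective (1.0 win, 0.0 loss) and is used as the value after the
    last recorded turn.
    """
    values = list(winrates) + [outcome]
    n = len(winrates)
    # Squared winrate change between turn t and turn t + 1.
    sq = [(values[t + 1] - values[t]) ** 2 for t in range(n)]

    targets = [0.0] * n
    tail = 0.0  # total squared change strictly after turn t0
    # Walk backwards: stepping t0 back by one turn adds 1 to the weight of
    # every later change, i.e. adds the tail sum once.
    for t0 in range(n - 1, -1, -1):
        if t0 + 1 < n:
            targets[t0] = targets[t0 + 1] + tail
        tail += sq[t0]
    return targets


if __name__ == "__main__":
    # A game that stays even and swings late, versus one decided at once.
    print(variance_arrival_targets([0.5, 0.5, 0.9], 1.0))
    print(variance_arrival_targets([0.5, 1.0, 1.0], 1.0))
```

Under this weighting a change on the very next move gets weight zero, so a position about to be decided immediately scores low, matching the "huge fight that will resolve shortly" case.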

lightvector commented 5 years ago

As a point of reference/intuition, if the MCTS winrate is a martingale, then the following slightly different sum, not weighting by turn number:

[equation image: CodeCogsEqn2]

would be precisely equal to:

[equation image: CodeCogsEqn3]

i.e. the expected amount of variance remaining is entirely determined by the current probability of winning. So weighting by time simply asks, in expectation, how far in the future this fixed expected amount of variance will occur. MCTS winrate is definitely not a martingale, but an appropriate monotone function of it probably would be very close to one.
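The reasoning behind this identity can be sketched as follows. The equation images are not preserved here, so the sketch assumes the unweighted sum is over squared winrate changes, that the game ends by some turn T, and that V(T) equals the outcome, 0 or 1 (draws ignored). The symbols T and \mathcal{F}_{t_0} (the game history up to turn t_0) are introduced only for this sketch.

```latex
\begin{align*}
\mathbb{E}\!\left[\sum_{t=t_0}^{T-1} \bigl(V(t+1)-V(t)\bigr)^2 \,\middle|\, \mathcal{F}_{t_0}\right]
  &= \mathbb{E}\!\left[\bigl(V(T)-V(t_0)\bigr)^2 \,\middle|\, \mathcal{F}_{t_0}\right]
     && \text{(martingale increments are uncorrelated)} \\
  &= \mathbb{E}\!\left[V(T)^2 \,\middle|\, \mathcal{F}_{t_0}\right] - V(t_0)^2
     && \text{(since } \mathbb{E}[V(T) \mid \mathcal{F}_{t_0}] = V(t_0)\text{)} \\
  &= V(t_0) - V(t_0)^2 \;=\; V(t_0)\bigl(1 - V(t_0)\bigr)
     && \text{(since } V(T) \in \{0,1\} \text{ gives } V(T)^2 = V(T)\text{)}
\end{align*}
```

So under these assumptions the right-hand side is just the variance of a win/loss outcome with win probability V(t_0).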

(Edit: fixed bug in equation)

poptangtwe commented 5 years ago

@lightvector How about introducing the opponent's miss rate and the loss of score as additional auxiliary prediction targets to improve handicap games?

Splee99 commented 5 years ago

Instead of waiting for the opponent to make a mistake, it is more important to create tricky game states that invite the opponent's mistake. I guess in terms of MCTS, the goal may be to build a wide search tree and choose the move with a wide standard deviation of winrate (instead of LCB). However, special training may also be needed for this structure.
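To make the contrast concrete, here is a toy sketch of the two selection rules. Everything in it is hypothetical: `ChildStats`, its fields, and the `z` constant are illustrative only, and the LCB formula is a generic mean-minus-z-standard-errors bound, not KataGo's actual implementation.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class ChildStats:
    """Per-move search statistics at the root (hypothetical structure)."""
    move: str
    visits: int          # must be positive
    winrate_mean: float  # average winrate of playouts through this move
    winrate_std: float   # standard deviation of those winrates


def lcb(child: ChildStats, z: float = 1.96) -> float:
    """Generic lower confidence bound: mean minus z standard errors."""
    return child.winrate_mean - z * child.winrate_std / sqrt(child.visits)


def pick_by_lcb(children: List[ChildStats]) -> str:
    """Conservative choice: the move whose winrate is most reliably high."""
    return max(children, key=lcb).move


def pick_by_spread(children: List[ChildStats]) -> str:
    """Complication-seeking choice: the move with the widest winrate spread."""
    return max(children, key=lambda c: c.winrate_std).move


if __name__ == "__main__":
    children = [
        ChildStats("solid", visits=1000, winrate_mean=0.30, winrate_std=0.05),
        ChildStats("tricky", visits=200, winrate_mean=0.28, winrate_std=0.30),
    ]
    print(pick_by_lcb(children), pick_by_spread(children))
```

A practical variant would probably restrict the spread-based rule to moves whose mean winrate is not much worse than the best one, since raw spread alone can favor plainly bad moves.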