lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

A question about search initialization for dynamic score utility #929

Closed by sbbdms 5 months ago

sbbdms commented 5 months ago

Hi.

Recently I have been trying to read KataGo's search code.

As I currently understand it, after a player makes a move, KataGo updates "recentScoreCenter" in the computeRootValues() function and then recomputes all stats in the tree via the recursivelyRecomputeStats() function, which uses all of the threads set in the config file to walk recursively over every node in the tree. Only after that does KataGo start running new playouts with the new stats, including the new dynamic score utility, which affects the playout distribution of the search.

However, I noticed the comment below, which seems to contradict my understanding:

https://github.com/lightvector/KataGo/blob/cf75cc7dc21ee219bea8e60d869c142332c1b57c/cpp/search/search.cpp#L629-L633

As I understand it, KataGo only runs new playouts after all the stats are updated. But this comment emphasizes the word "would", which seems to imply that updating stats and running new playouts happen simultaneously, and that this is why KataGo clears the search in a specific situation.

So I wonder what the actual ordering is between "updating all stats" and "running new playouts". (1) Do they happen simultaneously, or can "running new playouts" only start after "updating all stats" has finished? (2) If the former, how could I change the code so that new playouts only run after all stats are updated?

Thanks!

lightvector commented 5 months ago

KataGo only runs new playouts after updating stats. The word "would" refers to the fact that we are choosing to clear the search in this case rather than updating stats, i.e. we would update all the stats (before running playouts), except that we entirely clear the search tree instead.
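The ordering described above can be sketched with a toy model. This is not KataGo's code: `ToyTree`, `begin_search`, and the `pattern_bonus_changed` flag are hypothetical names, and the real condition for clearing is whatever the linked comment in search.cpp describes. The sketch only illustrates that the tree is either recomputed or cleared, and that in both cases playouts start afterwards.

```python
class ToyTree:
    """Hypothetical stand-in for a search tree; it only records the order of operations."""

    def __init__(self):
        self.log = []

    def recompute_stats(self):
        self.log.append("recompute_stats")

    def clear(self):
        self.log.append("clear")

    def run_playouts(self, n):
        self.log.append(f"run_playouts({n})")


def begin_search(tree, pattern_bonus_changed, num_playouts):
    """Prepare the tree first, then search; the two phases never overlap."""
    if pattern_bonus_changed:
        # Assumed trigger: instead of updating the stats, throw the tree away.
        tree.clear()
    else:
        # Normal case: update every node's stats before any new playout.
        tree.recompute_stats()
    tree.run_playouts(num_playouts)
    return tree.log


normal_log = begin_search(ToyTree(), pattern_bonus_changed=False, num_playouts=100)
cleared_log = begin_search(ToyTree(), pattern_bonus_changed=True, num_playouts=100)
print(normal_log)
print(cleared_log)
```

In both logs the playout entry comes last, which is the point of the answer: clearing replaces the stats update, it does not run alongside it.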

sbbdms commented 5 months ago

Thanks for your answer! Sorry, I forgot to mention that I am confused not only by "would" but also by the sentence after it:

but the problem is the playout distribution will still be matching the old probabilities without a lot of new search, so clearing ensures a better distribution.

I notice that getPatternBonus() is called when updating the stats of the nodes:

https://github.com/lightvector/KataGo/blob/4dfed3ebc9dd289f52c5cb81de45bfd40af8478d/cpp/search/searchupdatehelpers.cpp#L313

If all the stats are updated before any playout is run, including the avoidRepeatedPatternUtility, which is folded into the utility during the update, why is the playout distribution considered to still match the old probabilities without a lot of new search? This is why I wrongly thought that updating stats and running new playouts happen simultaneously.

lightvector commented 5 months ago

Suppose that move A has 1000 visits and move B has only 1 visit, but the pattern bonus is configured so that move A is extremely penalized and should barely be searched at all, while move B should get a huge bonus.

If we don't clear the search tree, then even after updating the utilities, it will still be the case that move A has 1000 visits and B has only 1 visit, and the evaluation of the parent will incorrectly be biased by the heavily penalized evaluation of A, because A has a lot more visits. It will take at least 999 more visits for B to catch up and become the most visited move.

If we clear the search tree, then when we redo the search, B will instead be preferred from the very start and won't need a lot of visits to catch up.
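The A/B example can be worked through numerically. The sketch below is a simplification, not KataGo's actual update rule: it assumes the parent's evaluation is a plain visit-weighted average of its children, and the utilities (-0.9 for A, +0.9 for B) and the 10 fresh visits are made-up illustrative values.

```python
def parent_value(children):
    """Visit-weighted average of child utilities (a simplified parent evaluation)."""
    total_visits = sum(c["visits"] for c in children.values())
    return sum(c["visits"] * c["utility"] for c in children.values()) / total_visits


def visits_until_b_catches_up(children):
    """Extra visits B needs to match A's count, even if every new visit goes to B."""
    extra = 0
    b_visits = children["B"]["visits"]
    while b_visits < children["A"]["visits"]:
        b_visits += 1
        extra += 1
    return extra


# Tree kept as-is: utilities already include the pattern bonus, visits are stale.
kept_tree = {
    "A": {"visits": 1000, "utility": -0.9},
    "B": {"visits": 1, "utility": 0.9},
}
stale_value = parent_value(kept_tree)
catch_up = visits_until_b_catches_up(kept_tree)

# Tree cleared: suppose the first 10 fresh visits all go to the preferred move B.
cleared_tree = {
    "A": {"visits": 0, "utility": -0.9},
    "B": {"visits": 10, "utility": 0.9},
}
fresh_value = parent_value(cleared_tree)

print(f"kept tree parent value:    {stale_value:.3f}")
print(f"visits for B to catch up:  {catch_up}")
print(f"cleared tree parent value: {fresh_value:.3f}")
```

With the kept tree, the parent's value stays close to A's penalized utility because A holds almost all of the weight, while the cleared tree reflects B's utility after only a handful of visits.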

sbbdms commented 5 months ago

Thanks again! It's a lesson for me that the existing visits/weights also affect the playout distribution...