lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/
Other
3.61k stars 569 forks

About stdev #394

Closed: hope366 closed this issue 3 years ago

hope366 commented 3 years ago

I think stdev is the standard deviation, but in KataGo's case, which values does it use and how is it calculated? In the Japanese version of lizzie, "stdev" is translated as "complexity". Is that appropriate? Excel's STDEV function appears to work as follows. Does KataGo use the same calculation? [image: "Untitled"] Sorry, I don't know much math 😅
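For reference, since the screenshot is not preserved here: Excel's STDEV function computes the sample standard deviation of the values x_1, ..., x_n, so the formula in the image was presumably equivalent to the following. This is a statement about Excel only, not about how KataGo arrives at its number.

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},
\qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```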

lightvector commented 3 years ago

KataGo reports an estimate of the standard deviation of the final score difference of the game that would be achieved under self-play conditions (noisy low-playout games).

Due to the mechanics of MCTS, which I won't get into, this value will also be slightly biased to be a bit too large even beyond the above. So the absolute magnitude of the number is not generally going to be meaningful, but the relative change in it can provide some information. You can think of it as a measure of the total amount of uncertainty in points left in the entire game.
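As a rough illustration of what "standard deviation of the final score difference" means, the sketch below computes it for two sets of hypothetical final scores. The `score_stdev` helper and all of the numbers are made up for illustration. This is not KataGo's code; KataGo produces its estimate during search, not from a list of finished games.

```python
import math

def score_stdev(final_scores):
    """Population standard deviation of a list of final score differences."""
    n = len(final_scores)
    if n == 0:
        raise ValueError("need at least one score")
    mean = sum(final_scores) / n
    mean_sq = sum(s * s for s in final_scores) / n
    # Var[X] = E[X^2] - E[X]^2; clamp at 0 to guard against rounding error.
    variance = max(0.0, mean_sq - mean * mean)
    return math.sqrt(variance)

# Hypothetical final score differences (White minus Black), not KataGo output.
early_game = [-20.5, 12.5, 3.5, -8.5, 25.5, -15.5]  # many outcomes still possible
late_game = [1.5, 0.5, 2.5, 1.5, 0.5, 2.5]          # result nearly settled

print(score_stdev(early_game) > score_stdev(late_game))
```

The widely spread early-game scores give a larger value than the tightly clustered late-game scores, which matches the reading of the number as total uncertainty in points left in the game.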

However, "complexity" is not necessarily the best one-word translation of it. It could be, depending on the connotations the translation carries. As I mentioned above, it's best thought of as loosely corresponding to "total uncertainty in points remaining in the entire game". This means that it will indeed be higher in complex situations, but it will also be higher in situations that are very simple in the short term if they occur early in the game, for example in the middle of a very simple and easy joseki. Even if the fight is locally very simple at that moment, it is early in the game, so the total uncertainty left in the entire game is still large.

hope366 commented 3 years ago

Thank you for your polite explanation.

So in the final phase of the game, where there is little room left for change, this number will be small. As you say, the value is reasonably high even in the middle of a very simple joseki early in the game, which is why I kept questioning the word "complexity". Most Japanese users probably don't understand the meaning of the number shown as "complexity". I'm glad I asked, because I now have an overview of what this number means.

I have been customizing lizzie and offering it to many people. In order to convey the meaning of "stdev" to more people, I'm thinking of changing the Japanese translation of "stdev" in lizzie from "complexity" to "uncertainty" or "standard deviation of estimated score differences". The latter is a bit longer, but I think it is a good description to help more people grasp the meaning of stdev.

hope366 commented 3 years ago

As the attached screenshot shows, under Japanese rules White leads by 0.2 points when there are no stones on the board yet, and the stdev is 17.1. Does this mean that the final score difference is 0.2 ± 17.1 with a 95% (or 68%? or 98%?) confidence interval? [image: "Untitled"]

lightvector commented 3 years ago

No, it means none of those things, as I said earlier:

"So the absolute magnitude of the number is not generally going to be meaningful, but the relative change in it can provide some information."

So 17.1 should not be considered a meaningful measure of anything well-defined by itself. Instead, what is meaningful is whether it changes. If it goes up, the total estimated uncertainty remaining in the game has gone up; if it goes down, the total estimated uncertainty remaining in the game has gone down.
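A minimal sketch of reading the number this way, looking only at the direction of change between successive positions. The `describe_stdev_changes` helper, the `tolerance` threshold, and every reading except the 17.1 from this thread are hypothetical and chosen only for illustration.

```python
def describe_stdev_changes(stdevs, tolerance=0.05):
    """Label each move-to-move change in stdev as 'up', 'down', or 'flat'."""
    labels = []
    for prev, curr in zip(stdevs, stdevs[1:]):
        delta = curr - prev
        if delta > tolerance:
            labels.append("up")      # estimated remaining uncertainty increased
        elif delta < -tolerance:
            labels.append("down")    # estimated remaining uncertainty decreased
        else:
            labels.append("flat")    # change too small to read anything into
    return labels

# Hypothetical stdev readings over successive positions (only 17.1 is from the thread).
readings = [17.1, 16.8, 18.3, 18.3, 9.4]
print(describe_stdev_changes(readings))
```

The absolute values are never interpreted here; only the sign of each difference is used.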

hope366 commented 3 years ago

Thank you for explaining again. I would like to think carefully about these two things, "total estimated uncertainty" and "relative change is meaningful", and think of a Japanese explanation that conveys the interpretation of stdev to more people.