lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

Negative winrate from lz-analyze #532

Open npnkhoi opened 3 years ago

npnkhoi commented 3 years ago

I got a negative winrate when running lz-analyze on a position. According to the GTP extension doc, winrate must be in the range [0, 10000], so this looks like a bug. Any help is much appreciated!

How to reproduce (I use the default config in default_gtp.cfg):

./katago gtp -model g170e-b20c256x2-s5303129600-d1228401921.bin.gz 

boardsize 9

play B H2 
play W E8 
play B C9 
play W F9 
play B A3 
play W C7 
play B J9 
play W pass 
play B H9 
play W H4 
play B F8 
play W F7 
play B H6 
play W A8 
play B F1 
play W D6 
play B E1 
play W B4 
play B B3 
play W pass

lz-analyze 200 minmoves 10
# wait for 2 seconds
stop

In the first output of lz-analyze, many winrates have negative values, shown below:

info move G5 visits 5 winrate 12 prior 5395 lcb -827 order 0 pv G5 D3 E4 C3 info move G4 visits 2 winrate 10 prior 2955 lcb -9990 order 1 pv G4 C3 info move F6 visits 1 winrate 28 prior 494 lcb -9972 order 2 pv F6 info move G8 visits 1 winrate 24 prior 167 lcb -9976 order 3 pv G8 info move F4 visits 1 winrate 31 prior 157 lcb -9969 order 4 pv F4 info move F5 visits 1 winrate 26 prior 120 lcb -9974 order 5 pv F5 info move G7 visits 0 winrate -466 prior 104 lcb -466 order 6 pv G7 info move G6 visits 0 winrate -466 prior 69 lcb -466 order 7 pv G6 info move D4 visits 0 winrate -466 prior 66 lcb -466 order 8 pv D4 info move E4 visits 0 winrate -466 prior 48 lcb -466 order 9 pv E4
lightvector commented 3 years ago

I think I understand why this is happening - good catch, thanks.

Notice that the affected moves have 0 visits. The analysis output is catching the search at a moment when a playout has begun for that move but has not yet completed, so there is no value for it yet. By default, the code fills in the value based on the first-play urgency (FPU) of that move (the same number that is used in the PUCT formula), but unlike true winrates, which are never out of range, the FPU value can be out of range.

This is a bug. I'll fix it by clamping the reported value to the allowed range.
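To illustrate what clamping means here, the following is a minimal Python sketch and not KataGo's actual code (KataGo is written in C++). The name `clamp_winrate` and the two constants are hypothetical. The bounds are the [0, 10000] range quoted from the GTP extension doc earlier in this thread.

```python
# Hypothetical bounds, taken from the [0, 10000] range quoted in this thread.
WINRATE_MIN = 0
WINRATE_MAX = 10000


def clamp_winrate(reported):
    """Force a reported winrate into the allowed range.

    Values already in range are returned unchanged; values below or
    above the range are pinned to the nearest bound.
    """
    return max(WINRATE_MIN, min(WINRATE_MAX, reported))


print(clamp_winrate(-466))   # 0 (the FPU-derived value from this issue)
print(clamp_winrate(12))     # 12 (in range, unchanged)
print(clamp_winrate(10250))  # 10000
```

Clamping only keeps the number inside the documented range. It does not make the value any more meaningful for a move that has not been searched.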

lightvector commented 3 years ago

And to be clear: currently, I believe the only time you should ever see this is in the case where a move has 0 visits.
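A client can check this on its own output. The sketch below assumes the field order seen in the output pasted above (`info move ... visits ... winrate ...`). The helper names `parse_lz_analyze` and `out_of_range` are hypothetical and not part of KataGo. It parses a shortened copy of that output and confirms that the only out-of-range winrate belongs to a 0-visit move.

```python
import re

# Matches the start of one "info move ..." record. The field order is an
# assumption based on the output pasted in this issue.
_RECORD = re.compile(
    r"info move (?P<move>\S+) visits (?P<visits>-?\d+) winrate (?P<winrate>-?\d+)"
)


def parse_lz_analyze(line):
    """Return a list of {move, visits, winrate} dicts from one output line."""
    return [
        {
            "move": m["move"],
            "visits": int(m["visits"]),
            "winrate": int(m["winrate"]),
        }
        for m in _RECORD.finditer(line)
    ]


def out_of_range(records, lo=0, hi=10000):
    """Return the records whose winrate falls outside [lo, hi]."""
    return [r for r in records if not lo <= r["winrate"] <= hi]


# Three records copied from the output in this issue.
sample = (
    "info move G5 visits 5 winrate 12 prior 5395 lcb -827 order 0 pv G5 D3 E4 C3 "
    "info move F5 visits 1 winrate 26 prior 120 lcb -9974 order 5 pv F5 "
    "info move G7 visits 0 winrate -466 prior 104 lcb -466 order 6 pv G7"
)

records = parse_lz_analyze(sample)
bad = out_of_range(records)
print([r["move"] for r in bad])             # ['G7']
print(all(r["visits"] == 0 for r in bad))   # True
```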

npnkhoi commented 3 years ago

Ahh, 0 visits. I didn't notice that. Thank you, and I hope this will be fixed soon. P.S.: This is the first time in my life I've raised an issue like this on GitHub, and you made it enjoyable. So, thank you @lightvector.

npnkhoi commented 3 years ago

If a negative winrate is just an artifact of an ongoing playout, I would expect it to become normal as soon as the playout finishes. After some quick experiments, I see that it takes about 35K visits in total for KataGo to actually visit each move at least once (which is about 45 minutes on a CPU-only engine). Here I define unvisited moves as those that lz-analyze reports with 0 visits.

So, why does it take so long to visit all moves? @lightvector

[image: katago_left_nodes]

lightvector commented 3 years ago

Because most of the moves are very bad, so it is not worth thinking about them. For example, on the first turn of the game it is pointless to spend more than a microscopic fraction of compute time considering moves on the first line, because those moves are never going to be good moves.

lightvector commented 3 years ago

I pushed a fix to master. If you're able to compile and run from master, you can test it out! In addition to clamping the winrate, we now also avoid reporting 0-visit moves at all if possible, since their stats are likely to be weird. https://github.com/lightvector/KataGo/commit/dd002fa5684c20e12eaae10eada8afe1b8ccf56f

Otherwise, you can wait for the next release, which may come in a few weeks.

lightvector commented 3 years ago

Note that you will still get 0-visit moves if you ask for minMoves > 0 and KataGo didn't examine that many moves. As I alluded to above, KataGo did not look at such moves even once. The only reason they are reported is that you asked for more moves than KataGo searched. You should of course be wary of trusting any of the statistics on such moves (score, winrate, etc.): the move is likely to be bad, and the stats are based on blind estimates like FPU rather than on true statistics about the move.

The only reason KataGo implements the minMoves feature at all is to be compatible with Leela Zero's implementation of that GTP command. Because the stats will not be reliable, you mostly shouldn't use it unless you know what you are doing. The main use of the option is to see things like the policy prior on unsearched moves without having to call kata-raw-nn separately.