lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

There is some trouble with self-play games. #474

Open CGLemon opened 3 years ago

CGLemon commented 3 years ago

I wrote a parser in order to do supervised learning from KataGo self-play games.

I found that some end-of-game results are wrong. The rule is "koPOSITIONALscoreAREAtaxNONEsui1" (Chinese rules).

case1: It is not really wrong, but some dame points are not filled. (;GM[1]FF[4]SZ[9]KM[7]RU[chinese]PB[black bot]PW[white bot]DT[2021-05-01-00:28:16]RE[0];B[ee];W[ec];B[fg];W[cf];B[dd];W[gd];B[dc];W[be];B[ch];W[bc];B[he];W[ge];B[hf];W[eb];B[db];W[hd];B[gb];W[hb];B[ea];W[fb];B[dg];W[bg];B[bh];W[ba];B[fa];W[gc];B[ga];W[ed];B[de];W[ha];B[da];W[cb];B[fd];W[gf];B[gg];W[fc];B[fe];W[id];B[ag];W[af];B[ah];W[ie];B[hg];W[ef];B[ff];W[df];B[cc];W[eg];B[eh];W[bd];B[cg];W[bf];B[ca];W[bb];B[if];W[ib];B[];W[])

case2: Some dame points are not filled, which affects the final score. (;GM[1]FF[4]SZ[9]KM[7]RU[chinese]PB[black bot]PW[white bot]DT[2021-05-01-00:52:13]RE[W+3];B[ee];W[eg];B[dc];W[gd];B[ff];W[cf];B[fg];W[he];B[fb];W[hg];B[be];W[ce];B[bd];W[eh];B[fh];W[bf];B[ch];W[bh];B[ei];W[dh];B[hc];W[gc];B[gb];W[hd];B[hb];W[ed];B[dd];W[ec];B[eb];W[gf];B[hh];W[fe];B[ef];W[ci];B[de];W[cd];B[cc];W[ih];B[fc];W[fd];B[hi];W[gg];B[ae];W[af];B[bc];W[di];B[fi];W[df];B[id];W[gh];B[gi];W[ie];B[ic];W[cg];B[];W[])

case3: Some dead strings are not removed, so the final score is not correct. The correct final score is B+2 in a normal game, or B+3 if we just remove the dead strings. But KataGo thinks the final score is B+4. (;GM[1]FF[4]SZ[9]KM[7]RU[chinese]PB[black bot]PW[hite bot]DT[2021-05-01-00:28:16]RE[B+4];B[ee];W[ce];B[gf];W[fc];B[dd];W[dg];B[cd];W[eb];B[hd];W[cb];B[bd];W[fg];B[gg];W[eh];B[gb];W[gc];B[hb];W[dc];B[bb];W[ba];B[ab];W[fe];B[ff];W[ed];B[de];W[hc];B[ic];W[ge];B[he];W[fb];B[fh];W[eg];B[bg];W[cf];B[bf];W[ch];B[gh];W[be];B[ae];W[bh];B[ah];W[cg];B[ef];W[bi];B[ad];W[di];B[ga];W[gd];B[da];W[ca];B[fa];W[ea];B[cc];W[ai];B[af];W[ag];B[fi];W[ei];B[df];W[ah];B[hf];W[];B[])

case4: The final score is wrong. The correct final score is B+8. I have no idea what happened. (;GM[1]FF[4]SZ[9]KM[7]RU[chinese]PB[black bot]PW[white bot]DT[2021-05-01-01:12:02]RE[B+9];B[ee];W[ec];B[fg];W[cf];B[dd];W[gd];B[fd];W[fc];B[ge];W[hd];B[ch];W[ef];B[eg];W[cd];B[dc];W[cc];B[db];W[he];B[be];W[ce];B[cg];W[cb];B[bf];W[da];B[ea];W[dg];B[fe];W[dh];B[ca];W[ba];B[gb];W[gc];B[ff];W[df];B[hb];W[gh];B[hf];W[hg];B[if];W[bh];B[fi];W[da];B[eb];W[ei];B[fh];W[bg];B[eh];W[de];B[di];W[ed];B[fb];W[ci];B[gf];W[ei];B[ca];W[ig];B[hc];W[da];B[di];W[ca];B[ei];W[bi];B[ie];W[af];B[ih];W[cg];B[id];W[ag];B[ed];W[bd];B[gi];W[];B[])

case5: It is caused by a complex position. This is very rare. I have lost the SGF file. I will post it if I find it.

All of these cases make up a small fraction of the games, less than 1%, so they have almost no negative effect on training. But most cases could be dealt with by random playouts (just playing dame, seki, escape, and capture moves), without needing the network.
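The kind of check described above can be sketched in a few lines: replay the SGF moves, then count the area score of the final position with every stone on the board treated as alive. This is only a minimal illustration, not KataGo's scoring code. The function names are made up for this example, it assumes a single-branch SGF with `SZ` set and passes written as `B[]`/`W[]`, and it trusts the record instead of checking superko legality.

```python
import re

def neighbors(p, size):
    x, y = p
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        q = (x + dx, y + dy)
        if 0 <= q[0] < size and 0 <= q[1] < size:
            yield q

def group_and_liberties(board, start, size):
    """Flood-fill the string containing `start`; return (stones, liberties)."""
    color = board[start]
    group, libs, stack = {start}, set(), [start]
    while stack:
        for q in neighbors(stack.pop(), size):
            if q not in board:
                libs.add(q)
            elif board[q] == color and q not in group:
                group.add(q)
                stack.append(q)
    return group, libs

def play(board, color, p, size):
    """Place a stone, remove captured opponent strings, then handle suicide."""
    board[p] = color
    opponent = "W" if color == "B" else "B"
    for q in neighbors(p, size):
        if board.get(q) == opponent:
            group, libs = group_and_liberties(board, q, size)
            if not libs:
                for s in group:
                    del board[s]
    group, libs = group_and_liberties(board, p, size)
    if not libs:  # multi-stone suicide is legal under "sui1"
        for s in group:
            del board[s]

def area_score(board, size):
    """Area scoring with NO dead-stone removal: stones plus empty regions
    that touch only one color."""
    score = {"B": 0, "W": 0}
    for c in board.values():
        score[c] += 1
    seen = set()
    for p in ((x, y) for x in range(size) for y in range(size)):
        if p in board or p in seen:
            continue
        region, borders, stack = {p}, set(), [p]
        while stack:
            for q in neighbors(stack.pop(), size):
                if q in board:
                    borders.add(board[q])
                elif q not in region:
                    region.add(q)
                    stack.append(q)
        seen |= region
        if len(borders) == 1:
            score[borders.pop()] += len(region)
    return score

def score_sgf(sgf):
    """Replay a simple SGF and return {"B": area, "W": area} (komi excluded)."""
    size = int(re.search(r"SZ\[(\d+)\]", sgf).group(1))
    board = {}
    for color, coord in re.findall(r";([BW])\[([a-s]{2})?\]", sgf):
        if coord:  # an empty coordinate is a pass
            play(board, color, (ord(coord[0]) - 97, ord(coord[1]) - 97), size)
    return score_sgf_result(board, size)

def score_sgf_result(board, size):
    return area_score(board, size)
```

In the tiny 3x3 record `(;SZ[3];B[ba];W[cb];B[bb];W[];B[bc];W[])`, the lone white stone is never captured, so it still counts one point for White and keeps its two neighbouring empty points neutral. That is the same effect as the unremoved dead strings in case3. The margin to compare against `RE[...]` would be `score["B"] - score["W"] - komi`.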

lightvector commented 3 years ago

Thanks. Where are these games from? Are they from katagotraining.org, or are they from the g170 data, or are they games you generated yourself? Can you say what networks were used to generate them, or what point of the training run they come from?

CGLemon commented 3 years ago

Sorry, I seem to have missed something.

KataGo Version: 1.8.0
Network Hash: kata1-b40c256-s5713160192-d1375633552
Game source: generated by myself.
Rule: koPOSITIONALscoreAREAtaxNONEsui1

I want to generate some 9x9 games in order to do supervised learning with my Go engine.

CGLemon commented 3 years ago

Here are the settings, based on "selfplay1.cfg":

selfplay-test-fixed-komi.cfg.gz

lightvector commented 3 years ago

Thanks for the additional info. Yeah, I agree this is a little funny, but it doesn't seem particularly worrying to me; I don't think there's anything clear to be fixed here. Yes, a very small fraction of the time the games will include moves that are slight mistakes, particularly if the difference in value is not very large. But that's true of the rest of the game too: the opening and midgame will also include mistakes, and those mistakes will also affect the final score of the game.

With "random playouts" it may be tricky to avoid introducing bias or causing problems, because you then need to implement a whole new set of logic, plus you have to carefully make it not collapse sekis, and handle things like double-ko-seki correctly. And it wouldn't necessarily handle all the possible rulesets correctly.

If you'd like to reduce the frequency of these mistakes, you can try using more than 100 cheap and reduced search playouts so that end of game estimates are more reliable, or increase the score utility so that small score differences are cared about a lot more by the search, or decrease the temperature. All of these things should help.
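To illustrate the temperature suggestion: a common way to apply temperature in MCTS-based self-play is to sample the move with probability proportional to `visits ** (1 / temperature)`. The sketch below assumes that formula purely for illustration (KataGo's exact move-selection code may differ), and the function name and visit counts are made up.

```python
def move_probabilities(visits, temperature):
    """Selection probabilities proportional to visits ** (1 / temperature).

    Illustrative formula only; not taken from KataGo's source.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(visits)
    # Divide by the largest count first so the power cannot overflow.
    weights = [(v / top) ** (1.0 / temperature) for v in visits]
    total = sum(weights)
    return [w / total for w in weights]

# Two endgame moves whose values are close, so the search splits its visits.
visits = [60, 40]
for t in (1.0, 0.5, 0.1):
    print(t, [round(p, 3) for p in move_probabilities(visits, t)])
```

With a lower temperature, the slightly worse move (fewer visits) is chosen much less often, which is why decreasing it should reduce small endgame slips at the cost of less varied games.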

CGLemon commented 3 years ago

Thank you for your suggestions. But I still have some questions about KataGo. First, how does KataGo compute the final position? I saw that the code includes pass-alive detection (Benson's algorithm), but I have no idea how KataGo works; it seems to be much more complex than the basic algorithm.

Second, when I watch KataGo self-play, it plays very well under Japanese rules. It knows when to pass. I am very surprised that it can do that. Is there any special heuristic search?

lightvector commented 3 years ago

This page describes the full mathematically rigorous ruleset that KataGo implements: https://lightvector.github.io/KataGo/rules.html

This ruleset is carefully designed so that if you use the parameters in this ruleset that most closely correspond to Japanese rules, then optimal play under this ruleset will automatically give you correct passing behavior and result in the correct Japanese rules score without any special search. Because the Japanese rules are very very hard to formalize, there are still some rare cases where KataGo's rules are known to differ, but I think it's okay for it not to be perfect, as long as it handles 99%+ correctly.

lightvector commented 3 years ago

Oh, and Benson's algorithm is mostly just used to force an end to self-play games slightly faster if one side refuses to pass. It's not very important once the net has learned to play better. You can see a reference to how it is used ("SelfPlayOpts") in the rules document above.
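For readers unfamiliar with it, here is a minimal sketch of the basic textbook form of Benson's algorithm for finding pass-alive (unconditionally alive) chains of one color. It is not KataGo's implementation, which is more elaborate; the board representation (a dict from `(x, y)` to `"B"`/`"W"`) and all function names are assumptions made for this example.

```python
def neighbors(p, size):
    x, y = p
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        q = (x + dx, y + dy)
        if 0 <= q[0] < size and 0 <= q[1] < size:
            yield q

def components(points, size):
    """Split a set of points into 4-connected components."""
    points, comps = set(points), []
    while points:
        start = points.pop()
        comp, stack = {start}, [start]
        while stack:
            for q in neighbors(stack.pop(), size):
                if q in points:
                    points.remove(q)
                    comp.add(q)
                    stack.append(q)
        comps.append(comp)
    return comps

def pass_alive_chains(board, color, size):
    """Return the chains of `color` that are alive by Benson's criterion."""
    all_points = {(x, y) for x in range(size) for y in range(size)}
    stones = {p for p, c in board.items() if c == color}
    chains = components(stones, size)
    # Enclosed regions: connected areas of empty points and opponent stones.
    regions = components(all_points - stones, size)

    def adjacent(points):
        return {q for p in points for q in neighbors(p, size)}

    def is_vital(region, chain):
        # Vital: every empty point of the region is a liberty of the chain.
        empties = {p for p in region if p not in board}
        return bool(empties) and empties <= adjacent(chain)

    changed = True
    while changed:
        changed = False
        for chain in list(chains):
            if sum(is_vital(r, chain) for r in regions) < 2:
                # Drop the chain and every region it helps to enclose.
                chains.remove(chain)
                regions = [r for r in regions if not (adjacent(r) & chain)]
                changed = True
    return chains

# A black chain on a 5x5 board with two one-point eyes at (0, 0) and (2, 0).
two_eyes = {p: "B" for p in [(1, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]}
print(pass_alive_chains(two_eyes, "B", 5))
```

The chain in the example is returned as alive because it has two vital regions. If the stone at `(3, 0)` is removed, the second eye merges with the outside, only one vital region remains, and the function returns an empty list.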