lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

Training Diversity #329

Open keldor314 opened 3 years ago

keldor314 commented 3 years ago

Currently, the strongest AI on CGOS is one that exclusively uses the black hole opening, beating KataGo something like 70-90% of the time. But is it a blind spot in KataGo that makes it unable to handle the position, or is it genuinely a strong opening? In either case, KataGo doesn't seem to understand it.

This makes sense when you consider that KataGo always plays the move it thinks is best, and thus rarely experiments in its self-play games. But how can we improve this?

One super easy idea to implement would be, during training, to present it with board positions where a few stones have already been placed completely at random, and have it play out a game from there. This is of course very likely to leave one side or the other at a large disadvantage from the start, but since KataGo tries to predict the final score and then plays the move with the best score, it shouldn't be bothered by this. The idea is that these random stones will disrupt the opening, and KataGo will have to learn how to play to make use of them. Hopefully this will not only help it understand unusual positions, but it may even find that some of these positions lead to good results that KataGo would never have found otherwise.
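As a minimal sketch of the position-generation step, assuming a hypothetical self-play harness (the function and names here are illustrative, not from KataGo's actual training code):

```python
import random

BOARD_SIZE = 19

def random_start_position(num_stones=6, seed=None):
    """Return a dict mapping (x, y) points to 'B'/'W' stones placed at random."""
    rng = random.Random(seed)
    empty = [(x, y) for x in range(BOARD_SIZE) for y in range(BOARD_SIZE)]
    points = rng.sample(empty, num_stones)
    # Alternate colors so neither side gets all of the free stones.
    return {pt: ('B' if i % 2 == 0 else 'W') for i, pt in enumerate(points)}
```

A fuller version would also clean up any randomly-formed group left without liberties, and it is the score target (rather than the win/loss target) that makes such lopsided starts usable as training data.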

Another idea, a bit more complex, would be a training target where it tries to predict not only the score a board position will lead to, but also its own margin of uncertainty about that score. It's a bit tricky to see how one would actually train this, since we'd prefer to avoid brute-forcing it directly by, for instance, starting from a position and playing 100 games to measure the average error.

However, I think we can do better than this. Suppose we take only one sample, namely the game being played. Whether the variance prediction matches the actual error in that single game is of course random, since the prediction is, after all, a measure of how wrong it expects to be. But if we reward it when the actual error is close to the estimated variance, and punish it when the result is predicted either too accurately or not accurately enough, we can hope for this randomness to average out over many games, with KataGo settling at the point where it predicts the score more accurately than expected in half its games, and less accurately in the other half.
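One standard way to get exactly this behavior from single game outcomes is a Gaussian negative log-likelihood loss: the network outputs a predicted mean and log-variance for the final score, and the loss is minimized in expectation precisely when the predicted variance equals the true mean squared error. A PyTorch sketch with illustrative names (not claiming to be KataGo's actual score loss):

```python
import torch

def score_nll_loss(pred_mean, pred_logvar, actual_score):
    # Negative log-likelihood of the actual score under
    # N(pred_mean, exp(pred_logvar)), dropping the additive constant.
    # If the predicted variance is too small, the squared-error term explodes;
    # if it is too large, the log-variance term dominates. In expectation the
    # loss is minimized when exp(pred_logvar) equals the true mean squared
    # error, i.e. when the uncertainty estimate is calibrated.
    var = torch.exp(pred_logvar)
    return 0.5 * (pred_logvar + (actual_score - pred_mean) ** 2 / var)
```

The "punish both over- and under-confidence" effect falls out of the two terms pulling in opposite directions, with no need to replay a position many times.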

The purpose of calculating variance at all is to let it identify moves that it doesn't understand; during training, it can then occasionally be told to play such a move and find out what happens, even if it doesn't think the result is actually all that good. It could also help in handicap games. Instead of just trying to maximize score and smoothly coasting to a result, whether or not that result is a win, it might look for uncertain moves when it's in a losing position - moves where the variance is large enough for there to be a reasonable chance of crossing the victory threshold - rather than moves with the best expected result that it is nonetheless sure are not good enough to win. The idea is that it would pick a fight whose outcome even it isn't certain of, where the entire game hangs in the balance. This is, of course, how a stronger player turns a losing game around when the weaker player inevitably makes a mistake.
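A sketch of what such a selection rule might look like, assuming each candidate move comes with a predicted score mean and standard deviation (the interface is hypothetical, not KataGo's API):

```python
import math

def win_probability(score_mean, score_stdev, threshold=0.0):
    # P(final score > threshold), assuming the final score is roughly
    # normally distributed around the prediction (and score_stdev > 0).
    z = (threshold - score_mean) / (score_stdev * math.sqrt(2.0))
    return 0.5 * (1.0 - math.erf(z))

def pick_move_when_behind(candidates):
    # candidates: iterable of (move, score_mean, score_stdev) tuples.
    # Maximize the chance of crossing the victory threshold rather than
    # the expected score itself.
    return max(candidates, key=lambda c: win_probability(c[1], c[2]))
```

Under this rule, a move that loses by 2 points almost surely (stdev 0.5) ranks below one that loses by 4 on average with stdev 8, since only the latter leaves a meaningful chance of crossing zero.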

lightvector commented 3 years ago

The problems you mention are probably not actually problems - I suspect you are making a questionable inference from weak evidence. The black hole bot on CGOS has been discussed before in other places, and there is probably no major issue to be resolved here.

Why do I say there is probably no problem and that the inference is questionable? A few things:

And yes, there are still issues with blind spots and insufficient "exploration" in the basic AlphaZero loop, but I'm reasonably sure that they have little relation to the situations mentioned in this thread, and will not be solved by score variance or the forced-playing of unusual and likely-bad moves.

Anyways, thanks still for the suggestions - I'm glad you're interested. Let me know what you think. There are a lot of people who take one look at CGOS and see the black hole bot at the top (although actually it's not even at the top right now), and who jump to conclusions without thinking even as carefully as you've tried to - especially people who aren't experienced on the AI side of things. I'm hoping you and any others who read this thread might each potentially be one fewer such person.

yssaya commented 3 years ago

I also don't know anything about the blackhole_v6 bot. I have tested some blackhole openings.

  1. KataGo vs KataGo: blackhole winrate is 18% (100 visits/move, 678 games).
  2. KataGo vs KataGo: blackhole winrate is 13% (800 visits/move, 389 games).
  3. KataGo (blackhole) vs LeelaZero: blackhole winrate is 38% (563 games at 400 visits/move LZ vs 100 visits/move KataGo; 556 games with KataGo playing blackhole). It looks like blackhole is not so bad against another program.
  4. blackhole (square, (7,7)) is similar in strength to blackhole (5,7).
  5. KataGo tends to resign around move 16-20 in the blackhole opening, so I set resignConsecTurns = 70.
  6. The SouthernCross opening is similar in strength.

Some game records:
kata145b30holev800 ... blackhole(5,7): http://www.yss-aya.com/cgos/19x19/cross/kata145b30holev800.html
kata145b30_SQ_v800 ... blackhole(7,7): http://www.yss-aya.com/cgos/19x19/cross/kata145b30_SQ_v800.html
kata145b30_SC_v800 ... Southern Cross: http://www.yss-aya.com/cgos/19x19/cross/kata145b30_SC_v800.html