keldor314 opened this issue 3 years ago
The problems you mention are probably not actually problems - I suspect you are making a questionable inference from weak evidence. The black hole bot on CGOS has been discussed before in other places, and there is probably no major issue to be resolved here.
Why do I say there is probably no problem and that the inference is questionable? A few things:
It's not a strong opening. It's almost certainly a weak opening. Consider: if the black hole bot operator had instead uploaded a bot that always makes its first move on the 2-2 point, and still won despite that, you probably wouldn't have made this post claiming that the 2-2 point is either a blind spot or a genuinely good opening, or that KataGo doesn't "understand" the 2-2 point, right? :)

I'm pretty sure KataGo should be able to give itself 2 stones if run on sufficiently stronger hardware than a single midrange GPU. Additionally, someone might also be using a stronger bot - we know that Fine Art and a small number of other closed-source bots are likely a bit stronger, in some cases simply due to more research and/or resources invested in their development. Obviously, the disadvantage of the black hole opening is less than 2 stones - any 4 stones reasonably spaced out in the open and not obviously overconcentrated are never going to be worse than passing 1-1.5 times. And if a 2-stone handicap is already achievable with hardware alone, without even resorting to a stronger bot, then winning in spite of the black hole opening, a lesser disadvantage, should not be a big challenge at all.
You can see from http://www.yss-aya.com/cgos/19x19/bayes.html that KataGo on a single 2080ti has actually done fine against it. It's only lower-end GPUs or older KataGo versions that have had trouble. That is roughly what you'd expect if the black hole bot is simply running on somewhat stronger hardware, or is using a somewhat stronger non-public engine or network.
Things similar to some of your suggestions have already been part of KataGo for a long time. Take a look at appendix D of the paper: https://arxiv.org/pdf/1902.10565.pdf. High-temperature initialization and two methods of branching both help. KataGo probably understands how to play against early high moves reasonably well, considering that one shouldn't put too much focus into training on positions where the other side has greatly misplayed. More recent changes since the paper, such as asymmetric playout training (the 'PDA' parameter), also help. Also, in case you aren't aware: KataGo already predicts score variance.
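Roughly, high-temperature initialization amounts to something like the following sketch (the schedule and numbers here are purely illustrative, not the actual values from the paper or KataGo's code):

```python
import numpy as np

def sample_opening_move(policy_logits, move_number,
                        temp_early=1.5, temp_late=1.0, halflife=20):
    # Temperature decays exponentially from temp_early toward temp_late,
    # so early self-play moves are diverse while later play stays sharp.
    decay = 0.5 ** (move_number / halflife)
    temp = temp_late + (temp_early - temp_late) * decay
    # Softmax with temperature over the raw policy logits.
    z = policy_logits / temp
    z = z - np.max(z)  # numerical stability
    probs = np.exp(z) / np.sum(np.exp(z))
    return np.random.choice(len(probs), p=probs)
```

The effect is that self-play games get seeded with a much wider variety of openings, including many offbeat ones, without forcing outright bad moves deep into the game.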
On the topic of score variance, I've already tried playing with it, and I've found it a bit difficult to come up with anything useful. It's also muddled by how MCTS works, by the fact that the neural net often doesn't know what it doesn't know (so it's not a generally effective way to find blind spots), and by the fact that it's sort of the wrong metric if you're interested in "misevaluations", because earlier in the game it's massively dominated by 'unlearnable' variance rather than learnable variance (see the decomposition below). If you run some real experiments and find something useful to do with it, though, let me know - I'd be interested.
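To spell out the learnable-vs-unlearnable split (my framing here, nothing KataGo-specific): for a position $s$ with eventual final score $Z$, the expected squared error of any prediction $v(s)$ decomposes as

$$
\mathbb{E}\left[(Z - v(s))^2 \mid s\right] = \underbrace{\operatorname{Var}(Z \mid s)}_{\text{unlearnable outcome randomness}} + \underbrace{\left(\mathbb{E}[Z \mid s] - v(s)\right)^2}_{\text{learnable misevaluation}}
$$

Early in the game the first term is enormous no matter how good the net is, so a raw variance signal mostly tracks how much game is left, not how wrong the evaluation is.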
And yes, there are still issues with blind spots and insufficient "exploration" in the basic AlphaZero loop, but I'm reasonably sure they have little relation to the situations mentioned in this thread, and they will not be solved by score variance or by the forced playing of unusual and likely-bad moves.
Anyway, thanks all the same for the suggestions - I'm glad you're interested. Let me know what you think. A lot of people take one look at CGOS, see the black hole bot at the top (although it's not actually at the top right now), and jump to conclusions without thinking even as carefully as you've tried to - especially people who aren't experienced on the AI side of things. I'm hoping you and anyone else who reads this thread might each be one fewer such person.
I also don't know anything about the blackhole_v6 bot, but I have tested some black hole openings.
Some game records:

- kata145b30holev800 ... blackhole(5,7): http://www.yss-aya.com/cgos/19x19/cross/kata145b30holev800.html
- kata145b30_SQ_v800 ... blackhole(7,7): http://www.yss-aya.com/cgos/19x19/cross/kata145b30_SQ_v800.html
- kata145b30_SC_v800 ... Southern Cross: http://www.yss-aya.com/cgos/19x19/cross/kata145b30_SC_v800.html
Currently, the strongest AI on CGOS is one that exclusively uses the black hole opening, beating KataGo something like 70-90% of the time. But is it a blind spot in KataGo that makes it unable to handle the position, or is it genuinely a strong opening? In either case, KataGo doesn't seem to understand it.
This makes sense when you consider that KataGo always plays the move it thinks is best, and thus rarely experiments in its self-play games. But how can we improve this?
One super easy idea to implement would be, during training, to present it with board positions where a few stones have already been placed completely at random, and have it play out a game from there. This is of course very likely to leave one side or the other starting at a large disadvantage, but since KataGo tries to guess final scores, and then plays the move with the best score, it shouldn't be bothered by this. The idea is that these random stones will disrupt the opening, and KataGo will have to learn how to make use of them. Hopefully this will not only help it understand unusual positions, but it may even discover that some of these positions lead to good results that it would never have found otherwise.
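For concreteness, a minimal sketch of that setup (the function and numbers are purely illustrative, not actual KataGo training code):

```python
import random

BOARD_SIZE = 19

def random_opening_placements(num_stones=4, seed=None):
    # Hypothetical sketch of the suggestion above: scatter a few stones
    # uniformly at random before self-play begins. Colors alternate so
    # neither side gets extra stones when num_stones is even.
    rng = random.Random(seed)
    all_points = [(r, c) for r in range(BOARD_SIZE) for c in range(BOARD_SIZE)]
    points = rng.sample(all_points, num_stones)
    return [("B" if i % 2 == 0 else "W", pt) for i, pt in enumerate(points)]

# Example: seed a self-play game with 4 random stones, then play out as usual.
placements = random_opening_placements(num_stones=4, seed=123)
```

One could also rebalance such lopsided starts by adjusting komi, but even without that, the score target keeps the training signal meaningful.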
Another idea, a bit more complex, would be a training target where it tries to predict not only the final score of a board position, but also its own margin of uncertainty about that score. It's a bit tricky to see how one would actually train this, since we'd prefer to avoid brute-forcing it directly by, for instance, starting from a position and playing 100 games to measure the average error.
However, I think we can do better than this. Suppose we take only one sample, namely the game being played. It will of course be random whether the variance prediction matches the exact final score, since it is, after all, a measure of how wrong the net expects to be. But if we reward it when the actual error is close to the estimated variance, and punish it when the result is predicted too accurately or not accurately enough, we might hope for this randomness to average out over the course of many games, with KataGo settling at the point where it predicts the score more accurately than expected in half its games, and less accurately than expected in the other half.
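For what it's worth, one standard loss with exactly this single-sample behavior is the Gaussian negative log-likelihood over the final score. A sketch, assuming a PyTorch-style training loop and a head that outputs log-variance so the variance stays positive (illustrative, not KataGo's actual loss):

```python
import torch

def score_gaussian_nll(pred_mean, pred_logvar, actual_score):
    # Gaussian negative log-likelihood of the observed final score.
    # With one sample per game, this loss is minimized in expectation exactly
    # when exp(pred_logvar) equals the true expected squared error, so over
    # many games the variance head is pushed toward calibrated uncertainty:
    # over-confident nets pay through the (error^2 / var) term, while
    # under-confident nets pay through the log-variance term.
    var = torch.exp(pred_logvar)
    sq_err = (actual_score - pred_mean) ** 2
    return 0.5 * (pred_logvar + sq_err / var).mean()
```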
The purpose of calculating variance at all is to let it identify moves it doesn't understand; during training, it can then occasionally be told to play such a move and find out what happens, even if it doesn't think the result is actually all that good. It could also help in handicap games: instead of just trying to maximize score and smoothly coasting to a result, whether or not that result is a win, it might, when in a losing position, seek out uncertain moves whose variance is large enough to give a reasonable chance of crossing the victory threshold, rather than moves with the best expected result that it is nonetheless sure are not good enough to win. The idea is that this would lead it to pick a fight whose outcome even it isn't certain of, where the entire game hangs in the balance. That is exactly where a stronger player can turn a losing game around when the weaker player inevitably makes a mistake.
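As a toy illustration of that last idea (a normal approximation with made-up numbers, nothing more):

```python
import math

def crossing_prob(pred_mean, pred_stdev, threshold=0.0):
    # P(final score > threshold) under a normal approximation. Toy model:
    # real score distributions are lumpy and far from normal, but it shows
    # why a trailing side should prefer high-variance moves.
    z = (pred_mean - threshold) / max(pred_stdev, 1e-9)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pick_move_when_losing(candidates):
    # candidates: list of (move, pred_mean, pred_stdev). Maximize the chance
    # of crossing the threshold rather than the expected score: a move with
    # mean -6, stdev 10 (P ~ 0.27) beats one with mean -3, stdev 1 (P ~ 0.001).
    return max(candidates, key=lambda m: crossing_prob(m[1], m[2]))
```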