lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

Some basic life and death are still not recognized #797

Open LightFieldController opened 1 year ago

LightFieldController commented 1 year ago

The latest 18b weight is still confused by some basic life-and-death positions.

LightFieldController commented 1 year ago

[Five attached board diagrams, numbered 5 to 1]

LightFieldController commented 1 year ago

The above are some basic life-and-death problems which are very easy for a beginner but seem to be very difficult for the latest 18b weight.

LightFieldController commented 1 year ago

@lightvector It seems that KataGo with its current structure is approaching its upper limit of strength. If that is true, I think a good choice would be to fix basic bugs like those shown above, so as to create a Go god without any weaknesses.

lightvector commented 1 year ago

Yes, when you have such extreme positions, neural nets often don't behave that well. I don't think it's a problem with just the "current structure", it's a challenge with this kind of machine learning in general, as far as I understand. It can be fixed with training on positions "like" this, although creating enough diversity of data takes a lot of work.

Do you have any examples of this kind of evaluation causing weaknesses in real games, the same way that cyclic-group evaluation caused some weaknesses both in direct anti-bot play and in evaluating certain pro games with cyclic shapes? If it doesn't affect real human games that people want to analyze with KataGo, and it is not possible to create such shapes by playing against KataGo directly, then there is much less practical concern about this. It's a lot harder to justify doing a lot of work to fix something if there is no practical benefit.

Of course, if you can provide the game positions yourself, I would be open to adding them to training with an appropriate small weighting even if there were no practical benefit! What would be needed is a set of diverse game positions with these kinds of shapes, or a good way to automatically generate such positions. By "diverse" I mean that the positions need to cover a wide variety of different cases, so that the neural net is forced to learn a general rule rather than learning a simpler incorrect rule or memorizing one position. I think in this case, you would want:

If a dataset of game positions were unique and widely diverse across all the above dimensions, I would guess that just several hundred positions would probably be enough for some decent learning, at least to start. That's within the capability of a dedicated person or group of people to create manually, if you wanted to try. (This is somewhat speaking from experience! I've done a bit of this in my spare time for cyclic groups, since cyclic groups affect real games, and it's been working reasonably well. I would welcome it if someone else wanted to do something similar here!)

LL145 commented 1 year ago

Currently, the input features for KataGo do not include the number of eyes a group of stones has (the input features only contain information about liberties - correct me if I'm wrong...). If we added "location has a stone belonging to a group with {0, 1, 2, 2+} eyes" as an input feature, might that reduce these types of errors? I understand that the number of eyes doesn't by itself determine the life or death of a group of stones, especially under rules prohibiting repeated positions, but it's a technique that humans use to reduce computational load.
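To make the proposal (and its limitations) concrete, here is a rough sketch of what such a feature could look like in a numpy-style training pipeline. This is purely illustrative and not KataGo's actual input-feature code: the board encoding and the `naive_eye_count` / `eye_feature_planes` helpers are invented here, and the eye test is deliberately the crudest possible one (single-point eyes fully enclosed by one solidly connected chain):

```python
import numpy as np

# Hypothetical board encoding for illustration only (not KataGo's).
EMPTY, BLACK, WHITE = 0, 1, 2

def neighbors(r, c, size):
    """Orthogonally adjacent points of (r, c) on a size x size board."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < size and 0 <= nc < size:
            yield nr, nc

def flood_group(board, r, c, visited):
    """Collect the solidly connected chain of stones containing (r, c)."""
    color = board[r, c]
    stack, group = [(r, c)], set()
    while stack:
        p = stack.pop()
        if p in group:
            continue
        group.add(p)
        visited.add(p)
        for q in neighbors(p[0], p[1], board.shape[0]):
            if board[q] == color:
                stack.append(q)
    return group

def naive_eye_count(board, group):
    """Count empty points whose orthogonal neighbors all belong to `group`.
    Deliberately naive: single-point eyes only, no false-eye detection,
    no multi-point eye spaces, no connections via bamboo joints, etc."""
    size = board.shape[0]
    count = 0
    for r in range(size):
        for c in range(size):
            if board[r, c] != EMPTY:
                continue
            if all(n in group for n in neighbors(r, c, size)):
                count += 1
    return count

def eye_feature_planes(board):
    """Four binary planes marking stones whose chain has 0, 1, 2, or more
    than 2 eyes according to the naive counter above."""
    size = board.shape[0]
    planes = np.zeros((4, size, size), dtype=np.float32)
    visited = set()
    for r in range(size):
        for c in range(size):
            if board[r, c] == EMPTY or (r, c) in visited:
                continue
            group = flood_group(board, r, c, visited)
            channel = min(naive_eye_count(board, group), 3)  # channels: 0, 1, 2, >2
            for gr, gc in group:
                planes[channel, gr, gc] = 1.0
    return planes
```

Even in this sketch the weaknesses pointed out in the next comment are already visible: the test only recognizes single-point eyes fully surrounded by one solidly connected chain, so looser eye shapes or dragons joined by bamboo joints would be reported incorrectly.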

lightvector commented 1 year ago

If you do that, you will probably just end up with a bot that has exactly the same huge misevaluations as before, except that all anyone has to do is change the eye shapes to ones not recognized by your hardcoded algorithm.

For example, here the corner contains an eye,

But how about here?

How about here?

Also, technically, all of A, B, C, and D are separate groups - they are not solidly connected.

Do you have to add a lot of manual connection patterns so that you treat this as all one group? If a large dragon has two eyes, but the dragon is in two pieces separated by a bamboo joint, or a table shape, with each piece only having one eye, what do you provide as an input feature then?

If you try to provide an input feature, there are too many patterns - even in the best case, people only have to change a few stones or find a different example with slightly looser shapes, and the exact same misevaluation happens again. Unless the neural net has already learned how to handle the eye counting itself, ignoring the input feature - but if it has, then the input feature is not needed anyway.

In the worst case, adding an input feature may make it harder for the neural net to learn to solve these problems on its own. Because if the input feature gives the correct answer in a given example and the net relies on that feature to simply output the answer, then the neural net is not using that example to learn how to look across the group to count the eyes on its own, which might be necessary in a slightly adjusted shape.

lightvector commented 1 year ago

Solving this problem by manually providing training examples is probably easier anyway! Instead of making complicated code changes that require new versions and releases and such, all you have to do is draw a bunch of board positions that contain different misevaluated examples, and the training will automatically learn how to handle them. Anyone who plays Go can try to produce a dataset of many different training examples with different interesting cases to learn. Message me if you want to try this, and we can iterate back and forth on how much diversity needs to be present in the examples and on which situations might need more cases to ensure the learning works properly - I have some intuitions about this now, having done some of that work myself for cyclic groups.
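For anyone curious what "a good way to automatically generate such positions" might look like in its simplest possible form, here is a small, purely hypothetical Python sketch that writes SGF setup positions containing one large solid white block with 0, 1, or 2 carved eyes, surrounded by a black wall. This is not how KataGo's training data is produced, and real data for this would need far more variety (multiple groups, looser shapes, partial surrounds, positions where the big group lives, and so on); the function names and file names below are made up for illustration:

```python
import random

def generate_position(size=19, rng=None):
    """Place one large solid white block with 0, 1, or 2 carved single-point
    eyes, surrounded by a solid black wall; everything else stays empty.
    Returns (black_points, white_points) as sorted lists of (row, col)."""
    rng = rng or random.Random()
    h = rng.randint(6, 12)
    w = rng.randint(6, 12)
    r0 = rng.randint(0, size - h)
    c0 = rng.randint(0, size - w)
    block = {(r, c) for r in range(r0, r0 + h) for c in range(c0, c0 + w)}

    # Carve 0, 1, or 2 eye points out of the interior of the block.
    interior = [(r, c) for (r, c) in block
                if r0 < r < r0 + h - 1 and c0 < c < c0 + w - 1]
    eyes = set(rng.sample(interior, rng.randint(0, 2)))
    white = block - eyes

    # Build a one-line-thick black wall around the block (clipped to the board).
    black = set()
    for r in range(max(0, r0 - 1), min(size, r0 + h + 1)):
        for c in range(max(0, c0 - 1), min(size, c0 + w + 1)):
            if (r, c) not in block:
                black.add((r, c))
    return sorted(black), sorted(white)

def to_sgf(black, white, size=19):
    """Encode a setup position as a minimal SGF string (AB/AW setup stones)."""
    def coord(p):
        return chr(ord('a') + p[1]) + chr(ord('a') + p[0])  # column letter, then row letter
    ab = ''.join('[%s]' % coord(p) for p in black)
    aw = ''.join('[%s]' % coord(p) for p in white)
    return '(;GM[1]FF[4]SZ[%d]AB%sAW%s)' % (size, ab, aw)

if __name__ == '__main__':
    rng = random.Random(0)
    for i in range(5):
        black, white = generate_position(rng=rng)
        with open('ld_example_%02d.sgf' % i, 'w') as f:
            f.write(to_sgf(black, white))
```

Each output file is just a setup position with no moves, which a person could then review, vary further, and play out or label before anything went anywhere near training.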

michito744 commented 1 year ago

I guess the current Zero clones don't reflect the supreme principle that 'the life and death of a stone is not determined by itself alone, but by its relationship with its surrounding allies and enemies.'

lightvector commented 1 year ago

I think that's not the best way to think about it. Diagnosing the problem as the net failing to grasp that the relationship with "surrounding allies and enemies" is what determines life and death isn't a very useful framing. The neural net understands very well that the relationship between stones is extremely important, and has a lot of its capacity devoted to judging that relationship between patterns of black and white stones - that's exactly why it plays so well in normal situations.

So what is the way to think about it? Well, here's a way that is also NOT the correct way to think about it, but I think is a little closer to having a grain of truth, a tiny bit of correctness in it even though part of it is wrong and oversimplified and imperfectly analogous:

Imagine we were playing a different game that is not Go. Let's call this game "Game A". The rules of "Game A" are:

Let's invent yet a different game, called "Game B". The rules of "Game B" are:

In the positions given above - the ones where white doesn't have 2 eyes - if we are playing Go, then black is winning due to killing white's enormous group. However, in both Game A and Game B, white would be completely winning instead of black!

Now here's the big question:

How is the neural net supposed to know that it is playing Go, instead of Game A or Game B?

Remember, the neural net is never told the rules! It only learns by example. And in 100% of the examples it remembers seeing during training, the outcome of the game was consistent with all three of these rule sets. Never in the training data has a game contained a solid 13x13 block of stones. And never, in any game in the training data that the neural net still "remembers", has a group with more than 240 stones on the board died. (There may be such games among the very earliest random games at the very start of training, but those are too long ago to be remembered.) As far as the neural net is concerned, it might be playing Game A, or Game B, or Game C, or Game D, or any number of other games with different rules where we could imagine any outcome we like.

Faced with this, the neural net's output is basically arbitrary. It could be very confident in white to win, or in black to win, or anything in between, depending on how it just so happens to extrapolate from its data, how that data behaves in the tails, what its architecture is, and maybe just the random chance of its weights.

For normal positions, the neural net has seen so many millions of examples with similar shapes and groups and so on that, statistically speaking, it does know how such a position should likely behave. Yes, maybe you could also design some rules that cause a "normal" position to have a completely non-Go result (e.g. the rules are the same as Go, except that if the position ever has such-and-such exact SHA1 hash or this exact whole-board pattern of stones, then white immediately wins), but such rules would have to be extremely sharp and rare and hard to distinguish. By contrast, the positions posted above are extremely easy to distinguish - they contain massive blocky eyeless or nearly-eyeless groups that look absolutely nothing like the groups that occur in normal games, so it is very easy to come up with rules that make "positions that look like this" behave differently.

So the neural net is already able to do a lot of computation that is very sensitive to the relationships between stones, or "allies and enemies" if you want to personify the situation that way. But we haven't told it what the rules are (because modern ML doesn't know how to do that in a way that makes models "understand" as deeply as would be needed - not even language models, yet), and we also haven't shown it any empirical examples of what the outcome of those relationships should be in positions that "look like" this. So of course, since we haven't told it anything and the positions are so extreme, we shouldn't be surprised when it returns arbitrary extreme outputs - often that's what mathematical functions do when you plug in random extreme values.
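That last sentence is easy to demonstrate with a toy example from outside Go entirely. The sketch below (plain numpy, nothing KataGo-specific, all numbers made up) fits a few models that all match the training data closely, then queries them far outside the training range, where their answers become essentially arbitrary:

```python
import numpy as np

# Fit several models that agree closely on the training range, then query them
# far outside it. The training range stands in for "positions like the
# training data"; the far-away query stands in for the extreme positions above.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 50)
y_train = np.sin(2 * np.pi * x_train) + 0.01 * rng.standard_normal(50)

models = {deg: np.polynomial.Polynomial.fit(x_train, y_train, deg)
          for deg in (5, 9, 13)}

x_far = 3.0  # well outside the training range
for deg, model in models.items():
    train_err = np.max(np.abs(model(x_train) - y_train))
    print(f"degree {deg:2d}: max train error {train_err:.3f}, "
          f"value at x={x_far}: {model(x_far):.1f}")
# All three models fit the training data closely, yet their outputs at x=3
# disagree wildly -- the training data simply does not constrain them out
# there, so the answers are essentially arbitrary.
```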

michito744 commented 1 year ago

It was because of exactly that that FineArt got "sent to the hospital" after its fight with the Japanese professional Shibano (ryu), though.

If KataGo wants to avoid learning the rules, it has to keep getting thoroughly beaten up by the anti-zero AI and learn, through that distorted training, that it cannot win when games reach this kind of phase.

lightvector commented 1 year ago

Yes, but it's not true that bots "want" to avoid learning the rules. It's simply an unsolved problem in the field of machine learning research - even the best researchers in the world do not know the correct algorithms for doing this in general, for building models with the right priors so that they extrapolate better out of distribution. Our only tool right now is to show them more and more examples every time a problem is noticed.

So I'll reiterate - if anyone wants to help build a dataset of examples for this, or for any other problem you notice and want to improve, eventually rich enough that it might be usable for training, your help would be welcome!