lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

strengthen the exploration of the opening #629


miracond commented 2 years ago

Have you ever considered using an opening book, instead of random selection, to train the opening of the game of Go, and deciding which opening to train according to the UCT algorithm?

The visit counts of the opening UCT nodes would be accumulated across all training games, and could be reset for each newly trained network, or decayed by some kind of decay algorithm.

This could strengthen the exploration of the opening.
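
Concretely, the idea might look something like this minimal sketch (the node structure, decay constant, and UCT constant are all illustrative assumptions, not existing KataGo code):

```python
import math

# Hypothetical opening-book node: visit counts accumulate across all
# training games and decay when a new network is trained, and UCT picks
# which opening line the next self-play game should start from.

DECAY = 0.99  # assumed decay factor, used instead of a hard reset

class BookNode:
    def __init__(self):
        self.children = {}  # move (e.g. "Q16") -> BookNode
        self.visits = 0.0
        self.value = 0.5    # running average game result for side to move

    def decay(self):
        """Apply decay for each new network instead of resetting to zero."""
        self.visits *= DECAY
        for child in self.children.values():
            child.decay()

    def select_move(self, c_uct=1.4):
        """Choose the next opening move for a training game via UCT."""
        if not self.children:
            return None  # book exhausted: continue as a normal self-play game
        total = sum(ch.visits for ch in self.children.values()) + 1
        def uct(move):
            ch = self.children[move]
            return ch.value + c_uct * math.sqrt(math.log(total) / (ch.visits + 1))
        return max(self.children, key=uct)
```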

lightvector commented 2 years ago

That's an interesting idea. Are you proposing this mainly because you're interested in solving small boards like 9x9 or 10x10?

miracond commented 2 years ago

Actually, I'm not interested in small Go boards. I just want to see whether AI can find more interesting moves on the 19x19 board we usually play. The current AI openings feel a little monotonous.

Usually, when training a Go AI, each move gets only a few thousand simulations, which may be too few to get past the locally best points and find openings that reward deep calculation. However, by incorporating the opening portion of all training games into one large UCT tree, I believe more opening techniques could be found.

fuhaoda commented 2 years ago

The training started from completely random positions and converged to the current starting positions.


lightvector commented 2 years ago

On 19x19, an opening book seems unlikely to be useful. I think you may be underestimating how exponentially large the tree is. If each self-play game only accomplishes the equivalent of a single rollout of UCT, then the search will be far too shallow to find anything that is not already found by a normal search - in an entire month, the combined volunteer distributed effort of all of KataGo will expand a search tree smaller than what a tournament machine (i.e. a few A100s) can build in 30 seconds.
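
As a rough back-of-envelope check (both rates below are purely assumed, illustrative numbers, not measured figures):

```python
# Illustrative comparison only - both rates are rough assumptions.
games_per_month = 2_000_000       # assumed distributed self-play output per month
book_nodes = games_per_month      # ~1 stored book variation per game

playouts_per_second = 100_000     # assumed for a few A100s on a large net
search_nodes = playouts_per_second * 30  # one 30-second tournament search

print(f"book nodes after a month: {book_nodes:,}")    # 2,000,000
print(f"search nodes in 30 sec:   {search_nodes:,}")  # 3,000,000
```

Under these assumptions, a single 30-second search already exceeds a month of book growth.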

> Usually, when training a Go AI, each move gets only a few thousand simulations, which may be too few to get past the locally best points and find openings that reward deep calculation. However, by incorporating the opening portion of all training games into one large UCT tree, I believe more opening techniques could be found.

If that's the case, you should be able to find evidence by running 1M to 100M playout searches on various very early opening positions (e.g. before move 25) and seeing whether they reveal more opening techniques, right? Have you found examples of this?

miracond commented 2 years ago

My graphics card is burnt out. As for whether deep search will reveal some new openings, I will investigate this again once I have a new graphics card - at the earliest, after the COVID-19 lockdown ends.

Now I want to talk about a few things:

  1. You said that a strong machine can expand more UCT tree nodes in 30 seconds. I think it is a little different: each self-play game cannot simply be counted as one simulation - it represents at least 1600 simulations (I guess this is the number of simulations per move in KataGo's training).

  2. Besides searching deeply to find new openings, there is also the problem of learning how to follow up on various openings, even when those openings are not themselves the most promising. Currently, training spends too much time on star-point openings. There are some random starts in opening training, but most of them are illogical starts not worth learning, which in turn only reinforce the advantage of the star-point layout. I would prefer to choose which openings to train with a UCT-style algorithm driven by two factors, importance and curiosity, as sketched below. I hope the various starting points around the star points (which the AI also sees as suboptimal) can be trained, as well as the various typical handicap formations.
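
One hedged reading of "importance and curiosity" as a selection rule (the names and weights here are made up for illustration, not a worked-out design):

```python
import math

def opening_priority(value, visits, parent_visits, importance,
                     c_uct=1.4, w_importance=0.5):
    """Score a candidate opening for the next training game.

    importance: assumed closeness to openings that human games and
    handicap games actually reach; curiosity: the standard UCT
    exploration bonus, large for rarely visited openings."""
    curiosity = c_uct * math.sqrt(math.log(parent_visits + 1) / (visits + 1))
    return value + curiosity + w_importance * importance
```

A suboptimal-but-common opening (high importance) with few visits (high curiosity) would then be prioritized even if its value is mediocre.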

michito744 commented 2 years ago

@lightvector

Here it is. [screenshots of the game record] This is the actual game record of Shin Jinseo vs Park Junghwan (game 3 of the 26th Samsung Cup final).

Park won through excellent exploration and Shin lost the game, but it is proof that the 21st candidate move, which was ignored by KataGo's reinforcement learning, is very strong.

Incidentally, the Ichiriki vs Iyama example discussed in the Horizon Effect issue is still not properly solved by KataGo.

michito744 commented 2 years ago

Park's 28th move was excellent, and from here the flow tilted toward White. [screenshot]

So did KataGo know about this move? No, it did not value this move at all. [screenshot] The policy weight is low, and in an actual search the move is not even in first place after tens of thousands of playouts.

lightvector commented 2 years ago

> You said that a strong machine can expand more UCT tree nodes in 30 seconds. I think it is a little different: each self-play game cannot simply be counted as one simulation - it represents at least 1600 simulations (I guess this is the number of simulations per move in KataGo's training).

@miracond

Imagine we had only played a single self-play game so far. Then our opening book would contain only data about the result of a single possible sequence of moves, right? It would not contain the results of 1600 different sequences of moves, unless you propose to also record a vastly larger amount of data in the opening book. That's what I mean by "equivalent to one simulation" - for the purpose of counting how many variations are physically stored in the book, it is "one simulation" or "one variation" that is stored per game.

The reason counting the number of variations matters is that until the number of variations in the book is very, very large, the book has no value, because it only reaches very shallowly into the game before the book runs out. Once play goes beyond the point where the variations overlap, it is exactly identical to a normal self-play game with no book. This will happen relatively quickly if you try to make sure the book contains lots of interesting moves, non-star-point moves, etc. Alternatively, you can make the book reach deeper by restricting the moves, but then it will not contain so many interesting moves and will be "monotonous", as you said.
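
A quick sketch of why such a book stays shallow (the branching factor and game count are assumptions):

```python
import math

branching = 50       # assumed plausible moves per opening position on 19x19
games = 10_000_000   # assumed total self-play games feeding the book

# Thorough coverage to depth d needs about branching**d variations, so
# one-variation-per-game coverage runs out at a depth of roughly:
depth = math.log(games) / math.log(branching)
print(f"book is thorough only to ~{depth:.1f} moves")  # ~4.1 moves
```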

@michito744 - Thank you for these additional examples!

I agree that you and @miracond are correctly identifying an interesting and challenging problem to solve. I'm not convinced that an opening book aggregating the results of self-play games will be a good way of solving it, because too many games are required before the book goes deep enough to reach positions like the ones you mention, and experimentation with parameters would be incredibly slow and expensive.

It will almost certainly be better to try building a book directly, using code similar to what is used for https://katagobooks.org/, instead of doing it in distributed training. This has a lot of benefits.

michito744 commented 2 years ago

@lightvector Thanks.

While not an opening-book-specific problem, the current KataGo 60b network clearly lacks search depth.

I understand that the resources required will increase rapidly, but I do not believe that the current setup is likely to improve the problem-solving capability in any way.

lightvector commented 2 years ago

> Incidentally, the Ichiriki vs Iyama example discussed in the Horizon Effect issue is still not properly solved by KataGo.

I didn't add this position to KataGo's training until just a few weeks ago, so probably not enough time has passed. I update the positions only every few months; you happened to catch me very soon after an update, rather than before one. Now that it is added, there might be some improvement on it in another few months.

lightvector commented 2 years ago

I played with things a bit more, and it seems not too uncommon that at very high visits (e.g. > 100k), the top move in an opening position with many choices is one that has a relatively low policy prior (e.g. 0.8%). So opening book enumeration or similar approaches that try to be strictly thorough are entirely unnecessary - you can find such moves in many different opening variations, and if the net learns a variety of them, it should hopefully generalize reasonably well simply due to that variety.

So actually, I think the following approach may work: take a large corpus of random pro human and self-play game positions, to get a large mix of variety. Perform 100k+ visit searches on each of them, and in each case where the top move has a relatively low prior probability, record that position and that move. Or maybe even the second or third moves, if they get high value and a lot of visits despite not being the top move.

And add the entire collection of these back to training as a set of "hintposes", the same way we do blind spot training elsewhere.
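
A minimal sketch of that mining loop, assuming KataGo's JSON analysis engine; the binary/config/model paths, the rules and komi, and the 1% prior cutoff are placeholder assumptions:

```python
import json
import subprocess

# Mine "hintpos" candidates: search positions at high visits and keep
# those whose best move has an unusually low policy prior.

PRIOR_CUTOFF = 0.01  # "relatively low prior", e.g. below 1%

engine = subprocess.Popen(
    ["katago", "analysis", "-config", "analysis.cfg", "-model", "model.bin.gz"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

def mine_hintpos(query_id, moves):
    """Search one position at 100k visits; return (moves, top move) when
    the top move's policy prior is below the cutoff, else None."""
    query = {
        "id": query_id,
        "moves": moves,          # e.g. [["B", "Q16"], ["W", "D4"], ...]
        "rules": "tromp-taylor",
        "komi": 7.5,
        "boardXSize": 19,
        "boardYSize": 19,
        "maxVisits": 100000,     # "100k+ visits" as described above
    }
    engine.stdin.write(json.dumps(query) + "\n")
    engine.stdin.flush()
    response = json.loads(engine.stdout.readline())
    top = min(response["moveInfos"], key=lambda info: info["order"])
    if top["prior"] < PRIOR_CUTOFF:
        return moves, top["move"]
    return None
```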

michito744 commented 2 years ago

Positions where the evaluation drops so low that the winner could be overturned unless a 0.1%-policy move is chosen still appear in practice.