leela-zero / leela-zero

Go engine with no human-provided knowledge, modeled after the AlphaGo Zero paper.
GNU General Public License v3.0

Using other networks for self-play #1371

Closed marcocalignano closed 5 years ago

marcocalignano commented 6 years ago

Since the use of the ELF network seems to yield such good results, why don't we use other networks as well? For example, we could keep training the 20b network from @bjiyxo and, once it is better than the current one, also use it for self-play games to push the actual network. Maybe someone could start training a 40b network.

l1t1 commented 6 years ago

we can not judge whether a network is pure computer self-play

gcp commented 6 years ago

Those networks generally weren't stronger by enough to make up for the speed drop. If they were, we would have just switched to them. But with ELF there was no question. We didn't switch to that because we don't have code to train it (among other reasons...)

brynne8 commented 6 years ago

I think one of the reasons why we don't switch is that we have trained for such a long time, and just using ELF or another network may waste that effort, which generated LZ's own style of playing Go. 50/50 is a good way to learn strong moves without losing LZ's own style or strengths (presumably).

marcocalignano commented 6 years ago

@gcp But if you look at the charts from @Friday9i in #1113, you can come up with a lower number of visits for these networks that would reduce the speed drop. (These graphs also show that if we are around 1600 playouts now, the ELF network needs only 315 playouts to be as strong as L131, so you could reduce the visit count to 1800 and still have a better network that delivers a game faster.)

gcp commented 6 years ago

Yeah, it's true that we could run ELF with lower visits. We just added it quickly with a 1-liner on the server side. I didn't think it was clear for the 256x20's? But I did not test them all.

Mardak commented 6 years ago

It's pretty simple to have the server send ELF self-play tasks run with a different amount of visits. Although keeping it at the usual 3200 means we should be improving training quite a bit more than plain ELF.

marcocalignano commented 6 years ago

I disagree. I think it would be better to have more games from a less strong network than few games from a really strong network. BTW this also implies that if we ran, let's say, 20% of our self-play games with a higher visit count, we would also get stronger games out of our network. Maybe (if we finally use the autogtp statistics patch) we can give these stronger games to the faster clients.

Mardak commented 6 years ago

I was just referring to the quality of training data. Increasing the quantity of training data from a much stronger network could be beneficial too. Better? Maybe?

For the statistics patch, I seem to recall that the approach there was to have a unique identifier for each client's device. The server can estimate how fast an IP address can complete tasks (different from the current throughput measurement) without persisting tracking ids.

Ishinoshita commented 6 years ago

This links up well with the fact that AZ used 'only' 800 visits (po ?) for self play, but in conjunction with a 20b nn. Seems like using a large nn from the start is a penalty for bootstrapping, but once the nn is knowledgeable, it scales faster in terms of training data quality and "improvement operator" than a smaller one. So moving fast to a 20b+ nn for self play makes sense. On the other hand, the current skyrocketing LZ strength curve does not call for any rushed change. Contributors might be happy with that trend for a while ;-)

jkiliani commented 6 years ago

Contributors might be happy with that trend for a while ;-)

I'm sure everyone would be happy with this, but I cannot imagine LZ on 192x15 catching up to ELF, which was presumably at its reinforcement learning limit on 224x20. Even if the temperature change turns out to give an actual performance ceiling boost for even games, I'd be surprised to actually overtake ELF without bootstrapping to 256x20 first.

Ishinoshita commented 6 years ago

Storing client IP addresses raises legal issues (at least in the EU, with the coming data privacy regulation), but the client's IP could be one-way hashed and the hash stored with the game, for any purpose (sending weights according to client stats, detecting/tracing broken clients, bad games, etc.).
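A minimal sketch of the one-way hashing idea, assuming a keyed hash (HMAC-SHA256) from Python's standard library. The key, the function name `hash_client_ip`, and the example address are all hypothetical and not taken from the LZ server. A key is used here because the IPv4 address space is small enough that an unkeyed hash could be reversed by trying every address.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real deployment would load it from
# private configuration rather than hard-coding it.
SECRET_KEY = b"example-secret-not-from-the-project"


def hash_client_ip(ip: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed one-way hash of an IP address as a hex string."""
    return hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()


# The same address always maps to the same token, so games from one client
# can still be grouped without storing the address itself.
token = hash_client_ip("203.0.113.7")
```

The token could then be stored next to each game in place of the raw address.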

gcp commented 6 years ago

We currently store the client's IP in the database to be able to recover in case of a spamming/flooding attack, which is a use that GDPR allows and doesn't need prior consent. But yes, you can't rely on the original data being present there longer term. Throttling is short term though so it would work.

marcocalignano commented 6 years ago

In my patch the stats are computed on the client and sent every time the client requests a new task. At that moment the server does not need to store IP addresses; it just uses the stats the client provides to choose which task to deliver.
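As a rough illustration of that flow, the server could pick a visit count from whatever the client reports. Everything below is an assumption made for the example: the `ClientStats` fields, the 600-second threshold, and the function name are invented, and only the 3200 and 6400 visit counts echo numbers mentioned in this thread.

```python
from dataclasses import dataclass


@dataclass
class ClientStats:
    """Hypothetical stats a client might send along with a task request."""
    seconds_per_game: float


def choose_visits(stats: ClientStats,
                  fast_threshold: float = 600.0,
                  normal_visits: int = 3200,
                  high_visits: int = 6400) -> int:
    """Give fast clients the higher-visit task; everyone else gets the usual one."""
    if stats.seconds_per_game <= fast_threshold:
        return high_visits
    return normal_visits


fast_client = choose_visits(ClientStats(seconds_per_game=300.0))
slow_client = choose_visits(ClientStats(seconds_per_game=1200.0))
```

Because the decision only uses the numbers in the request, nothing about the client has to be persisted on the server.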

tapsika commented 6 years ago

Those networks generally weren't stronger by enough to make up for the speed drop.

Not being a stronger player at time parity does not necessarily mean not being a better teacher at time parity! The current promotion wave shows how essential selfplay labeling quality is.

(And AG's search with its low visit counts may even represent a somewhat better labeling system than LZ search with the same visit counts, who knows.)

gcp commented 6 years ago

Not being a stronger player at time parity does not necessarily mean not being a better teacher at time parity! The current promotion wave shows how essential selfplay labeling quality is.

This sounds nice but it does not make for a logically sound argument. ELF is much stronger even at time parity.

jkiliani commented 6 years ago

Not being a stronger player at time parity does not necessarily mean not being a better teacher at time parity! The current promotion wave shows how essential selfplay labeling quality is.

The current promotion wave is due to ELF games and t=1 in unknown proportions. Attributing it to ELF self-play exclusively and deducing from there that increasing visits is a good idea seems like a weak logical conclusion to me, in particular since the change is still so recent.

@gcp Are we aiming to get a particular number of ELF games (e.g. half of a training window) and stop producing more at that point, and just mix the existing ELF games with a shifting window of LZ games?

gcp commented 6 years ago

It seems imprudent to allow ELF games to grow to more than half the window. I'm hoping LZ is pretty close by then. If not I'll probably limit ELF to the latest 250k (when dumping) but allow new to come in to provide more data.
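A minimal sketch of that capping rule, assuming games are kept in lists ordered from oldest to newest. The function name and the 500k total window are assumptions for illustration (the total simply follows from 250k being "half the window"); the real dumping code may work differently.

```python
def build_window(lz_games, elf_games, window_size=500_000, elf_cap=250_000):
    """Combine the newest ELF games (capped) with the newest LZ games.

    Both inputs are ordered oldest to newest. ELF games are limited to
    the latest `elf_cap`; LZ games fill whatever room is left.
    """
    elf_part = elf_games[-elf_cap:] if elf_cap > 0 else []
    lz_slots = max(window_size - len(elf_part), 0)
    lz_part = lz_games[-lz_slots:] if lz_slots > 0 else []
    return lz_part + elf_part


# Toy example with tiny numbers: a window of 8 with at most 4 ELF games.
window = build_window(list(range(10)), list(range(100, 108)),
                      window_size=8, elf_cap=4)
```

New ELF games keep entering the window under this rule, but each one pushes out the oldest ELF game rather than an LZ game.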

There's a good possibility we can't get to ELF without another size increase though.

jkiliani commented 6 years ago

Do you think we can actually get pretty close to ELF on 192x15? I'd find that surprising, since presumably ELF is at or very near the skill limit of 224x20...

gcp commented 6 years ago

See edit, no, we might stall before. I hope @bjiyxo can keep updating his effort.

tapsika commented 6 years ago

Training on more diverse (ELF+LZ) games could also be an advantage in itself, as could the fact that the games were played by a different net than the one being taught. Of course both mostly affect learning speed, but they may even push the limits a bit as well (pure selfplay may not completely reach the theoretical peak of a net structure, esp. with weak search).

Marcin1960 commented 6 years ago

Since ELF helps LZ so much, does it mean that 15x192 capacity is very far from being exhausted and that the real bottleneck is self-play?

bochen2027 commented 6 years ago

@Marcin1960 I thought the same. It seems like ELF at a small 20b is surpassing others at 40b and 60b... (PhoenixGo, FineArt, Golaxy). So I hope there is much room left in the 15b!

Agreed that eventually we will need to go to 20b, but I'm not sure how prudent it is to use a 20b by a third-party individual who originally net2net'd it from a 6-block. Maybe it's time to do a brand new net2net to 20b from scratch in the post-ELF era?

jkiliani commented 6 years ago

If ELF could reach this level at 20b, I think it just means that they figured out how to correctly reproduce the Alphago Zero approach, and that apparently, FineArt and PhoenixGo didn't. Just using 40 blocks doesn't give you a strong program, if there's a problem e.g. with the UCT search, training parameters, or something else.

This just means that FineArt still is very far from Alphago Zero 40 blocks... there's a whole lot of improvement potential left in computer Go.

Marcin1960 commented 6 years ago

@hydrogenpi "Maybe it's time to do a brand new net2net to 20b from scratch in the post-ELF era?"

You seem to be very eager to deviate from the present course of LZ project in various ways, as soon as possible.

bochen2027 commented 6 years ago

@Marcin1960 Where is the deviation? If the idea is to go to 20 blocks, and to use net2net to do so, what issue do you have with net2net-ing the ecab 15-block to 20 blocks and going from there, rather than using a 20b that was originally net2net'd from a (now ancient) 6-block? Where is the logic in that? I don't see how suggesting a newer arch for net2net, rather than a much older net/arch, could conceivably be perceived as a "deviation".

bjiyxo commented 6 years ago

I'm not sure whether I should keep going with 20x256. One reason is that GCP may not use my 20x256 to replace 15x192. Another reason is that there are some issues (i.e. ladder issues in Golaxy game 6) there and we don't know if they can be fixed by self-play. And last but not least, 20x256 may not be much stronger than ELF's weights, so training another 20x256 might be useless. So I'm a little bit lost now...

remdu commented 6 years ago

I think the ladder issues are probably fixable by self play at least.

gcp commented 6 years ago

One reason is that GCP may not use my 20x256 to replace 15x192.

I am certainly planning to do that when a) 192x15 runs out, but we don't know when b) it's not a lot worse than 192x15 at that point (this has never happened for your weights, but it happened for some of mine!)

Another reason is that there are some issues (i.e. ladder issues in Golaxy game 6) there and we don't know if it can be fixed by self-play.

I wouldn't read too much into that as we have seen for example that ELF can also exhibit them.

And last but not least, 20x256 may not be much stronger than ELF's weights, so training another 20x256 might be useless.

That might be true if ELF indeed reached the limit of 224x20, though one could hope we can train a 256x20 that is better in handicap games.

bochen2027 commented 6 years ago

@bjiyxo Your 20b experiment still proved valuable even though it wasn't directly used, since in my opinion it helped to persuade the official project to move up in size sooner than it otherwise would have, which was a good move since 10b was, in retrospect, also saturated. Many people say "time doesn't matter" as if there was all the time in the world, but in the real world that isn't the case and it's always a factor. Not saying we are in a Go AI arms race, but time of course matters in everything in real life.

Would it be conducive to experiment with a 40b or even a mid size between 20b and 40b?

jkiliani commented 6 years ago

256x20 may not become much stronger than ELF weights, but it should become considerably stronger than LZ 192x15 weights. I do not think the effort to train it was wasted at all; on the contrary, @bjiyxo has contributed very significantly to this project regarding methods to bootstrap larger networks from self-play games.

About the current 256x20 net, wouldn't it be much more promising to get this network trained as high as possible with the ELF and t=1 self-play games so it can take over from 192x15 when that architecture's ceiling is reached, rather than training another net from scratch? Going higher on residual blocks could still be done later. Also, a Leela Zero 256x20 net at capacity may only be somewhat stronger than ELF, but likely much stronger at giving handicap due to the temperature change.

bjiyxo commented 6 years ago

I'm now training 40x256 instead of 20x256 because ELF may reach the limit of 20blocks. I'm still hesitating about whether I should keep training 20x256. In fact, LZ is growing rapidly now and maybe I should run autogtp instead of training 20x256.

jkiliani commented 6 years ago

Maybe @gcp would be willing to train up your network from the last checkpoint then? Would seem a shame to have this network go to waste after all the effort you put in it...

gcp commented 6 years ago

In fact, LZ is growing rapidly now and maybe I should run autogtp instead of training 20x256.

There are 812 people running autogtp, there are 0 people training up a >15x192 network (that I know of).

As to whether 256x20 or 256x40 is best right now, I do not know.

Marcin1960 commented 6 years ago

@bjiyxo " LZ is growing rapidly now and maybe I should run autogtp instead of training 20x256."

Definitely! 15x192 should be a priority until its potential is REALLY exhausted.

BTW, my selfish reason is that 20x256 is too slow on my hardware. I am not going to buy a new PC in the near future, and if this becomes a requirement I will have to drop out.

Ishinoshita commented 6 years ago

@gcp What's the current proportion of selfplay games generated with ELF weights vs generated with regular 15x192 LZ ?

gcp commented 6 years ago

Should be very close to 50%. Maybe a bit lower for ELF if people didn't update the client.

jkiliani commented 6 years ago

@Marcin1960 Just so you know, I'm also not blessed with powerful hardware here, but I can still run LZ on autogtp even if it takes forever, and Lizzie even if I only have a couple hundred visits per move instead of the thousands that GTX 1080 users get. I'm all for continuing 192x15 until it reaches its limit, I'm just not into squeezing the last Elo out of it if that compute could more productively go into advancing the project further. The best architecture is something that those who can train such nets should settle between them, but upgrading to something larger once you stall looks like a no-brainer from my perspective.

roy7 commented 6 years ago

An interesting thing @Mardak found when looking at ladders is that ELF seems to just avoid ladders totally (very low priors), whereas LZ reads the ladders out to the end (very high priors). This also means LZ can play a winning ladder if there's a ladder breaker, but ELF won't.

bjiyxo commented 6 years ago

Then I will keep training both 20x256 and 40x256. So there will be another new 20x256 a few days later.

Cabu commented 6 years ago

I would like to stay on 192x15 as long as possible for the same reasons as @Marcin1960. Seeing the sharp rise we have right now, I would like to do an experiment once we have moved to a bigger network for some time: try to train our good old 192x15 with all the games from the bigger/better network to see if we can squeeze some more Elo from it. Actually we could even try that right now on a smaller scale with the 128x10, 128x8, and even 64x5 networks just to test this crazy idea. But I don't have the horsepower and the know-how to do so by myself :(

Marcin1960 commented 6 years ago

@Cabu "Actually we could even try that right now on a smaller scale with the 128x10, 128x8, and even 64x5 networks just to test this crazy idea. But I don't have the horsepower and the know-how to do so by myself :("

Bingo!

Marcin1960 commented 6 years ago

I would train a few 128x10 nets. The result can clarify many questions or might SURPRISE us !

It would be a very interesting experiment.

kityanhem commented 6 years ago

Smaller networks train faster, so why don't we train 128x6 nets as a test? Train one on self-play games of 15b and one on self-play games of ELF, and we can find out:

  1. How much room for improvement do 6 blocks have (compared with the current best 128x6 net)?
  2. Can it beat some 128x10 nets (even with high playouts)?
  3. If we train on higher-quality kifu, does it still improve? Or will it reach the limit of 6 blocks (no more room to improve even if we train on many more self-play games from a net stronger than ELF is now)?
  4. If 6 blocks can still improve, it means the same holds for 10 blocks or more.

Maybe this way we can make some nets that are smaller but stronger.

jkiliani commented 6 years ago

I looked through a couple of new ELF self-play games played by LZ 0.15, and excluding 1-visit moves seems to have made a huge difference. All of the games I saw now look reasonable, and the majority aren't resigned at move 92 anymore either, which seems really promising as well. Maybe we should phase the 0.14 ELF games out of the window eventually once there are enough 0.15 games to fill half the training window?

marcocalignano commented 6 years ago

So I did the experiment: the same network (L135) against itself, but one side got twice the visit count. These were the command lines:

./leelaz -g -v 6401 --noponder -t 1 -q -d -r 0 -w net_doublevisit vs ./leelaz -g -v 3201 --noponder -t 1 -q -d -r 0 -w net_normalvisit

and this was the result:

60 wins, 26 losses
The first net is better than the second
net_doub v net_norm ( 86 games)
              wins        black       white
net_doub   60 69.77%   32 66.67%   28 73.68%
net_norm   26 30.23%   16 33.33%   10 26.32%
                       48 55.81%   38 44.19%

and the calculated Elo difference is 63.

This result confirms that the same network can play stronger self-play games with a higher visit count. We need to evaluate whether it is worth increasing the visit count (just for a small percentage) of the self-play games on fast clients, or whether it is not worth the pain.

jkiliani commented 6 years ago

60 - 26 is an Elo difference of 145 according to http://www.3dkingdoms.com/chess/elo.htm, although a result like this would still have a very significant error bar attached to it. Even so, I have significant doubts that it would be worth giving up a factor of 2 in game generation speed, especially when it's already so low. The difference of 1 visit to 3200 visits seems to be in the range of ~1500 Elo (?), and that is the reference strength gap to compare against, since we fit the raw net to approximate the visit distribution of the search output.
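As a rough check of that figure, the standard logistic Elo model converts a match score into a rating gap. This is a minimal sketch: the function name `elo_diff` is made up for illustration, and draws are ignored.

```python
import math


def elo_diff(wins: int, losses: int) -> float:
    """Elo gap implied by a win/loss record under the logistic Elo model."""
    score = wins / (wins + losses)
    # Invert expected_score = 1 / (1 + 10 ** (-diff / 400)).
    return -400.0 * math.log10(1.0 / score - 1.0)


gap = elo_diff(60, 26)
```

For 60 wins and 26 losses this comes out at roughly 145 Elo, in line with the calculator linked above.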

Didn't @Ttl estimate the optimal visit counts for maximum strength gain / time from the shape of the visit distribution over total visits a while ago? I remember that this estimate at least suggested that fewer visits are more efficient...

Hersmunch commented 6 years ago

Yes it was 500 visits, see @Ttl's posts here https://github.com/gcp/leela-zero/issues/1348#issuecomment-386865798 https://github.com/gcp/leela-zero/issues/1030#issuecomment-374246946

Mardak commented 6 years ago

Thanks for running that. I believe the previous estimate for doubling visits for 128x10 (?) was about 200 Elo difference. The primary purpose of self-play with some amount of visits for search is that it's a policy improvement operator, so yes, increasing visits would definitely help generate stronger training data at the cost of fewer games / less training data.

But I would think part of the reason for including ELF-generated self-play is that it's just more efficient at producing higher quality self-play. Using @Friday9i's results in https://github.com/gcp/leela-zero/issues/1113#issuecomment-387311283, the orange dot shows ELF with ~320 visits is as strong as a 192x15 with ~5x more visits (1600 visits). The orange line never goes below 2, so if we estimate the 224x20 slowdown compared to 192x15 to be 2x, this means even accounting for size / slowdown, ELF will generate higher quality self-play than just searching with more visits with 192x15.
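The arithmetic behind that comparison can be sketched as follows. The function name is hypothetical, and the 2x slowdown for 224x20 is the estimate assumed in the comment, not a measurement.

```python
def net_speedup(visits_small_net: float, visits_big_net: float,
                slowdown: float) -> float:
    """Time advantage of the bigger net at equal playing strength.

    visits_small_net / visits_big_net is how many times fewer visits the
    bigger net needs; dividing by `slowdown` accounts for each of its
    visits costing more time. Values above 1.0 favour the bigger net.
    """
    return (visits_small_net / visits_big_net) / slowdown


# ~1600 visits for 192x15 vs ~320 visits for ELF, with an assumed 2x slowdown.
advantage = net_speedup(1600, 320, 2.0)
```

With those numbers ELF reaches the same strength in 2.5 times less time. A visit ratio of exactly 2 with a 2x slowdown would be break-even.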

Edit: Yes, as a long-term plan when ELF isn't as useful as a teacher, increasing visits could help, although one would probably need to rerun the analysis of 6400 vs 3200 visits at that point.

marcocalignano commented 6 years ago

I know that ELF is so much better, but I was exploring a possibility to get stronger self-play games even after we reach the ELF strength. I also wanted to see if the more visit theory was also statistically proven.

Mardak commented 6 years ago

Just doing a quick test of doubling of low visits (100 vs 50) with 192x15 LZ136 and 224x20 ELF:

55 wins, 22 losses
The first net is better than the second
double   v 192x15   ( 77 games)
              wins        black       white
double     55 71.43%   23 76.67%   32 68.09%
192x15     22 28.57%    7 23.33%   15 31.91%
                       30 38.96%   47 61.04%

51 wins, 18 losses
The first net is better than the second
double   v 224x20   ( 69 games)
              wins        black       white
double     51 73.91%   26 72.22%   25 75.76%
224x20     18 26.09%   10 27.78%    8 24.24%
                       36 52.17%   33 47.83%

Those results are +159 Elo and +181 Elo respectively with ±12% margin of error.
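One way a margin close to ±12% can arise is the worst-case 95% interval on a win rate, which assumes a true win rate of 50% and a normal approximation. Whether the figure above was computed this way is an assumption; the sketch only shows the arithmetic for the two match sizes (77 and 69 games).

```python
import math


def worst_case_margin(games: int, z: float = 1.96) -> float:
    """Half-width of a 95% interval on a win rate, assuming p = 0.5."""
    return z * math.sqrt(0.25 / games)


margin_77 = worst_case_margin(77)  # 192x15 match
margin_69 = worst_case_margin(69)  # 224x20 match
```

Both values land between 11% and 12%, so match sizes like these still leave a wide band around the Elo estimates.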

So a stop-gap alternative to switching away from ELF when LZ reaches ELF levels is to generate ELF self-play with doubled visits too, similar to marcocalignano's proposal.