leela-zero / leela-zero

Go engine with no human-provided knowledge, modeled after the AlphaGo Zero paper.
GNU General Public License v3.0

40x256 training note. #1681

Open bjiyxo opened 5 years ago

bjiyxo commented 5 years ago

1fdfb1c5 trained to 155(834f35fa) https://www.dropbox.com/s/64lsyjfefrg1zpy/40b_155_328k.gz?dl=0 https://drive.google.com/file/d/1lsL1bLNJ-ck4FAE3wlI1U0mF1V1EjI6U/view?usp=sharing

bjiyxo commented 5 years ago

@gcp @roy7 Can you queue 40b for matches?

l1t1 commented 5 years ago

Can you add information about the training?

bubblesld commented 5 years ago

learning rate 0.0001

current round: v152-v155 + elf released version: 328k. still going

previous round: v148-v152 + elf final version: 384k

I forgot the rounds before these two, but I remember that bjiyxo posted them in some thread.

PS: 328k vs 224k = 76:61 (328k not tested vs ELF0 or ELF1); 224k vs ELF0 = 86:83; 224k vs ELF1 = 53:79. All matches at 3200 visits.
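As an editorial aside, head-to-head records like those above can be turned into rough Elo differences with the standard logistic rating model (this sketch is not part of the project's tooling):

```python
import math

def elo_diff(wins, losses):
    # Logistic Elo model: a rating gap d gives an expected score of
    # 1 / (1 + 10**(-d / 400)); inverting the observed win/loss
    # ratio recovers an approximate gap in Elo points.
    return 400 * math.log10(wins / losses)

# 328k vs 224k went 76:61, i.e. roughly +38 Elo for 328k;
# 224k vs ELF1 went 53:79, i.e. roughly -69 Elo for 224k.
print(round(elo_diff(76, 61)), round(elo_diff(53, 79)))
```

With only ~140 games per pairing the error bars are wide, so these numbers are indicative at best.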

Friday9i commented 5 years ago

So the released version should probably be somewhere between ELF v0 and ELF v1, which is much stronger than LZ160, so that's nice! Let's wait for the test results anyway; they'll clarify its Elo.

john45678 commented 5 years ago

Thanks Bjiyxo, can't wait to see this...

Friday9i commented 5 years ago

I'm a bit worried by the LR, however: isn't it a bit too small? If so, we risk being stuck in a local minimum, and it will be hard to improve the net by self-play (without first going down quite significantly, which gating does not allow). By the way, isn't that the problem we're facing with the current 20b net?

herazul commented 5 years ago

@Friday9i If it is much better than the current 20b, it will generate much better games. With those, plus the new ELFv1 games, it may be possible to see improvement and pass gating with a higher LR. I don't even know whether this learning rate is bad at LZ's current state.

bubblesld commented 5 years ago

@Friday9i I tried a 0.0005 learning rate on the same training data and found that the result is poor (worse than the previous weights).

Friday9i commented 5 years ago

Thx @bubblesld, interesting

bubblesld commented 5 years ago

I could be wrong, but I feel that the poor results from the official 20b are due to the high learning rate.

herazul commented 5 years ago

I don't remember: what was the learning rate at the end of 15b? Didn't @gcp lower it at the end? Maybe LZ has seen enough steps and is at a point where the LR must be reduced to 0.0001, whatever the size of the net being trained.

NhanHo commented 5 years ago

@bubblesld Is the 40b net2net from a smaller lz network, or did you bootstrap it from a random net?

bubblesld commented 5 years ago

@NhanHo from bjiyxo's 20b. @bjiyxo did the net2net.

bjiyxo commented 5 years ago

@NhanHo @bubblesld No, I didn't n2n this time. I trained it from Xavier Initialization.

bjiyxo commented 5 years ago

FYI https://github.com/gcp/leela-zero/issues/1554#issuecomment-399776562

NhanHo commented 5 years ago

@bjiyxo How many games was the network trained on in total? It seems like my net needs to run through the last 2 million games to match the current network's strength.

bjiyxo commented 5 years ago

@NhanHo

How many games was the network trained on in total?

40b was trained on data from the beginning of 10b onward.

It seems like my net needs to run through the last 2 million games to match the current network's strength.

Maybe you can post your training schedule, including your window, lr, bs and data.

herazul commented 5 years ago

What is this Xavier Initialization? Was it a clean, blank 40b net?

bjiyxo commented 5 years ago

You can regard Xavier Initialization as a clean, blank 40b net.
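For reference, "Xavier Initialization" (Glorot & Bengio) draws the starting weights from a distribution whose scale depends on the layer's fan-in and fan-out, so no prior network is involved. A minimal NumPy sketch, illustrative only and not the trainer's actual code:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=None):
    # Glorot/Xavier uniform: the limit is chosen so activation
    # variance stays roughly constant from layer to layer at
    # initialization, which keeps early gradients well-behaved.
    rng = rng or np.random.default_rng(0)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# e.g. one 3x3, 256-channel conv layer flattened to a matrix:
w = xavier_uniform(3 * 3 * 256, 256)
```

Starting a 40b run this way means its early nets play essentially random moves, which is why the training data matters more than the starting weights.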

herazul commented 5 years ago

Okay, thanks for all the information!

roy7 commented 5 years ago

Test matches queued.

Friday9i commented 5 years ago

Promising start against ELF v1, fingers crossed!

bubblesld commented 5 years ago

I would not expect 40b to be close to elf1.

Friday9i commented 5 years ago

From your comment above (328k vs 224k, and 224k vs ELFv1), they should be quite close...

bubblesld commented 5 years ago

@Friday9i It is not. From my experience, the best weight in each round is much better than that of the previous round, but the win rate against ELF v0 increases very slowly (sometimes it even gets worse).

MartinDevelopment commented 5 years ago

If the 40-block network comes out really strong, why don't we go all out and use it? Maybe even lower the visit count to 800 and see what happens; I think it would be very interesting to see.

gcp commented 5 years ago

If the 40-block network comes out really strong, why don't we go all out and use it

We could. It's "only" half the speed of 256x20. We would skip an optimal 256x20 network then, but we have ELFv1 too, so there are options in that range.

2ji3150 commented 5 years ago

Looks like ELF v1 is still much stronger. Sorry, this may be off-topic here, but does it make sense to produce self-play games between two networks (ELF v1 vs. the current best)? I am curious whether the net could learn how to win against ELF v1 that way.

roy7 commented 5 years ago

Note that the file size of the 40-block network is huge, though. I wonder if we will eventually need a torrent-based solution for that.

MartinDevelopment commented 5 years ago

@roy7 I don't think a torrent-based solution would work, since some people might not have access to it; torrents are blocked for them.

jkiliani commented 5 years ago

Since this net has also been trained with very low learning rates, it is likely that promoting it would encounter the same initial problem as we're having right now: Reinforcement learning with the usual learning rates would likely fail to produce networks which pass gating.
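For readers unfamiliar with "gating": a candidate network only replaces the current best after winning a test match at a required rate (55% is the figure usually quoted for this project; as I understand it, the real server uses a sequential SPRT test rather than a fixed cutoff). A simplified sketch:

```python
def gating_passes(wins, losses, threshold=0.55):
    # Promote the candidate only if its observed win rate against
    # the current best network exceeds the threshold. The actual
    # leela-zero server decides sequentially (SPRT), which can stop
    # a match early; this fixed-sample version is just for intuition.
    games = wins + losses
    return games > 0 and wins / games > threshold

# A net winning 230 of 400 games (57.5%) would pass; 52% would not.
```

The point jkiliani is making is that a net trained at a very low LR sits in a narrow optimum, so candidates trained at normal LRs initially regress and fail this check.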

Marcin1960 commented 5 years ago

@MartinDevelopment "I don't think a torrent based solution would work since some people might not have access to that since it is blocked for them"

A big file can be split into parts and joined with a very simple utility.
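Marcin1960's suggestion needs no dedicated utility at all; a small Python sketch (the `.partNNN` naming and chunk size here are arbitrary choices, not a project convention):

```python
from pathlib import Path

def split_file(path, chunk=64 * 1024 * 1024):
    # Write path.part000, path.part001, ... of at most `chunk` bytes each.
    data = Path(path).read_bytes()
    for i in range(0, len(data), chunk):
        Path(f"{path}.part{i // chunk:03d}").write_bytes(data[i:i + chunk])

def join_file(path, out):
    # Concatenate the parts back together, in name order.
    parts = sorted(Path(path).parent.glob(f"{Path(path).name}.part*"))
    Path(out).write_bytes(b"".join(p.read_bytes() for p in parts))
```

On Unix systems the same thing is traditionally done with `split` and `cat`.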

herazul commented 5 years ago

Well, seems like the 40b net is way too good to not use.

bubblesld commented 5 years ago

@2ji3150 I guess the current match is at 1600 visits. ELF has an advantage at low visits due to the sharper values in its policy. I believe 40b can achieve at least a 1/3 win rate against ELFv1 at higher visit counts.

alphaladder commented 5 years ago

@gcp 40b self-play games are very useful for producing weights of any lower block count, and whatever else you want. We do not mind the half speed of 40b training, and the current 20b is hopeless, just aimlessly lost in the ELF trap.

Most people in our training group (500 people) also wish to keep going with 40b; it's very attractive, and we would enjoy this great promotion.

gcp commented 5 years ago

Reinforcement learning with the usual learning rates would likely fail to produce networks which pass gating.

We can drop the learning rate then; I don't see any other option for progress. The next run for the current 256x20 will be at bs=128 @ 0.0001. (By the way, if you mention a learning rate, the number is pointless if you don't include the batch size!)

I didn't manage to train a net competitive with the best 192x15 with larger rates.

current 20b is hopeless

Well, by that kind of reasoning, we don't need to try 256x40 either, because it'll be more of the same. (Luckily, I don't agree with you.)

Note the file size on 40 block is huge though. Wonder if we eventually need a torrent based solution for that.

Data traffic from my webserver was 3.5TB in July. If we go above 5TB, it will be throttled (but not dramatically). I suspect we'll be fine. Most of the traffic seems to come in huge spikes, so that won't be the clients downloading new networks day to day, but rather people downloading all the networks or something. I hope there won't be people running the client who suddenly get a scare when they see their data traffic stats.

I'd like to see if we can get any movement on the 256x20 before we go to 256x40 though. Let's give it one more cycle at the lower rate at least.
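gcp's parenthetical (a learning rate is meaningless without its batch size) reflects the fact that the two trade off against each other. One common heuristic is the linear scaling rule; this sketch is a general rule of thumb, not necessarily what the project applies:

```python
def scaled_lr(base_lr, base_bs, new_bs):
    # Linear scaling rule of thumb: if the batch size grows by a
    # factor k, grow the learning rate by k so that each training
    # example contributes a comparable expected update per pass.
    return base_lr * new_bs / base_bs

# Under this heuristic, bs=128 @ lr=0.0001 corresponds roughly
# to bs=512 @ lr=0.0004.
```

So "lr 0.0001" at bs=128 is a considerably gentler setting than the same number at a large batch size, which is why quoting the rate alone says little.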

1-punchMan commented 5 years ago

20b progressed.

tux3 commented 5 years ago

Data traffic from my webserver was 3.5TB in July. If we go above 5TB, it will be throttled (but not dramatically).

If you do end up needing some bandwidth, I'd be happy to provide a server with 60TB/month.

wonderingabout commented 5 years ago

I think experimenting with 20 blocks is fine and, above all, interesting. After all, ELF v1 is 20 blocks but still much stronger than this 40-block LZ net, which just shows how much potential the 20-block size has.

I think it's really worth waiting for 100-150k more games on the 20-block LZ to see how they influence the new networks' strength and direction of play.

20b is not stalled or hopeless; it's just in an early stage. There seems to be a trend of the 20b progressively getting stronger and learning things (fixing its own holes?). I'm curious to see how things will develop for the 20b network, remembering that ELF v1 is 20 blocks but so much stronger.

wonderingabout commented 5 years ago

I think there's no need to upgrade immediately to 40b.

40 blocks may be stronger at the start, but as ELF v0 and v1 showed, 20 blocks has the potential to be stronger than the current 40-block LZ. So as long as 20b is not stalling, it may be more efficient to keep it as the training net: the starting point is lower, but we can expect it to learn faster than 40b in Elo gained per week, given equal computing power. It also requires fewer resources to train and fewer resources to play matches (which is useful for playing against it on weak computers, smartphones, etc.).

And finally, as I said, I'm curious to see how this 20b experiment will develop. Going directly to 40b makes us miss a chance to gain knowledge and experience about bots; if we made a mistake somewhere, this is a good opportunity to learn from it, so there is no need to rush things.

You should also consider that by the time 40-block LZ catches up to ELF v1, the ELF team could release a 40-block ELF v2 that is far stronger than the 40b we want to achieve, so again, rushing for strength is pointless IMO.

jkiliani commented 5 years ago

I think we have to leave this decision to @gcp, there are certainly arguments both for and against going to 40 blocks soon. If a majority of self-play contributors favor upgrading sooner rather than later, we might as well upgrade early this time, but there's no harm in at least waiting out another few training cycles while trying different learning rates.

iteachcs commented 5 years ago

I am no expert on this. I am very interested in seeing how high leelaz can reach with a 40-block network. Also, it is unlikely that we will see leelaz surpass ELFv1 in its strength anytime soon. So the fans' enthusiasm might be dampened at a certain point if we keep training a 20-block network that is far below the strength of ELF.

The only worry I have is that it appears we do not know exactly why we are having problems with the current 20-block networks. It could be the learning rate, the ratio of ELF games being used, the use of 1601 visits, or something else. I would think that we want at least a better idea of the problem before moving on to 40 blocks, since it will take double the time to experiment and find the problem if we get stuck again at the 40-block level.

Perhaps all we are seeing is the nature of the network when it reaches a certain level of strength. I suspect that if we get stuck at 40 blocks in the future, we will have to try the AlphaZero approach of no gating to move a network out of a local minimum.

jkiliani commented 5 years ago

Since we're considering going to 40 blocks soon, one possible change that would very likely help in either case would be to up the ELF fraction in game production, from 25% to half or even higher. Regular self-play games at 1600 visits are currently somewhat weaker than 15 block games at 3200 visits, but ELF v1 games are definitely more useful than either. Having more of them sooner may help the 20 block nets to pass gating, and would also be more useful in case we switch to 40 blocks.
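The mix jkiliani describes amounts to assigning each self-play task to one of the two engines with some probability; a toy sketch (the function name is made up, and the percentages come from the comment above, not the server config):

```python
import random

def assign_engine(elf_fraction, rng):
    # Each self-play task is served either ELF v1 or the current best
    # LZ net; elf_fraction is the ELF share of game production
    # (25% at the time of this thread, with 50% or more proposed).
    return "ELFv1" if rng.random() < elf_fraction else "LZ-best"

rng = random.Random(0)
tasks = [assign_engine(0.5, rng) for _ in range(10000)]
```

With 10,000 tasks drawn at a 0.5 fraction, roughly half end up as ELF games, so the training window fills with the stronger games about twice as fast as at 25%.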

TFiFiE commented 5 years ago

It's hard to believe the current lack of improvement can largely be attributed to the training process hitting a plateau at the current network size, considering the smaller ELF networks that are significantly stronger than not only the current best 20x256, but also the bootstrapped 40x256 one. The training process is meant to maximize improvement per time unit (which was the motivation behind decreasing the number of visits). In the absence of a legitimate plateau, it simply makes no sense to have a larger and thus slower network generate training games. That has always been the purpose of the current "official" network and the very fact a larger, stronger network could be created from the training games by a separate endeavor in the first place means that no one should somehow feel deprived of anything.

I think the network size of the training process should not be increased until there are no more 192x15 games in the training window, and maybe even that there should then be a full pass through all the typical learning rates (to have a chance of getting out of local maxima), just to make sure this isn't a transitional issue and that 20x256 is indeed irredeemable beyond a reasonable doubt. Don't be impatient and worry about a few weeks.

herazul commented 5 years ago

@TFiFiE What you don't consider is that if a net is WAY better than a smaller net (even if it's a bigger net), it's better to generate the self-play games with it, because a net trained on these much better self-play games will be a lot better (even if you want to train a smaller net with those games).

If the 40b net only had something like a 60 or 65% win rate, it would be debatable which is better for the fastest possible training: the 40b net with its 60% WR, or the twice-as-fast 20b net with slightly worse game quality? But with an insanely strong 40b net at 85% WR, there is not even a contest.

That doesn't mean it's not interesting to continue with the 20b net for now, to see what can happen, to work out what's wrong with its training, and to test things and parameters. But for efficiency, it's 100% the 40b net.

MacErlang commented 5 years ago

@herazul Recall that AlphaZero was trained from scratch. If human games were used, the training would hit a lower "ceiling". In other words, an AlphaZero trained by human games would have a faster start, but the final result might be a weaker network, as it might miss some "blind spots".

bubblesld commented 5 years ago

@MacErlang The concern was that the weights would be contaminated by the weaknesses of the human games. Then what is your point about skipping the 20b net? In fact, the 40b weights are already contaminated by the weaknesses of the lower-block games, since they were trained on those self-play games.

diadorak commented 5 years ago

What's the evidence that human games resulted in a lower ceiling?

MacErlang commented 5 years ago

@bubblesld You have a good point. I am no expert on this, but perhaps one implication is that the 40b network (or any new block count) should be trained from scratch, despite the poor performance at the outset. Does this make any sense at all? Since potential moves are generated randomly, a clean slate might reduce possible sampling bias created by weaker games from a lower block count.

l1t1 commented 5 years ago

It is necessary to conduct experiments and data analysis to find technical details not mentioned in the paper.