leela-zero / leela-zero

Go engine with no human-provided knowledge, modeled after the AlphaGo Zero paper.
GNU General Public License v3.0

256x20 network results #884

Closed: zediir closed this issue 5 years ago

zediir commented 6 years ago

Here is the 256x20 network I trained. https://drive.google.com/open?id=1PSyfsDvXmtIrhUx4Gf6x7mGzDtwe4npr

I ran it on CGOS with the following settings:

%section player
     name      LZH256x20-t4-p1600
     invoke    leelaz.exe -w leelaz-model-3008000.txt -r1 --noponder -p 1600 -t4 -g --gpu 0 --gpu 1

%section player
     name      LZH256x20-t4-p3200
     invoke    leelaz.exe -w leelaz-model-3008000.txt -r1 --noponder -p 3200 -t4 -g --gpu 0 --gpu 1

%section player
     name      LZH256x20-t4-nolim
     invoke    leelaz.exe -w leelaz-model-3008000.txt -r1 -t4 -g --gpu 0 --gpu 1

The hardware it ran on was an i7-7700K and two GTX 1080 graphics cards.

The BayesElo ratings were:

     LZH256x20-t4-p1600: 3535
     LZH256x20-t4-p3200: 3586
     LZH256x20-t4-nolim: 3659

I also ran it against Leela 0.11 with equal time and hardware on KGS:

engine=leelaz.exe -w leelaz-model-3008000.txt -r10 -t2 -g --gpu 0 --noponder name=TygemLeela rules.time=5:00+10x0:15

engine=Leela0110GTP_OpenCL.exe -t4 -g --noponder name=LeeaEleven

The result currently is: TygemLeela 27, LeeaEleven 9.
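For a rough sense of scale, that score implies an Elo gap of around 190 under the usual logistic Elo model (a quick back-of-the-envelope, not a rigorous rating):

```javascript
// Rough Elo difference implied by the current 27-9 score.
const wins = 27, losses = 9;
const p = wins / (wins + losses);              // 0.75 win rate
const eloDiff = -400 * Math.log10(1 / p - 1);  // ~191 Elo
console.log(Math.round(eloDiff));
```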

It sometimes gets caught in ladders and sometimes has problems with life and death.

Leelaz binary used was built from current next-branch.

https://www.gokgs.com/gameArchives.jsp?user=LeeaEleven

Code used to get the win counts from KGS:

```javascript
// Paste into the browser console on the KGS game archive page linked above.
// Tallies wins per player from the results table.
var nodes = window.document.getElementsByClassName('grid')[0].childNodes;
var results = {};
for (var childNodeIndex in nodes) {
    var tds = nodes[childNodeIndex].childNodes;
    // Skip the header row and anything that isn't a table row.
    if (nodes[childNodeIndex].localName === "tr" && tds[1].localName === "td") {
        var white = tds[1].innerText;
        var black = tds[2].innerText;
        if (results[black] === undefined) { results[black] = 0; }
        if (results[white] === undefined) { results[white] = 0; }
        // First character of the result cell is "B" or "W" for the winner.
        var result = tds[6].innerText[0];
        if (result === "B") { results[black]++; }
        if (result === "W") { results[white]++; }
    }
}
// Print "name: wins", stripping the rank tag (e.g. "[9d]") from the name.
for (var resultIndex in results) {
    console.log(resultIndex.substr(0, resultIndex.indexOf("[") - 1) + ': ' + results[resultIndex]);
}
```
l1t1 commented 6 years ago

So now we have many bots born from leelaz. Nice job.

RavnaBergsndot commented 6 years ago

Is this leelaz an FPU-tuned one?

jkiliani commented 6 years ago

Must be FPU-tuned, or @zediir would not have mentioned:

> Leelaz binary used was built from current next-branch.

tterava commented 6 years ago

Did you train this completely from self-play data? Looks very promising!

ghost commented 6 years ago

This is supervised, not self-play.

zediir commented 6 years ago

Supervised, from the Tygem dataset here: https://github.com/yenw/computer-go-dataset

jillybob commented 6 years ago

There was someone who had an updated dataset; I believe they pulled a very large number of games from a server. I can't find the dataset, but it had 9Ds playing. It would be good if someone could find it and provide a link to zediir, to see if this other dataset would produce a stronger Leela Zero.

Edit: I have tried searching for it but couldn't find it. A person had an automated script that downloaded nearly the entire history of high-ranked games from one of the online Go servers. I remember they had uploaded the data, but I am not sure where that original post is.

featurecat commented 6 years ago

@jillybob That's my dataset. I'm probably going to continue updating it with all current Tygem 9d games. Currently it has fewer 9d games than the Tygem dataset.

FFLaguna commented 6 years ago

@zediir How many steps and what batch size did you use to train this network? What were your final learning rate and momentum? And what were your TensorBoard accuracy, MSE error, and policy error? I'm extremely interested in comparing with your results.

Did you weight the value head training (mse error) by 0.01 as recommended in the AGZ paper?
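For context, a minimal sketch of what that weighting would look like in the AGZ-style combined loss; the function and variable names are illustrative assumptions, not leela-zero's actual training code:

```javascript
// Hedged sketch of the AGZ-style combined loss (illustrative only).
// loss = policy cross-entropy + mseWeight * value MSE + c * L2 penalty
function combinedLoss(policyCrossEntropy, valueMse, l2Penalty, mseWeight = 0.01) {
    const c = 1e-4; // L2 regularisation constant used in the AGZ paper
    return policyCrossEntropy + mseWeight * valueMse + c * l2Penalty;
}
```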

zediir commented 6 years ago

3,008,000 steps with batch size 64. Accuracy was 59%. The rest I don't remember, and I'm done with this for now. Going to be gaming again after having my GPU busy for a month.
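For scale, a quick check of how many positions that run saw (counting repeated visits to the same position):

```javascript
// Total training samples = steps * batch size.
const steps = 3008000;
const batchSize = 64;
console.log(steps * batchSize); // 192,512,000 positions (with repeats)
```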

gcp commented 6 years ago

Is this the strongest network we think we have?

I would put it up as a reference on the graph, perhaps?

jkiliani commented 6 years ago

Good idea, would you set up a match for it against a current net?

gcp commented 6 years ago

The one-but-newest one, yes. (Else it could promote!)

jkiliani commented 6 years ago

Actually, why not 99e274c5?

gcp commented 6 years ago

Even better, yes.

gcp commented 6 years ago

Hmm, even gzipped it's a 92M download. I can't actually upload it without changing the server config.

zediir commented 6 years ago

Maybe someone could just run the match and then add the result manually to the database. Unfortunately my GPU usage is spoken for for a couple more weeks before I can contribute again.

gcp commented 6 years ago

I actually killed the server trying to upload it. Node unzips a copy in memory (or perhaps even more than one internally) and the machine is swapping.

It should recover in a few minutes. I hope.

gcp commented 6 years ago

Feeling pretty stupid right now. Lesson learned: we're not quite ready for a 256x20 run.

jkiliani commented 6 years ago

Shit happens. What is the actual file size limit on the server anyway? It was still able to handle the 128x10 nets, after all...

zediir commented 6 years ago

At least you didn't write `chown www-data:www-data -R /var` when you meant to write `chown -R www-data:www-data var/`. (That might have happened to someone today :D)

gcp commented 6 years ago

Not sure what the limit on the server is. It's running mongodb and node, which are the big memory eaters (plus mongodump for backups). The file is "only" 270M, but I suspect it balloons up a lot inside node, which isn't exactly known for being memory efficient.

jkiliani commented 6 years ago

zero-test.sjeng.org is a backup server, right? If it has the same hardware characteristics, could it be used to safely experiment on this?

zediir commented 6 years ago

If I'm reading this right, the only reason the server unzips the file is to calculate its hash? https://github.com/gcp/leela-zero-server/blob/754cc8ab8ac94f8034cbd9e10a14cea9e6b49b26/server.js#L524

I think zero-test.sjeng.org runs on the same server just in a separate process.
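If the gunzip really is only there to compute the hash, streaming it would avoid holding the uncompressed copy in memory. A minimal Node sketch, assuming the hash is the SHA-256 of the uncompressed weights (the filename here is just an example):

```javascript
// Stream the gunzipped bytes straight into the hash instead of
// buffering the whole ~270M uncompressed file in memory.
const fs = require('fs');
const zlib = require('zlib');
const crypto = require('crypto');

function hashGzippedFile(path, callback) {
    const hash = crypto.createHash('sha256');
    fs.createReadStream(path)
        .pipe(zlib.createGunzip())
        .on('data', chunk => hash.update(chunk))
        .on('end', () => callback(null, hash.digest('hex')))
        .on('error', callback);
}

hashGzippedFile('leelaz-model-3008000.txt.gz', (err, digest) => {
    if (err) throw err;
    console.log(digest);
});
```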

Splee99 commented 6 years ago

leelaz-model-3008000 is very strong. On my computer it won two even games against AQ at 12 seconds per move. However, when I tried another game giving AQ a two-stone handicap, LZ got caught in a ladder.

Mezoka commented 6 years ago

20 blocks still cannot handle large groups properly; it gives over a 90% win rate for Black at the end of these games: Game 1, Game 2

jkiliani commented 6 years ago

This 20 block net is pure supervised learning from professional games. I think it's highly likely that an RL 20 block net will do fine with large groups in general.

remdu commented 6 years ago

I'm sure even 20 blocks will still fail with very large groups sometimes. I think this is what pushed deepmind to go for 40 blocks.

jkiliani commented 6 years ago

It may fail sometimes, but I'm pretty sure it will fail far less than the SL network. I think the Leela Zero net trained by @zediir may well be comparable to the DeepMind supervised learning reference net shown in Fig. 3 of the AlphaGo Zero paper, which ended up around 200 Elo below AlphaGo Lee. The reinforcement learning net on (presumably) the same 256x20 architecture was ~800 Elo stronger than that SL net.

billyswong commented 6 years ago

@jkiliani In that sense, might a 256x20 network trained on Leela Zero's existing self-play games be more balanced in strength overall and make fewer mistakes?

featurecat commented 6 years ago

Self-play should be able to work out the holes in its play right now.

jkiliani commented 6 years ago

@billyswong I honestly have no idea. In theory, the quality of play in the professional game dataset is much higher than in current LZ self-play games, but network training can work in mysterious ways. Someone might try this, but training such a network is a long investment...

I do hope we end up testing this net on the server at some point, as another reference point better than best_v1.

zediir commented 6 years ago

There are two problems with training on high-rank human games:

  1. A lot of the games end in resignation, so there is less endgame data.
  2. There are no examples of bad moves to show why not to play them.

tapsika commented 6 years ago

> I'm sure even 20 blocks will still fail with very large groups sometimes. I think this is what pushed deepmind to go for 40 blocks.

For 20 blocks, I don't think a large group is so different from a smaller one. 40 blocks was probably tried simply because DeepMind needed to surpass Master for marketing reasons (they are sensitive to such cosmetics, up to the level of manipulating their results a bit, as in the AZ vs. AGZ case).

remdu commented 6 years ago

There is already evidence in #708 that even 20 blocks has issues with large groups. Granted, that is a supervised network and not a reinforcement-learned one, but still: there is evidence that even 20 blocks might have problems, and no evidence that it will be perfect. I agree that the marketing effect of surpassing Master played a part in the decision, though.

tapsika commented 6 years ago

Since 20 blocks should have no noticeable range limit, policy errors are likely caused by other things. In those cases it's not clear whether 40 blocks would do better with the same training data and method. It allows for more complex computations, so it has a general advantage, but in a particular error case better or more training may have more effect.

Or maybe - just a wild idea - some information is more useful when it can travel distances along the group in a back-and-forth manner (so 20 blocks can fall just a few layers short sometimes)? Sounds unlikely though, at least without filter starvation.
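On the range question, a quick back-of-the-envelope, assuming the standard leela-zero layout of one 3x3 input convolution followed by residual blocks of two 3x3 convolutions each:

```javascript
// Each 3x3 convolution widens the receptive field by 2 points.
const blocks = 20;
const convLayers = 1 + 2 * blocks;          // input conv + two convs per block
const receptiveField = 1 + 2 * convLayers;  // 83 points across
console.log(receptiveField >= 19);          // true: spans the whole 19x19 board
```

So the raw receptive field is not the constraint; whether the net actually learns to propagate group status that far is another matter.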

gcp commented 6 years ago

> zero-test.sjeng.org is a backup server, right?

It's the same physical machine.

bochen2027 commented 6 years ago

So is this model (3008000) stronger than the original Leela 0.11 on a single GTX 1080/Ti?

sethtroisi commented 5 years ago

Closing old issue.