zediir closed this issue 5 years ago.
So we now have many bots born from leelaz. Nice job.
Is this leelaz an FPU-tuned one?
Must be FPU-tuned, or @zediir would not have mentioned it.
The leelaz binary used was built from the current next branch.
Did you train this from self-play data completely? Looks very promising!
This is supervised, not self-play
Supervised. From the TYGEM dataset from here https://github.com/yenw/computer-go-dataset
There was someone who had an updated dataset; I believe they pulled a very large number of games from a server. I can't find the dataset, but it had 9d players in it. It would be good if someone could find it and provide the link to zediir, to see if this other dataset would produce a stronger LeelaZero. Edit: I have tried searching for it but couldn't find it. A person had an automated script to download nearly the entire history of high-ranked games from one of the online Go servers. I remember they had uploaded the data, but I am not sure where that original post is.
@jillybob That's my dataset. I'm probably going to continue to update it with all current Tygem 9d games. Currently it has fewer 9d games than the Tygem dataset.
@zediir How many steps and what batch size did you train this network for? What was your final learning rate and momentum? And what were your Tensorboard Accuracy, MSE error, and Policy error? I'm extremely interested in comparing with your results.
Did you weight the value head training (mse error) by 0.01 as recommended in the AGZ paper?
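For reference, this is the weighting being asked about, written out with the loss terms used in the AGZ paper (z = game outcome, v = value head output, π = target policy, p = policy head output, c = L2 coefficient); the 0.01 factor on the MSE term is the value-head weighting in question. This is a sketch of the idea, not the paper's exact notation:

```latex
% Combined training loss with the value head (MSE) term down-weighted by 0.01
L = 0.01\,(z - v)^2 \;-\; \pi^{\mathsf{T}} \log p \;+\; c\,\lVert \theta \rVert^{2}
```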
3,008,000 steps with batch size 64. Accuracy was 59%. For the rest I don't remember and I'm done with this for now. Going to be gaming again after having my gpu busy for a month.
Is this the strongest network we think we have?
I would put it up as a reference on the graph, perhaps?
Good idea, would you set up a match for it against a current net?
The second-newest one, yes. (Otherwise it could promote!)
actually, why not 99e274c5?
Even better, yes.
Hmm, even gzipped it's a 92M download. I can't actually upload it without changing the server config.
Maybe someone could just run the match and then add the result manually to the database. Unfortunately my GPU is spoken for for a couple more weeks before I can contribute again.
I actually killed the server trying to upload it. node unzips a copy in memory (or perhaps even more than one copy internally) and the machine is swapping.
It should recover in a few minutes. I hope.
Feeling pretty stupid right now. Lesson learned: we're not quite ready for a 256x20 run.
Shit happens. What is the actual file size limit on the server anyway? It was still able to handle the 128x10 nets after all...
At least you didn't write `chown www-data:www-data -R /var` when you meant to write `chown -R www-data:www-data var/`. (That might have happened to someone today :D)
Not sure what the limit on the server is. It's running mongodb and node, which are the big memory eaters (and mongodump for backups). The file is "only" 270M but I suspect it balloons up a lot inside node as that isn't exactly known for being very memory efficient.
zero-test.sjeng.org is a backup server, right? If it has the same hardware characteristics, could it be used to safely experiment on this?
If I'm reading this right, the only reason the server unzips the file is to calculate the hash for it? https://github.com/gcp/leela-zero-server/blob/754cc8ab8ac94f8034cbd9e10a14cea9e6b49b26/server.js#L524
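If so, the whole decompressed file would never need to sit in memory at once: the gzipped upload could be streamed through zlib and hashed incrementally. A minimal sketch, not the actual server.js code, and assuming the hash is a SHA-256 over the decompressed bytes:

```javascript
const fs = require('fs');
const zlib = require('zlib');
const crypto = require('crypto');

// Decompress a gzipped weights file as a stream and hash the decompressed
// bytes chunk by chunk, so memory use stays roughly constant.
function hashGzippedFile(path) {
    return new Promise((resolve, reject) => {
        const hash = crypto.createHash('sha256');
        fs.createReadStream(path)
            .pipe(zlib.createGunzip())
            .on('data', chunk => hash.update(chunk))
            .on('end', () => resolve(hash.digest('hex')))
            .on('error', reject);
    });
}

// Example: hashGzippedFile('network.gz').then(hash => console.log(hash));
```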
I think zero-test.sjeng.org runs on the same server just in a separate process.
leelaz-model-3008000 is very strong. On my computer it won two even games against AQ at 12 seconds per move. However, when I tried to run another game giving AQ two handicap stones, LZ got caught in a losing ladder.
This 20-block net is pure supervised learning from professional games. I think it's highly likely that an RL 20-block net will do fine with large groups in general.
I'm sure even 20 blocks will still fail with very large groups sometimes. I think this is what pushed deepmind to go for 40 blocks.
It may fail sometimes, but I'm pretty sure it will fail far less than the SL network. I think the Leela Zero net trained by @zediir may well be comparable to the Deepmind supervised learning reference net they show in the Alphago Zero paper in Fig. 3, which ended up around 200 Elo below Alphago Lee. The reinforcement learning net on (presumably) the same network architecture 256x20 was ~800 Elo stronger than this SL net.
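To put those gaps in perspective, the standard Elo expected-score relation (a general Elo formula, not something specific to these nets) gives:

```latex
% Expected score for a rating difference \Delta under the Elo model
E = \frac{1}{1 + 10^{-\Delta/400}}
% \Delta = 200 \Rightarrow E \approx 0.76  (about a 76% expected score)
% \Delta = 800 \Rightarrow E \approx 0.99
```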
@jkiliani In that sense, would a 256x20 network trained on Leela Zero's existing self-play games be more balanced in strength overall and make fewer mistakes?
Self-play should be able to work out the holes in its play right now.
@billyswong I honestly have no idea. In theory, the quality of play of the professional game dataset is much higher than current LZ self-play games, but network training can work in mysterious ways. Someone might try this, but training such a network is a long investment...
I do hope we end up testing this net on the server at some point, as another reference point better than best_v1.
There are two problems with training on high rank human games.
> I'm sure even 20 blocks will still fail with very large groups sometimes. I think this is what pushed deepmind to go for 40 blocks.
For 20 blocks I don't think a large group is so different from a smaller one; 40 blocks was probably tried simply because DM needed to surpass Master for marketing reasons (they are sensitive to such cosmetics, to the point of manipulating their results a bit, like in the AZ vs. AGZ case).
There is already evidence that even 20 blocks has issues with large groups, see #708. Granted, this is a supervised network and not a reinforcement-learned one, but still, there is evidence that even 20 blocks might have problems, and no evidence that it will be perfect. I agree that the marketing effect of surpassing Master played a part in the decision though.
Since 20 blocks should have no noticeable range limit, policy errors are likely caused by other things. In those cases, it's not clear if 40 blocks would do better with the same training data and method. More blocks allow for more complex computations, so there is a general advantage, but in a particular error case better or more training may have more effect.
Or maybe - just a wild idea - some information is more useful when it can travel distances along the group in a back-and-forth manner (so 20 blocks can fall just a few layers short sometimes)? Sounds unlikely though, at least without filter starvation.
> zero-test.sjeng.org is a backup server, right?
It's the same physical machine.
So is this model (3008000) stronger than the original Leela 0.11 on a single GTX 1080/Ti?
closing old issue
Here is the 256x20 network I trained. https://drive.google.com/open?id=1PSyfsDvXmtIrhUx4Gf6x7mGzDtwe4npr
I ran it on CGOS with the following settings
The hardware it ran on was an i7-7700K and two GTX 1080 graphics cards.
The BayesElo ratings were:
- LZH256x20-t4-p1600: 3535
- LZH256x20-t4-p3200: 3586
- LZH256x20-t4-nolim: 3659
I also ran it against Leela 0.11 with equal time and hardware on KGS.
The result currently is: TygemLeela 27 - LeeaEleven 9.
It sometimes gets caught in ladders and sometimes has problems with life and death.
The leelaz binary used was built from the current next branch.
https://www.gokgs.com/gameArchives.jsp?user=LeeaEleven
Code used to get the win counts from KGS:
```javascript
// Count wins per player from the KGS game archive table.
// Run this in the browser console on the gameArchives.jsp page.
var nodes = window.document.getElementsByClassName('grid')[0].childNodes
var results = {}
for (childNodeIndex in nodes) {
    var tds = nodes[childNodeIndex].childNodes;
    // Only process table rows whose cells are data cells (skips the header row).
    if (nodes[childNodeIndex].localName === "tr" && tds[1].localName === "td") {
        var white = tds[1].innerText
        var black = tds[2].innerText
        if (results[black] === undefined) { results[black] = 0 }
        if (results[white] === undefined) { results[white] = 0 }
        // First character of the result column is "B" or "W" for the winner.
        var result = tds[6].innerText[0]
        if (result == "B") { results[black]++ }
        if (result == "W") { results[white]++ }
    }
}
// Strip the rank suffix (e.g. "Name [9d]") and print each player's win count.
for (resultIndex in results) {
    console.log(resultIndex.substr(0, resultIndex.indexOf("[") - 1) + ': ' + results[resultIndex])
}
```