Time controls and pondering

tterava commented 6 years ago

With large networks like @bjiyxo's 256x20 it's very important to get as many playouts as possible. The situational awareness and understanding of the network is extremely good, but low playout counts make it struggle even against much weaker networks in time-limited games. It simply gets outread and doesn't see far enough to dodge dead ends. LZ by default plays very fast, and slowing down the time controls seems to increase the strength significantly with larger networks.

I ran a test with network e9c2c70b on the current master branch to get a baseline, and it got a 3550 CGOS rating (3717 Bayes) with a single GTX 1080. I then ran bjiyxo's 256x20 V4, which got a 3629 CGOS rating (3790 Bayes). Not a big difference, but still pretty good.

I always keep pondering enabled for maximum strength, but I had noticed some issues where trees weren't reused correctly and it turned out to be related to pondering. This had a big impact on time management, as previous calculations were considered as new, and time was wasted on moves that were inevitable. After fixing the issue I decided to rerun the 256x20 V4 experiment and I quickly realized that it was going nowhere: After 30 games the CGOS rating was ~3400 and it couldn't handle any of the opponents it previously could. I realized most of the time it had lost a game while it had more than half of its timebank still left because it would now respond almost instantly to "forced" moves and didn't take time to calculate when it needed to.

I decided to slow down the pace by changing the estimated number of moves, which affects time control: m_moves_expected = (boardsize * boardsize) / 5; I changed the divider to 11 (felt like a nice number) and also switched to bjiyxo's 256x20 V5 (which is a little bit stronger, but not by a big margin). To compensate for fast games, I reduced the window required for the playout speed estimate, so it would move even faster when LZ already knows what move it will play:

    if (elapsed_centis < 100 || playouts < 100) {
        return playouts_left;
    }

I reduced elapsed_centis to 50.

All the changes seem to have resulted in quite a big jump in playing strength: LZ now moves very fast when it knows what's going on and stops to think when the opponent does something unexpected. The run isn't finished, but I think it's safe to say these changes (especially the reduction in expected moves) are very beneficial in fixed-time games. I haven't noticed any ill effects even in shorter games, as the engine is still very fast when it needs to be.
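
To make the effect of the divider concrete, here is a minimal sketch of how it changes a per-move time budget. It assumes a deliberately simplified model in which the remaining main time is spread evenly over the expected moves; the function names are hypothetical and the engine's real allocation has more terms than this.

```cpp
#include <cassert>

// Expected move count used for time budgeting, following the quoted formula
// m_moves_expected = (boardsize * boardsize) / divisor (integer division).
int moves_expected(int boardsize, int divisor) {
    assert(boardsize > 0 && divisor > 0);
    return (boardsize * boardsize) / divisor;
}

// Hypothetical simplification: divide the remaining time (in centiseconds)
// evenly over the expected moves. This is an assumption made for
// illustration, not the engine's actual allocation.
int base_centis_per_move(int remaining_centis, int boardsize, int divisor) {
    const int moves = moves_expected(boardsize, divisor);
    assert(moves > 0);
    return remaining_centis / moves;
}
```

Under this model, on a 19x19 board a divider of 5 expects 72 moves, 8 expects 45, and 11 expects 32, so a larger divider gives each move a larger share of the clock and leaves less for the late game.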

gcp commented 6 years ago

"LZ by default plays very fast and slowing down the time controls seems to increase the strength significantly with larger networks."

FYI, I observed a while ago that because of the time management code, Leela now plays much faster than originally intended. I tried changing the divisor in m_moves_expected = (boardsize * boardsize) / 5; from 5 to 8, but in testing (which takes ages) it was not a significant improvement, though it was ahead when I stopped it.

gcp commented 6 years ago

"All the changes seemed to have resulted in quite a big jump in playing strength"

You need to do much more testing to be able to say this; the Elo error margins on your test are still massive compared to the alleged difference.

tterava commented 6 years ago

I'll finish the CGOS run to increase the accuracy. I know it still leaves uncertainty in the results, but if the final Elo is anywhere near the current one, then I think it's quite unlikely that the network upgrade and variance alone would produce such good results.

gcp commented 6 years ago

You can simply see the error margins on the Elo ratings. BayesElo on CGOS will show them after the run finishes.
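
For a rough sense of how wide those margins are at small sample sizes, here is a minimal sketch. It assumes the standard logistic Elo model and a normal approximation to the binomial with no draws; this is not BayesElo's actual method, and the names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Elo difference implied by an expected score p in (0, 1),
// using the standard logistic Elo model.
double elo_from_score(double p) {
    assert(p > 0.0 && p < 1.0);
    return -400.0 * std::log10(1.0 / p - 1.0);
}

struct EloInterval {
    double low;
    double mid;
    double high;
};

// Approximate 95% interval for the Elo difference, given wins out of games.
// Assumption: no draws, normal approximation to the binomial.
EloInterval elo_interval(int wins, int games) {
    assert(games > 0 && wins > 0 && wins < games);
    const double p = static_cast<double>(wins) / games;
    const double se = std::sqrt(p * (1.0 - p) / games);
    // Clamp so the logistic mapping stays finite at the extremes.
    const double lo = std::max(p - 1.96 * se, 1e-6);
    const double hi = std::min(p + 1.96 * se, 1.0 - 1e-6);
    return {elo_from_score(lo), elo_from_score(p), elo_from_score(hi)};
}
```

As a made-up example, elo_interval(18, 30) gives an interval that starts below zero and is more than 200 Elo wide, so a 60% score over 30 games cannot by itself separate a real gain from noise.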

Marcin1960 commented 6 years ago

The number of self-play games is dropping: 17146 in the past 24 hours, 393 in the past hour.

What part of the client base is the 256x20 eating up? The number of clients per hour is dropping somewhat too.

I would put 256x20 on the back burner for a while and give 192x15 a chance to grow.

bochen2027 commented 6 years ago

It lost a game to minigo on time?! That loss alone dropped it 40 Elo... Maybe the server lost connectivity somehow? Or is there an issue with the code changes?

tterava commented 6 years ago

@hydrogenpi Yeah, I actually saw the game. On the client side it looked like the opponent stopped moving and LZ hit its visit limit (which won't ever be hit under normal circumstances). Then after quite a while the opponent made a move and magically LZ had less than a minute left, which wasn't enough to finish the game.

bochen2027 commented 6 years ago

This run does seem a lot stronger right off the bat. For example, I've never seen LZ hit the top 4 before, no matter how fleeting and temporary, so it doesn't just get there by randomness or luck.

Do you know whether the matches happening on April 14 are going to use the v5-256.20 network? If so, maybe the guys running it should make the same code changes you did.

tterava commented 6 years ago

There is still a chance that the previous run was worse than average and this is above average, and network change covers most of the difference. Let's see how the rest of the run goes.

The increase in thinking time is most significant if your network is so large that your system has a hard time hitting even 1-2k playouts, like a GTX 1080 with the 256x20 net. With stronger hardware the difference is greatly diminished, and I doubt it would make much change (or any at all) with the 4x Titan V that will apparently be used on April 14th. Using a divider such as 8 might be a more balanced option because divider 11 doesn't leave much time for the late game. The endgame is typically played at a very rapid pace with this setting.

I also think there's a very good chance that the 192x15 network will surpass the 256x20 network even with equal playouts. It only takes a few jumps to reach that level.

bochen2027 commented 6 years ago

@tterava Feel free to contact me directly if you'd like to do a run with the v5-256 on 4x Titan V. I can provide you access for a 100-game run.

If whatever 256 continues to be trained on games that 192 puts out, it remains to be seen whether 192 could ever be better than 256 on equal playouts (I don't think it can). But assuming there is not a continuing effort to train the 256, then yes, that is likely to happen. I myself am trying to train a 40b 256f, starting with the first 10-block games and continuing to the present day and onwards.

Unless there was a typo, it seems golaxy is using 10 x GTX 1080 Ti for games against LZ. If that is the case, then LZ should at least be allowed to use 8 x Titan V or 8 x V100; in that range a 40b 256f net would be ideal. Too bad there isn't multi-GPU code for training yet, otherwise the process could be shortened significantly.

tterava commented 6 years ago

@hydrogenpi Sure, we could set it up. Also, the changes I've made aren't that significant, so I could just share the binary if you want.

The reason I think 192 will catch up is that it currently has all the distributed horsepower behind it and constantly gets new candidates. It's hard to compete with that as an individual, even if you train a larger network. Basically the 192 will get refined a lot faster than the 256. Usually the upgrades come quite fast right after a new network upgrade, so I would expect quite a few new networks over the next few days.

bochen2027 commented 6 years ago

@tterava okay cool, I don't know how to message you directly so I left my contact in your solarsystemsim project as an 'issue'

One point to note is that since its not likely the LZ project will officially ever move beyond 20b, then if a 40b 256f was trained on 15b/20b games and tracked it in parallel, and if it was stronger on the best hardware, then even though it will never be used to generate self-play games, it still has value: for high-end matches, this largest variant would give LZ the best possible edge.

Marcin1960 commented 6 years ago

@hydrogenpi "One point to note is that since its not likely the LZ project will officially ever move beyond 20b, then if a 40b 256f was trained"

40x256 would certainly kill the LZ project. 20x256 is already making clients leave. Self-play games: 18378 in the past 24 hours, 488 in the past hour.

15x192 will not get new better self-play games to train on.

Is it what you want? It is like shooting a bird in the flight.

Clients: 137 in the past hour.

bochen2027 commented 6 years ago

@Marcin1960 There is no direct correlation between the increase in network size and the drop in client participation. Officially, 20x256 was never the network, and only today (less than 24 hours ago) has there been a switch from the 10 block to a 15 block. More likely than not, the lack of progress on the 10 block, with no new nets promoted for almost a week, caused the drop in clients, and not the adoption of the 15 block, which only happened this morning and was in fact accompanied by a good visceral uptick in Elo.

I think you misunderstood what I wrote when I stated that "then if a 40b 256f was trained on 15b/20b games and tracked it in parallel". I did not mean reinforcement learning, but rather training in a supervised manner on pre-existing self-play games generated by the 15b/20b. This sort of training can be done on one machine by any one individual, so I fail to see how it would be likely to "kill" the LZ project or how it can be compared with "shooting a bird in the flight".

Additionally, at this point what are the alternatives to moving to a larger network? It is clear beyond any reasonable doubt that if the project were to stay at 10-block size and not move larger, attrition would be just as great, if not greater. Moving up to a larger size was a necessity anyhow; the only point of contention was whether it should have been 192.15 or 256.20, and for what it's worth, I concur that the 192.15 was the prudent move for now.

My suggestion of a 40b 256f side-trained in a supervised manner using existing LZ training data would in no way impact the progress of the project, unless you think that people would stop contributing cycles if a much stronger net existed. But if that is the line of thinking, I don't agree that holding back and capping the strength intentionally, or encouraging others to do the same, is the right route to go for Go.

Hersmunch commented 6 years ago

I suspect the drop in number of clients is because the minimum required version has been updated to 15 / 0.13 https://github.com/gcp/leela-zero-server/commit/a9803399e4953486aaf33996ee1e544736c88ce2

Marcin1960 commented 6 years ago

@hydrogenpi "Additionally, at this point what are the alternatives against moving to a larger sized network?"

Yeah, 40x256, great idea. ;)

fell111 commented 6 years ago

There were so many matches after upgrading to 15x192. We have 396 self-play games in the past hour, and in the same period we have 269 matches.

6718199 total selfplay games. (19020 in past 24 hours, 396 in past hour.)
297353 total match games. (4540 match games in past 24 hours, 269 in past hour.)

We'll get rid of these extra matches in a day or two. Then we can get an accurate self-play count.

bochen2027 commented 6 years ago

It seems a pretty ominous trend that clients are still dropping, now barely averaging 120.

sFaurite commented 6 years ago

I don't think they are leaving; it just takes more time to finish a game. Look at the 24-hour stat: it is stable at over 500. Some clients take more than an hour to finish a game, so they no longer show up in the last-hour stat.

tterava commented 6 years ago

It would be very weird if clients stopped now that we are making good progress again with a brand new network. This network will probably be large enough to make silly ladder mistakes and such extremely rare if not non-existent.

wjx0912 commented 6 years ago

@tterava "I reduced elapsed_centis to 50." Can you explain why you made this change? Thanks.

bochen2027 commented 6 years ago

Thanks @tterava for the help with setting it all up. It has hit top 2 on the CGOS charts, something that has never happened before, however briefly, and it went above 3800 Elo, also a first, as the 3800 barrier has never been broken by LZ before, even if only temporarily. Not to mention it would most probably have broken 3900 (albeit momentarily) had I not stupidly clicked into the CGOS DOS prompt box and timed it out in a game that was already a sure win. To be honest, I don't think any average pro, or even any pro below the top 100 in the official rankings, has any chance anymore.

http://archive.is/APf2M

tterava commented 6 years ago

@wjx0912 LZ by default waits a full second to get an accurate estimate of how many nodes per second (n/s) it is calculating. This estimate is used to figure out whether another move can still overtake the best move in the time left. It also means LZ always has to think for at least a second before it can move.

I think the estimate is good enough even with just a half-second window. This allows LZ to move faster in situations where it already has a move ready. It mostly helps with very tight time controls.
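
The mechanism described above can be sketched roughly as follows. This is a hedged illustration modelled loosely on the snippet quoted earlier in the thread, not the engine's actual code: the function names, the parameter list, and the configurable min_window_centis are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Estimate how many more playouts fit into the time left for this move.
// min_window_centis is the warm-up window discussed above (100 by default,
// 50 in the experiment); before it has elapsed the rate is not trusted.
int est_playouts_left(int playouts, int playouts_cap, int elapsed_centis,
                      int time_for_move_centis, int min_window_centis) {
    assert(playouts >= 0 && elapsed_centis >= 0);
    const int cap_left = std::max(0, playouts_cap - playouts);
    if (elapsed_centis < min_window_centis || playouts < 100) {
        return cap_left;  // no trustworthy rate yet, assume the full cap
    }
    const double rate = static_cast<double>(playouts) / elapsed_centis;
    const int time_left = std::max(0, time_for_move_centis - elapsed_centis);
    const int by_time = static_cast<int>(std::ceil(rate * time_left));
    return std::min(cap_left, by_time);
}

// The search can stop early when the runner-up could not catch the leader
// even if it received every remaining playout.
bool can_stop_early(int best_visits, int second_visits, int playouts_left) {
    return second_visits + playouts_left < best_visits;
}
```

With a 100-centisecond window, a search that is 60 centiseconds in still reports the full playout cap as remaining, so no early stop is possible. With a 50-centisecond window the same search already has a rate estimate and can stop as soon as the best move is out of reach.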

Also, the results from the run are in and are quite inconclusive, to say the least. It was forced to play against a 3900-rated opponent a whopping 14 times and lost almost 100 CGOS rating in the process. Based only on gut feeling, it seems divider 11 is a bit too slow, so maybe 8-9 would be a good sweet spot if time management is enabled.

The change in elapsed_centis is probably useless unless playing with extremely tight time controls (<5 minutes per game).

fell111 commented 6 years ago

Wow, LZ has reached 3900+ Bayes Elo on CGOS. Would whoever runs this bot please share the weights and hardware info? LZ_g7_t14 3982

wjx0912 commented 6 years ago

@tterava thank you, great job!

diadorak commented 6 years ago

@hydrogenpi Great job! Are you going to test it again?

BTW, last week one guy reached 3840 using a 20*256 network after beating Perseus-8 twice. Then he took the bot down.

diadorak commented 6 years ago

Leela Zero reaches No. 1 :)

gcp commented 6 years ago

FWIW I finished a tuning run for the time allocation and the base time allocation was increased (just a little bit shy of what was proposed here). CLOP estimated it at 2 to 40 Elo gain, and in my experience it's almost always the former that's more accurate. But it's very unlikely to be a regression so might as well take it.

I didn't see any results posted in this thread with anything near significant numbers so I've ignored them all, as I always promise I do.

leela-zero/leela-zero issue #1174: Time controls and pondering