official-stockfish / Stockfish

A free and strong UCI chess engine
https://stockfishchess.org/
GNU General Public License v3.0

SF NNUE #2728

Closed adentong closed 4 years ago

adentong commented 4 years ago

There has been much discussion on SF NNUE, which apparently is already on par with SF10 (so about 70-80 Elo behind current SF dev). People have been saying it could become 100 Elo stronger than SF, which would basically come from the eval. Since the net is apparently not very big, maybe someone can study the activations of each layer and see if we can extract some eval info from them? In any case, it's probably worth looking into this since it shows so much promise.
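
A minimal sketch of what "studying the activations" could look like: one NNUE-style dense layer (int8 weights, int32 accumulation, clipped ReLU) with its per-unit outputs dumped for inspection. Sizes and names here are hypothetical, not the actual NNUE code.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

constexpr int kIn = 512, kOut = 32;  // shaped like the first hidden layer

// Run one layer and print each unit's activation; correlating these values
// across many positions is one way to hunt for interpretable eval features.
void forward_and_dump(const int8_t w[kOut][kIn], const int32_t bias[kOut],
                      const uint8_t in[kIn]) {
    for (int o = 0; o < kOut; ++o) {
        int32_t sum = bias[o];
        for (int i = 0; i < kIn; ++i)
            sum += int32_t(w[o][i]) * int32_t(in[i]);
        int act = std::clamp(sum / 64, 0, 127);  // fixed-point rescale + clip
        std::printf("unit %2d: %3d\n", o, act);
    }
}
```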

vondele commented 4 years ago

that's indeed something to figure out. I would expect that initially the big changes to the engine will come predominantly from changes in net, and our standard tuning of search will just work more or less. It will be interesting to see how much there is to be gained from adjusting search. I would naively assume that the net has an eval somewhat more like a low depth search (as that's the training input), and thus the actual search might have to look a bit more like our usual high depth search, so somewhat less pruning and the like. However, that's pure speculation until we can test.

vdbergh commented 4 years ago

I worry what will happen to sf’s successful incremental development model if sf-nnue becomes the primary focus. I am also not yet convinced that sf-classic is dead.

One thing I noticed from nodchip’s description of the nn tuning algorithm is that it could equally be applied to SF’s classic eval. Another thing is that part of the Elo gain in sf-nnue is due to avx2; again, one could also try to use avx2 in sf-classic.

vondele commented 4 years ago

The avx2 related Elo gain comes from the implementation of the NNUE evaluation (i.e. network), nothing to gain in classic mode.

I also hope that classical eval keeps on improving. This is for sure something I try to keep possible. Evaluation patches will still be tested the normal way. As long as NNUE and classic evaluation are separate there is no problem. It will be more delicate if we hybridize.

Concerning the successful incremental development, also that is an important goal. I'd like all patches to pass our usual sprt testing. However, that will require that we have one goal, not a combination of two goals (e.g. search can only be optimized for one evaluation method, not two at the same time).

vdbergh commented 4 years ago

> The avx2 related Elo gain comes from the implementation of the NNUE evaluation (i.e. network), nothing to gain in classic mode.

Well, in principle I do not see why vector instructions could not be used to speed up a traditional eval as well... In some sense the packed mg/eg values are already a primitive vectorisation.
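
For context, this is the packing being referred to, simplified from Stockfish's types.h: the midgame and endgame values share one 32-bit Score, so a single integer addition updates both halves at once.

```cpp
#include <cstdint>

// eg lives in the upper 16 bits, mg in the lower 16; adding two Scores
// therefore adds both components with one instruction.
enum Score : int { SCORE_ZERO = 0 };

constexpr Score make_score(int mg, int eg) {
    return Score(int(unsigned(eg) << 16) + mg);
}

inline int mg_value(Score s) {
    union { uint16_t u; int16_t s; } mg = { uint16_t(unsigned(s)) };
    return mg.s;
}

inline int eg_value(Score s) {
    // the +0x8000 propagates a possible borrow from the low half
    union { uint16_t u; int16_t s; } eg = { uint16_t(unsigned(s + 0x8000) >> 16) };
    return eg.s;
}
```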

vondele commented 4 years ago

In principle, using avx2 for the classical eval is probably possible, but likely very difficult, and the speedups would be small. The matrix-vector operations needed for the NN are very suitable, however. Hardware will likely make the NN evaluation even faster in the future.
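
To illustrate why the NN maps so well onto vector instructions, here is a minimal sketch (not the actual NNUE implementation) of the incremental first-layer accumulator update, assuming 256 int16 outputs per perspective as in halfkp_256x2-32-32:

```cpp
#include <immintrin.h>
#include <cstdint>

constexpr int kHalfDims = 256;  // first-layer outputs per perspective

// When a move activates one HalfKP feature, its weight column is added into
// the accumulator: 256 int16 additions become 16 AVX2 instructions.
void add_feature(int16_t* acc, const int16_t* column) {
    for (int i = 0; i < kHalfDims; i += 16) {  // 16 int16 lanes per __m256i
        __m256i a = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(acc + i));
        __m256i c = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(column + i));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(acc + i),
                            _mm256_add_epi16(a, c));
    }
}
```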

dorzechowski commented 4 years ago

I would prefer to lock onto one network architecture, at least in the first period, and see how far we can get. The mainstream network architecture in shogi is halfkp_256x2-32-32, the same we have now in the branch. After that we could play with different layer sizes, trying for example halfkp_384x2-32-32, but as a trivial change of a constant, without changing the input layer. When we have a strong baseline and a good understanding of what we are doing, we can try other things.
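
For readers unfamiliar with the naming, these are roughly the dimensions encoded by halfkp_256x2-32-32 (a sketch of the shape only; 41024 is the standard HalfKP input count):

```cpp
// Each perspective maps its sparse HalfKP features (own king square times
// non-king piece-square pairs) to 256 outputs; the two halves are then
// concatenated and fed through two 32-unit layers to a single score.
constexpr int kHalfKpInputs = 64 * 641;         // 41024 features per perspective
constexpr int kFirstLayer   = 256;              // the "256x2" half
constexpr int kTransformed  = 2 * kFirstLayer;  // 512 after concatenation
constexpr int kHidden1      = 32;
constexpr int kHidden2      = 32;
constexpr int kOutput       = 1;                // the eval score
```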

It's very easy to come up with basically infinite combinations of inputs and layer numbers/sizes, and this is not the way to proceed imo. I'm worried we'll see many such attempts thrown monkey-style at fishtest without any consideration, because some people seem to think fishtest can handle everything and give an answer in 5 minutes.

noobpwnftw commented 4 years ago

A credit system could be implemented so that one must contribute enough CPU hours before starting a test: each test would cost some credits, successful tests would earn bonus credits, and the average test pass rate could serve as the reward multiplier. It doesn't need to be a complex system, just enough to prevent people from spamming junk without contributing anything.
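
A toy sketch of that bookkeeping (entirely hypothetical; the constants are placeholders and nothing like this exists in fishtest):

```cpp
// Hypothetical ledger: donated CPU hours earn credits, each submitted test
// spends some, and passed tests pay a bonus scaled by the historical pass rate.
struct Contributor {
    double cpu_hours    = 0;
    int    tests_run    = 0;
    int    tests_passed = 0;

    static constexpr double kCreditsPerHour = 1.0;    // earn rate
    static constexpr double kTestCost       = 100.0;  // cost per submitted test
    static constexpr double kPassBonus      = 200.0;  // reward per passed test

    double pass_rate() const {
        return tests_run ? double(tests_passed) / tests_run : 0.0;
    }
    double balance() const {
        return kCreditsPerHour * cpu_hours
             + kPassBonus * tests_passed * pass_rate()
             - kTestCost * tests_run;
    }
    bool may_submit() const { return balance() >= kTestCost; }
};
```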

vondele commented 4 years ago

I agree (as mentioned above): we should first test the now supported halfkp_256x2-32-32 and maximize its performance. Let's first assume there will be considerate use of the resources before we implement/enforce policies. As before, we have approvers who can step in if the resources are not used wisely, but otherwise I expect we will get a long stretch by just communicating what we think is the right approach.

mstembera commented 4 years ago

This is complicated even further by the fact that larger networks have been shown to be better at longer TC and smaller ones at shorter TC. (Not proven for SF NNUE yet, but it holds for other NN-based engines.)

MRMikaelJ commented 4 years ago

It helps there that the extra search speed is likely worth less and less, compared to more knowledge in the net, the more nodes there is time for, once the numbers get big.

dorzechowski commented 4 years ago

I really hope we don't end up having 100+ MB networks running at 10% of current Stockfish speed; I would put the limit at a 50% slowdown. One of Stockfish's trademarks is that it is a fast and deep searcher. For slow engines we should refer to other projects.

mstembera commented 4 years ago

@vondele Re your comment about NNUE eval being like a low depth search: I ran a couple of 10k-game fixed-depth matches.

SF Depth 5 vs NNUE Depth 1: 4494 - 4910 - 596 [0.479] 10000
Elo: -14.5 +/- 6.6, LOS: 0.0 %, DrawRatio: 6.0 %

SF Depth 6 vs NNUE Depth 1: 6456 - 2879 - 665 [0.679] 10000
Elo: 130.0 +/- 7.0, LOS: 100.0 %, DrawRatio: 6.7 %

So it looks like it's just slightly better than a depth 5 search.

vondele commented 4 years ago

Interesting result. Maybe worthwhile to see what happens at Depth 10 vs Depth 6 (or similar offset in depth).

mstembera commented 4 years ago

Some new, unexpected results given the first ones.

SF Depth 6 vs NNUE Depth 6: 2972 - 6161 - 867 [0.341] 10000
Elo: -114.8 +/- 6.8, LOS: 0.0 %, DrawRatio: 8.7 %

SF Depth 7 vs NNUE Depth 6: 6092 - 2837 - 1071 [0.663] 10000
Elo: 117.4 +/- 6.7, LOS: 100.0 %, DrawRatio: 10.7 %

Looks like the difference between the evals here is less than 1 ply worth of search. On what depth of training data was the network trained?

vondele commented 4 years ago

I think it was trained on depth 8 or depth 12 (@gekkehenker ?). However, I think this must not be too surprising, we know Elo gain at STC depths is something like 30-60Elo, which is less than what 1 ply of depth is worth (at around STC depths).

gekkehenker commented 4 years ago

> I think it was trained on depth 8 or depth 12 (@gekkehenker ?).

The net was trained on both depth 8 and depth 12 games. It was first fed the depth 8 games only; the resulting net was then trained on the depth 12 games.

ssj100 commented 4 years ago

@vondele thanks for your hard work in getting NNUE merged - just wondering which SV net is being run on fishtest now?

vondele commented 4 years ago

nn-97f742aaefcd.nnue see https://tests.stockfishchess.org/nns

ssj100 commented 4 years ago

@vondele thanks - just wondered what the corresponding net number etc is from here: https://www.comp.nus.edu.sg/~sergio-v/nnue/

Also which binary is used?

vondele commented 4 years ago

Don't know; you should be able to find it by matching the net name against `sha256sum <netfile> | cut -c1-12`.

rooklift commented 4 years ago

nn-97f742aaefcd.nnue is 20200801-1515.bin

TesseractA commented 4 years ago

Has anyone tried to use NNUE in FRC? It doesn't seem to work for some.

rooklift commented 4 years ago

Hmm worked OK for me here: https://lichess.org/yV7J1imd

vondele commented 4 years ago

I haven't tried, but in principle it should work. NNUE only touches eval, and even the classical eval had almost no special handling of FRC (one term, if I recall correctly).

gekkehenker commented 4 years ago

In my experience NNUE will play some FRC positions and crash in the rest.

vondele commented 4 years ago

Hmm, then it will be the added code in position that might be wrong in that case.

protonspring commented 4 years ago

I am behind the times... is this really ~90 Elo better than master on the same hardware?

MichaelB7 commented 4 years ago

Correct - this will be a 100+ Elo gain merge or so - give or take a few Elo.

The mother of all merges.

gekkehenker commented 4 years ago

> I am behind the times... is this really ~90 Elo better than master on the same hardware?

90 Elo, conservatively.

On a modern CPU with normal LTC conditions and a PGO build it's a bit stronger than that ;)

TesseractA commented 4 years ago

Note there are certain incompatibilities on old hardware that would make it significantly less efficient.

Also, there are hints of some significant Elo compression at very long time controls with increment.

Also note that contempt has yet to be implemented, which has the potential to present itself as an Elo gainer.

...ALSO note that it's likely just much stronger from the start position than it is from some many-ply-long books, but that claim has yet to be sufficiently backed up.

MichaelB7 commented 4 years ago

I would not get too excited about contempt. Contempt was designed for use against weaker engines; against an equal or stronger engine it's just about worthless. So the only thing contempt does is squeeze a few extra Elo out of much lower-rated opponents. I would be hard pressed to say that makes the engine better - it falls into the realm of being a vanity of vanities.

mstembera commented 4 years ago

@MichaelB7 Not having contempt cost SF the qualification into the TCEC SuFi one season.

vondele commented 4 years ago

NNUE evaluation has been merged, I'll close this issue. Thanks for the discussion.