official-stockfish / Stockfish

A free and strong UCI chess engine
https://stockfishchess.org/
GNU General Public License v3.0

SF NNUE #2728

Closed adentong closed 4 years ago

adentong commented 4 years ago

There has been much discussion on SF NNUE, which apparently is already on par with SF10 (so about 70-80 Elo behind current SF dev). People have been saying it can become 100 Elo stronger than SF, which would basically come from the eval. Since the net is apparently not very big, maybe someone can study the activations of each layer and see if we can extract some eval info from it? In any case, it's probably worth looking into this since it shows so much promise.
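For anyone who wants to poke at this idea, here is a minimal, purely hypothetical sketch of what inspecting a single small layer could look like; the layer sizes, scaling constants, and function names are illustrative and not taken from the actual NNUE sources.

```cpp
// Hypothetical sketch: compute and dump the activations of one small
// affine + clipped-ReLU layer, so they can be eyeballed or correlated
// with classical eval terms. Not the real NNUE code.
#include <algorithm>
#include <array>
#include <cstdint>
#include <iostream>

constexpr int In  = 32;   // illustrative layer width
constexpr int Out = 32;

std::array<std::int32_t, Out> forward(const std::array<std::int8_t, In>& input,
                                      const std::int8_t weights[Out][In],
                                      const std::int32_t biases[Out])
{
    std::array<std::int32_t, Out> activations{};
    for (int o = 0; o < Out; ++o)
    {
        std::int32_t sum = biases[o];
        for (int i = 0; i < In; ++i)
            sum += weights[o][i] * input[i];
        // Clipped ReLU after rescaling (the 64/127 constants are typical of
        // NNUE-style quantization, but treat them as an assumption here).
        activations[o] = std::clamp<std::int32_t>(sum / 64, 0, 127);
    }
    return activations;
}

void dump_activations(const std::array<std::int32_t, Out>& a)
{
    for (int o = 0; o < Out; ++o)
        std::cout << "neuron " << o << ": " << a[o] << '\n';
}
```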

sf-x commented 4 years ago

Dear @dorzechowski, did you consider that the parameter sets are not given by aliens, but are generated using the code you happily removed, which therefore needs to be maintained anyway, and that your removal is just busywork?

dorzechowski commented 4 years ago

@sf-x Feel free to fix it in your branch. I was clear from the start that I wanted to have code only for playing.

vondele commented 4 years ago

From my side, I'll be discussing with @snicolet how we can best proceed later this week. However, I will be offline until the weekend, so some patience please, but we'll eventually come up with a plan. I appreciate @nodchip's support, and we'll try to keep things in sync. Having a few people study the branch by @dorzechowski seems like a good idea; it appears to be a good starting point.

However, it needs to be appreciated that this is a rather large project, which is best done in steps to keep master in production-quality shape. Starting with playing-only capability seems quite natural, and already at that point some changes to the fishtest infrastructure are needed that will require some effort to implement. Keeping our normal SPRT-based process for changes to the code will be important to make sure we have steady progression.

vdbergh commented 4 years ago

Just a reminder that this code could be useful https://github.com/glinscott/fishtest/pull/547 .

vondele commented 4 years ago

yes, @protonspring work could be useful in this context.

FireFather commented 4 years ago

IMO, nodchip's NNUE source code is ready to be integrated immediately. It has been widely distributed and widely tested for months now. The whole codebase is fully functional and integrates perfectly with sf-dev (it has been kept current), all of this mainly due to nodchip's attention to detail.

I really think there's no need to wait. Of course I'm not talking about fishtest... just the simple testing and integration of NNUE into the master branch. It was written in a very similar manner/style to SF and would be super easy... just a handful of simple changes to Stockfish dev in order to add a UCI option to turn NNUE 'off' and 'on'. I (and others) have done this and it works great.

noobpwnftw commented 4 years ago

You keep jumping to conclusions. So far there is one fixed-game test that looked fine (which is not even a non-regression test), against two different branches of master (where the new branches contain master changes other than NNUE, tested against an older master), and there is no PR filed that is ready to merge. Bottom line: which exact branch of the NNUE fork is ready to merge is still unclear to me, your tested one or nnue/master?

I agree that there is no need to wait for someone to clarify those.

vondele commented 4 years ago

I'll announce this weekend what the plan for integration is; however, it will start from the nnue-player-wip branch discussed above.

As I said before, we'll go in steps and it needs updates to the fishtest infrastructure, which can't be done immediately but are essential.

Fanael commented 4 years ago

> yes, sure. Haswell introduced avx2, roughly 7 years ago, so I'm not too concerned if that would be the required 'modern' for NNUE.

Intel is still segmenting on AVX, so there are Comet Lake (i.e. another Skylake re-release…) CPUs released this very year that don't have AVX.

> be able to build the right architecture, probably x86-64-avx2 (on AMD) or x86-64-bmi2

A better idea would be to enable BMI2 but avoid the pext and pdep instructions specifically on AMD, perhaps by providing x86-64-bmi2-pext and x86-64-bmi2-no-pext "architectures"? The remaining BMI and BMI2 instructions are fine, and indeed sped things up the last time I benchmarked them; it's only these two that absolutely murder performance due to being microcoded.
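As a rough illustration of that split (USE_PEXT already exists as a Stockfish build flag; the software fallback below is only a sketch of what a hypothetical x86-64-bmi2-no-pext build could fall back to):

```cpp
#include <cstdint>
#ifdef USE_PEXT
    #include <immintrin.h>   // _pext_u64: fast on Intel, microcoded on Zen/Zen2
#endif

// Parallel bit extract: gather the bits of 'b' selected by mask 'm'
// into the low bits of the result.
inline std::uint64_t pext(std::uint64_t b, std::uint64_t m) {
#ifdef USE_PEXT
    return _pext_u64(b, m);                  // hardware PEXT (x86-64-bmi2-pext)
#else
    // Portable fallback for a hypothetical no-pext build: walk the set bits
    // of the mask from low to high and pack the selected bits of 'b'.
    std::uint64_t result = 0;
    for (std::uint64_t bit = 1; m; m &= m - 1, bit <<= 1)
        if (b & m & -m)
            result |= bit;
    return result;
#endif
}
```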

dorzechowski commented 4 years ago

Nothing is ready to be merged to master, and it's also not "super easy" at all. The first thing to do is to push the NNUE code as a branch to the official Stockfish repo, in a similar manner to, for example, the branch for the cluster version. Then we can work on it, create pull requests for this branch, etc. It's still a long way before it can be merged into master, and maintainers shouldn't (and I'm sure won't) give in to the pressure from overhyped individuals who demand that someone(TM) do it now.

FireFather commented 4 years ago

I suppose that's addressed to me. Please note: I didn't demand anything...I was simply responding to vondele's "suggestions welcome".

The wip branch contains just a tiny part of the codebase and doesn't work... i.e. loading nn.bin causes it to crash. I'm just not understanding the value of starting this way. If anyone is kind enough to explain without insults, it would be appreciated.

FireFather commented 4 years ago

> You keep jumping to conclusions. So far there is one fixed-game test that looked fine (which is not even a non-regression test), against two different branches of master (where the new branches contain master changes other than NNUE, tested against an older master), and there is no PR filed that is ready to merge. Bottom line: which exact branch of the NNUE fork is ready to merge is still unclear to me, your tested one or nnue/master?
>
> I agree that there is no need to wait for someone to clarify those.

What is the value of the one test? It didn't actually use any of the NNUE code... that's what I'm at a loss to understand. nodchip's repository (the original) is the only one that should be considered IMO...

vondele commented 4 years ago

@FireFather for me it seems possible to load a network and play with that branch. However, maybe there is a sequence of commands that fails, in which case it needs fixing.

The rationale for starting from the player part only is that it will be well testable. We will be able to test all code changes with the usual procedure of running SPRT tests, guaranteeing steady Elo progress and a lean codebase. This part of the code should also be 'easily' integrated with our standard CI procedure to catch bugs.
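For readers unfamiliar with the fishtest procedure: an SPRT accumulates a log-likelihood ratio over games and stops when it crosses fixed bounds. A sketch of the decision rule, assuming the usual error rates α = β = 0.05 (the Elo hypotheses elo0 and elo1 vary per test):

```latex
\mathrm{LLR} = \sum_{\text{games}} \log \frac{P(\text{result} \mid \mathrm{elo}_1)}{P(\text{result} \mid \mathrm{elo}_0)},
\qquad
\text{pass if } \mathrm{LLR} \ge \log\frac{1-\beta}{\alpha} \approx 2.94,
\qquad
\text{fail if } \mathrm{LLR} \le \log\frac{\beta}{1-\alpha} \approx -2.94 .
```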

The additional tools needed to generate training data, and to train networks have different metrics to judge quality, like many of the other tools that are in use to develop engines. Right now it seems best to have those evolve independently, even if we'll try to avoid divergence.

FireFather commented 4 years ago

Understood, and I agree completely... The codebase is, however, built with dozens of preprocessor directives, one of which is #define EVAL_NNUE; for example, defining only that creates a 'player-only' compile, with none of the extra training code included.
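Schematically, the gating described above looks something like the following; EVAL_NNUE is the flag mentioned here, while EVAL_LEARN (for the training-only code) is named from memory of the fork and should be treated as an assumption.

```cpp
#include <string>

class Position;        // engine position type (declaration only, schematic)
using Value = int;     // stand-in for the engine's score type

#if defined(EVAL_NNUE)
// NNUE inference: network loading and evaluation, needed for playing.
Value evaluate_nnue(const Position& pos);
#endif

#if defined(EVAL_LEARN)
// Training-only tools. Leaving this undefined while defining EVAL_NNUE
// yields the "player-only" compile described above.
void generate_training_data(const std::string& output_file);
void learn_from_training_data(const std::string& input_file);
#endif
```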

The wip branch I saw had many many changes, many hundreds of lines of code deleted, different namespace, etc. It just seems overly complicated, labor intensive, and completely unnecessary to me. Reassembling it all, restoring all the individual deleted lines of code, and getting the training code to work is possibly going to prove much more difficult IMO.

Anyway, I am fully aware you guys are capable and need time... Thanks

Vizvezdenec commented 4 years ago

Well I hope this will be reasonably soon, I want to try my ideas in search on a branch that will be actually the strongest one ;)

dorzechowski commented 4 years ago

@FireFather I didn't have any crash on loading nn.bin. Please describe steps to reproduce.

Keeping code full of #define EVAL_NNUE sections is not a promising way forward IMO, and actually there would be no hope of having such things in master. Otherwise we would need to produce two different binaries every time to keep it in sync; even worse, we would need to ask the abrok maintainer to do this for us and then hear complaints all the time that a binary for a specific CPU/eval combination is missing. Also, telling the workers which version to compile in every single test would quickly become a nightmare. Using NNUE or not should be optional for the end user, and there should be one binary to rule them all.

I believe introducing new code, especially code that huge, should be kept to the minimal essentials that are necessary for playing games and can be more easily maintained and adapted to SF standards. Otherwise the first thing to do would be to remove those thousands of lines and run a simplification test. There is always time later to add more things, although I don't think the Stockfish engine is the place for all the network-creation and training code; that should be a separate tool.

My branch is absolutely not perfect, in fact it has WIP in the name. If there are some better options, they should obviously be used instead but frankly I don't see any better concrete suggestions as yet.
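A minimal sketch of the "one binary to rule them all" idea, with the eval selected at runtime by a UCI option as suggested earlier in the thread; the option and function names are illustrative, not the actual implementation.

```cpp
// Sketch only: both evaluators are compiled into a single binary and a UCI
// option picks one at runtime, so no per-eval builds are needed.
class Position {};                 // stand-in for the engine's Position class
using Value = int;                 // stand-in for the engine's Value type

// Stubs standing in for the real classical and NNUE evaluations.
Value evaluate_classical(const Position&) { return 0; }
Value evaluate_nnue(const Position&)      { return 0; }

// Set from a UCI option, e.g. "setoption name Use NNUE value true"
// (the option name is illustrative).
bool useNNUE = true;

Value evaluate(const Position& pos) {
    return useNNUE ? evaluate_nnue(pos) : evaluate_classical(pos);
}
```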

FireFather commented 4 years ago

Of course I'm not recommending keeping the #defines. My recommendation is to compile everything together and remove/replace the preprocessor directives with UCI options, in this manner achieving one binary, keeping the #defines only temporarily to exclude code. A pretty simple approach.

I agree, for now, having a 'play' only binary makes sense...my point is you can have that now by changing one line.

As far as "there would be no hope of having such things in master", they do exist there already I believe.

But if you want to re-write everything as you go... that's your decision. It's just hard to understand, since the original code by nodchip is so well done, and perfectly compatible and functional right now.

dorzechowski commented 4 years ago

Feel free to do it your way, point maintainers to your branch and let them decide.

There are no #define blocks excluding code in Stockfish as far as I know.

Everyone likes nodchip's repo and appreciates his work, and there is nothing stopping anyone from using it. We are talking about merging it to master and testing it on fishtest, and this is not that easy. There are many things to adapt, both in fishtest and in the introduced code, which should conform to the Stockfish coding style and pass review, SPRT tests, and CI tests before being integrated.

mstembera commented 4 years ago

IMO it's not obvious whether completely separating out the training code will spare or create more difficulties. For example, it would be nice to at least share some header files to keep common data types and struct definitions in sync. A relevant reference from the past is the Tuning branch. It started out separately but was eventually (recently) merged into master due to the high cost of maintenance. Would having one code base but separate make targets for playing and training executables be a helpful alternative? I'm glad @vondele and @snicolet are taking their time to make this decision carefully, and I'm looking forward to contributing to NNUE.

dorzechowski commented 4 years ago

@mstembera It's true that those things will need to be kept in sync eventually. It's just my belief that it's better to have possibly smallest and easiest first step, make it clean, make it work and only then add on to it when we know what to add, how and why. I also have full faith in our maintainers to come up with the right approach.

FireFather commented 4 years ago

One more thing: nodchip's repository is kept synced and up-to-date on a regular basis. It is essentially very recent Stockfish dev with NNUE added, all of which has been compiled, used, and tested thoroughly on a very wide spectrum of hardware and OSes. Every part of it has been put to the test for at least two months now. It has even played in TCEC. There are no bugs, and I have never seen it crash, except in the very beginning when it couldn't find nn.bin.

Acknowledging that and accepting the changes a bit more freely would save tons of work, time, and energy. A branch could be created and the current NNUE repository used 'as is'...

Concerning the #defines... the codebase relies heavily upon these preprocessor directives (which can be very effective); for example, he uses a script to produce 30 or more binaries each release (every couple of days). But obviously only a few of these would be needed for a 'play-only' release... the same ones that are produced for regular SF (each having a new UCI option like UseNNUE or something similar).

PS: there has been a freeze on nodchip's repository for the past couple of days... he's aware you guys are starting to integrate it.

noobpwnftw commented 4 years ago

The combined "testing" effort described is, however, nowhere close to a single LTC's worth of effort on fishtest. While people are in the process of adding functionality to fishtest so that the code can be tested with a trained net, the effort of introducing minimal player-only code is WIP and so far hasn't passed any of the standard procedures for a patch to get merged.

Your suggestion essentially skips all the testing and dev-ops procedures people agreed upon, and asks people to do what you and your friends say, and so on. What does it matter to you to have such a branch created "here and now"? Are you unable to use whatever repo/fork/branch is already there and do what you want? If it is for some kind of ego, then I think it is meaningless; if it is for better maintenance or quality control, then procedures have to be followed instead of "saving tons of work, time and energy".

Also, I don't see any point in a feature freeze while we work on an integration; it isn't as if the patch is final and nothing can change afterwards.

I just can't make sense of it: on the one hand you think the other repo is already perfect, while on the other hand you want it copy-pasted here without any oversight, testing or further work, so what goal do you want to achieve?

FireFather commented 4 years ago

I don't think it advisable to skip testing procedures at all, and I don't think it's a good idea to copy/paste without oversight... run all the tests necessary... tests are good.

I'm suggesting keeping it simple, not cutting the code into a hundred pieces only to re-write/re-assemble it. It's fully functional now.

What does it matter to me? I don't know, why does it matter to anybody?

noobpwnftw commented 4 years ago

So why exactly do you think people rewrote it instead of copy-pasting? Do they do it simply for fun or just wanted to annoy you?

adentong commented 4 years ago

Honestly what's the use in arguing about it? The maintainers will decide what to do and that should be that.

noobpwnftw commented 4 years ago

Because there are more people who do the work without talking, and there are people who just keep "suggesting" you should do this and that. Now you don't like people talking too much? I hate to break it to you, but this is likely what to expect from the upcoming NN hype, and I'm going to return the favor at the same level.

adentong commented 4 years ago

@noobpwnftw Not sure if you were replying to me, but if you were I have to say I'm not sure I understand what you mean. Regardless everyone needs to calm down here.

lp-- commented 4 years ago

@vondele: Isn't it possible, for the purpose of a single test, to add the NNUE net as a patch to the sources of @dorzechowski's branch, add the path to it in the sources, and run vs master on fishtest right now without any changes to fishtest?

noobpwnftw commented 4 years ago

@adentong My point is that people are working to get it integrated; there are a lot of discussions offline, just not everyone live-updates what they are doing 24/7. There are people who are just unsatisfied about almost everything; they are free to express their opinions, and so am I.

gsobala commented 4 years ago

Is it even sensible to Fishtest NNUE against master? The ratio of NNUE v master performance is hardware dependent, and e.g. NNUE may pass on hardware A but fail on B. It is a lesser version of the GPU v CPU problem.

adentong commented 4 years ago

@noobpwnftw Oh yes I 100% agree. Everyone's free to voice their opinions and I was by no means saying otherwise, though personally I would have just ignored any overzealous suggestions instead of arguing. But hey you're free to do what you want.

noobpwnftw commented 4 years ago

> Is it even sensible to Fishtest NNUE against master? The ratio of NNUE v master performance is hardware dependent, and e.g. NNUE may pass on hardware A but fail on B. It is a lesser version of the GPU v CPU problem.

I've just made sure that every fishtest worker I have is fully capable of running NNUE with AVX2 + BMI2 (which, if I understand correctly, is all it needs to perform well). Although, as I mentioned, on AMD there are certain BMI2 operations that are just slower, I can still update the build commands should there be a new one that overcomes that specific problem.

vondele commented 4 years ago

I've started an issue to track the merge and its tasks: https://github.com/official-stockfish/Stockfish/issues/2823. Let's keep that issue specific to work on the merge; I'll keep this one open for more general discussion.

Let's have fun with this new development!

vondele commented 4 years ago

just a note, testing clang++ (10.0.0-4ubuntu1) vs gcc (Ubuntu 9.3.0-10ubuntu2), I find that clang gives about 4% more nps on x86-64-avx2 for NNUE bench.

gekkehenker commented 4 years ago

How do we solve the (hypothetical) issue of getting a rock-paper-scissors situation? A > B > C > A...

After passing SPRT against the old dev net, a quick regression test against HC + selected previous nets as a sanity check?

vondele commented 4 years ago

that was already raised as an issue elsewhere. However, that's not very different from the current situation we have with patches... we don't see it often. Probably up to the maintainer to break the cycle, and wait for D >> A,B,C ?

dorzechowski commented 4 years ago

We can always check for regressions also against Stockfish with the standard eval, perhaps adjusting TC to make it a closer match. For example, if at STC the NNUE version is stronger by about 50 Elo, the "classic" SF could get 25% more time and be a valid quasi-independent 3rd party to check for regressions against. I don't think this will be a problem. The only change needed would be to introduce asymmetric TC in fishtest.

vondele commented 4 years ago

I would wait for the situation to occur before we find solutions for it ... other things need to be sorted out first.

Amplaytro commented 4 years ago

@vondele The modern CPU compile available at abrok.eu is not an SSE 4.1 build; please can you make sure that SSE 4.1 compiles are available on abrok.eu henceforth? Thank you very much for your help.

dorzechowski commented 4 years ago

@vondele Before the merge, please also consider releasing the last non-NNUE SF version as Stockfish 12. Even if we don't have +50 Elo there yet, the current 25-30 Elo is not bad at all (and there will be at least a few more patches in). More importantly, it would serve as a nice reference point.

jjoshua2 commented 4 years ago

Is there an issue for the fact that contempt doesn't change the evaluation currently? Last I checked it didn't really do anything, but it could probably be done as a simple addition to the output of eval? Whether or not that gains Elo is a different question, but if it doesn't, it would default to zero, and it would still influence whether 3-folds are taken.
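A sketch of the "simple addition to the output of eval" being suggested; purely illustrative, not how Stockfish's contempt was or would be implemented.

```cpp
// Illustrative only: shift the returned score in favour of the engine's side,
// so a non-zero contempt discourages taking draws/3-fold repetitions.
using Value = int;               // stand-in for the engine's Value type
enum Color { WHITE, BLACK };     // stand-in for the engine's Color enum

Value apply_contempt(Value nnueScore, Color sideToMove, Color engineSide, int contemptCp) {
    // With the default contempt of 0 this is a no-op, matching the idea that
    // a non-gaining option would simply default to zero.
    return nnueScore + (sideToMove == engineSide ? contemptCp : -contemptCp);
}
```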


Amplaytro commented 4 years ago

I don't like the idea of releasing the latest SF dev as SF 12; regular SF should also keep developing normally, and SF 12 should only be released when that +50 Elo target is achieved.

vondele commented 4 years ago

@Amplaytro I'll contact the abrok owner. modern will automatically be 'OK', but it would be nice if avx2 builds are available.

vondele commented 4 years ago

not yet sure about sf12 prior to the merge, spontaneously, I would have said a while after the merge would be better.

gekkehenker commented 4 years ago

> Is there an issue for the fact that contempt doesn't change the evaluation currently? Last I checked it didn't really do anything, but it could probably be done as a simple addition to the output of eval? Whether or not that gains Elo is a different question, but if it doesn't, it would default to zero, and it would still influence whether 3-folds are taken.

Contempt code currently resides in eval. This could be reverted back to being situated inside the search code. But that's something for Fishtest in the future.

vondele commented 4 years ago

Yes, concerning contempt: that is not there with NNUE. Notice also that our contempt is not just a change of draw value or some such thing... it likely can't easily be 'added' to NNUE (but it is presumably built in, in part, during the training phase, i.e. the net should have learned the contempt it was trained with). Stuff for future work :-)

TesseractA commented 4 years ago

I won't claim to know anything beyond the current domain of knowledge, but will some search improvements hinder Stockfish and improve Stockfish NNUE? It seems to me that search optimizations were built around evaluations of fixed sizes, and sharing the search code between classical evaluation and NNUE evaluation might create some rather difficult decisions regarding certain search methods.

For example, if for some reason the Stockfish project thought it was a good idea to include a very large network, other search methods as wild as an MCTS-like method could be considered due to the MCTS bottleneck decreasing.

Are we not planning for that type of specialization by just keeping everything that search needs in a single file, or splitting the search files between NNUE and classical? If it were somehow more practical with less slowdown to keep NNUE and classical search in one file when attempting to combine the best of both worlds, would "blindly" splitting the search files lead to an impractical and unnecessary code overhaul?

vondele commented 4 years ago

If we merge, my current feeling is that search improvements should be directed to the single method that performs best. Doubling the (human) resources needed to develop and maintain the code is not a good idea. I still need to decide how to proceed after an eventual merge, so that we make progress quickly without disrupting code too much. I think the first thing to do is to focus on improving the networks till we have more or less a steady state. This could be reached in a couple of weeks. Afterwards we start improving search to match the new eval and start exploring further evolution of the functional form of the new eval (i.e. new networks with new features, or combinations with the classic evaluations). Other opinions welcome.

mstembera commented 4 years ago

I also think testing search patches against whichever version of eval performs best is the way to proceed.

jjoshua2 commented 4 years ago

Once you start having lots of search patches based on a certain network size and set of input parameters there is a good chance it will get locked into that one since it will be too much effort for one person to switch all of it and to reoptimize similarly. It would require a big or universal improvement. Unless there is a method to switch architectures based on a smaller fixed amount of compute (which is common for tuning NN architectures).