official-stockfish / Stockfish

A free and strong UCI chess engine
https://stockfishchess.org/
GNU General Public License v3.0

NNUE ideas and discussion (post-merge). #2915

Closed vondele closed 4 years ago

vondele commented 4 years ago

I'll create this new issue to track new ideas and channel early questions.

sf-x commented 4 years ago

I'll start. https://github.com/official-stockfish/Stockfish/issues/2908

Viceroy-Sam commented 4 years ago

Since NNUE makes good use of intrinsics, optimize for the available SIMD/vector extensions so that the appropriate intrinsics are used where possible. This may mean changing Fishtest to identify the instructions available on each worker.
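
A minimal sketch of what per-worker detection could look like, assuming a Linux worker that exposes `/proc/cpuinfo`-style text. The function names, the arch names, and the preference order are hypothetical and are not taken from Fishtest or the Stockfish Makefile.

```python
# Hypothetical preference order, strongest first. The arch names and the
# flags they require are illustrative only.
ARCH_PREFERENCE = [
    ("x86-64-avx2", {"avx2"}),
    ("x86-64-sse41", {"sse4_1"}),
    ("x86-64-ssse3", {"ssse3"}),
]
FALLBACK_ARCH = "x86-64"


def parse_cpu_flags(cpuinfo_text):
    """Collect the flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def pick_arch(flags):
    """Return the first (strongest) arch whose required flags are all present."""
    for arch, required in ARCH_PREFERENCE:
        if required <= flags:
            return arch
    return FALLBACK_ARCH
```

A worker could call `pick_arch(parse_cpu_flags(open("/proc/cpuinfo").read()))` once at startup and report the result to the server along with its other machine details.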

vondele commented 4 years ago

@Viceroy-Sam like https://github.com/glinscott/fishtest/commit/cc30c34c3b3a3713f805976c1a8f8d73f8bb86b7#diff-e8d8184dfbe43a62177e9eb695449cd2R181-R217

Sopel97 commented 4 years ago

Apply lazy threshold before considering NNUE evaluation.

vondele commented 4 years ago

As in https://tests.stockfishchess.org/tests/view/5f2c2ca6b3ebe5cbfee85b69 ?

Sopel97 commented 4 years ago

Similar, but I think the value used for the lazy threshold in evaluate is more meaningful, and in rare cases the patch above might use the NNUE eval where the normal eval would return after the first lazy threshold. So this patch could maybe be improved as follows:

  1. calculate value up to first lazy threshold
  2. if value > t0 then return value
  3. if evalNNUE or value > t1 then return evaluate()
  4. return evalNNUE()
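
The four steps above can be sketched as follows. This is an illustration under stated assumptions, not Stockfish code: the function and parameter names are hypothetical, the thresholds are compared against the absolute value, `t0 > t1` is assumed, and the first condition of step 3 is read as "NNUE is switched off".

```python
def hybrid_evaluate(partial_value, t0, t1, use_nnue, classical_eval, nnue_eval):
    """Choose between the partial, classical and NNUE evaluations.

    partial_value  -- classical value computed up to the first lazy threshold
    t0, t1         -- lazy thresholds, assumed to satisfy t0 > t1
    classical_eval -- callable returning the full classical evaluation
    nnue_eval      -- callable returning the NNUE evaluation
    """
    # Step 2: the position is so lopsided that the partial value is good enough.
    if abs(partial_value) > t0:
        return partial_value
    # Step 3: NNUE is off (assumed reading) or the position is still fairly
    # lopsided, so finish the classical evaluation.
    if not use_nnue or abs(partial_value) > t1:
        return classical_eval()
    # Step 4: balanced position, so spend the time on the network.
    return nnue_eval()
```

Passing the two evaluators as callables keeps the expensive work lazy: only the branch that is actually taken gets computed.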

Viceroy-Sam commented 4 years ago

@vondele. Yes. Fishtest to collect and show architecture.

vondele commented 4 years ago

I filed an issue about collecting and showing the architecture: https://github.com/glinscott/fishtest/issues/743

gonzalezjo commented 4 years ago

Is there an easy way to separate bishop piece types into light square and dark square bishops? The NN might benefit from having OCB information.

sf-x commented 4 years ago

> Is there an easy way to separate bishop piece types into light square and dark square bishops? The NN might benefit from having OCB information.

This is exclusively a training issue. The proper place is therefore https://github.com/nodchip/Stockfish/

gekkehenker commented 4 years ago

During upload it might be a good idea to state explicitly that only the author of a net is allowed to upload it (if it doesn't already fall under a CC0 license).

gekkehenker commented 4 years ago

Add option to enable or disable lazyeval / hybrideval.

I've seen that it gains Elo under Fishtest conditions. But quoting Dkappe: "it performs worse on nets that don’t conform to stockfish eval, which are all of mine. It also defeats the purpose of training nets that play like other engines."

vondele commented 4 years ago

@gekkehenker the upload page already mentions that.... but yes, we might need a tick box. (Edit: issue https://github.com/glinscott/fishtest/issues/744)

Concerning the extra option: no, I don't think we want to do that.

gonzalezjo commented 4 years ago

> > Is there an easy way to separate bishop piece types into light square and dark square bishops? The NN might benefit from having OCB information.
>
> This is exclusively a training issue. The proper place is therefore https://github.com/nodchip/Stockfish/

Thanks, but my question was more about the non-training side. The training code seems generic enough that it won’t need to be modified for new piece types. My concern is that it seems hard to split bishops into two piece types without breaking classical eval or movegen.

sf-x commented 4 years ago

> The training code seems generic enough that it won’t need to be modified for new piece types.

On the contrary, it's the NN which is generic enough to need no special treatment. The training, OTOH, does need it if you want the differently-colored bishops treated differently (since, at present, it is heavily biased towards colorblindness).

romstad commented 4 years ago

> concerning the extra option, no I don't think we want to do that.

I personally think this is a disappointing decision. Supporting nets generated in all sorts of ways other than Stockfish evals and searches would make Stockfish more valuable as a research tool, and I think it would ultimately also help progress.

vondele commented 4 years ago

@romstad well, that decision is not set in stone. Furthermore, how the nets are generated is of course left fully open (i.e. trained on SF evals or otherwise). However, what I would be reluctant to have is several different ways to 'integrate' the net in SF, i.e. I'd prefer to have one approach (which could be hybrid like we have now, or pure, or additive, etc.), which we pick based on playing strength. So, people should be free to test any (reasonable) net, integrated in any (reasonable) way in SF, and if it passes SPRT we'll be happy to adjust. Having too many choices early on will make it difficult to change in the future (i.e. do we maintain and support N different variants?). At least that's my 2c right now.

dorzechowski commented 4 years ago

> concerning the extra option, no I don't think we want to do that.

I don't know what the best solution would be here: on one hand, we don't want to multiply UCI options and complicate the code, but on the other hand, "pure NNUE" could be useful for analysis. In gameplay the new patch is stronger, but at the same time it is much more blind to spectacular sacrifices, because the classical eval kicks in and the move is pruned into oblivion.

For example: r1b1qr1k/2p3pp/4p3/1pb1PpN1/pn3N1P/8/PPP1QPP1/2KR3R w - - 0 1 Here Rd8 actually wins but Stockfish won't realize it now.

Of course changing this because someone might produce a net that is not compatible with Stockfish eval range is not a good idea. But keeping NNUE in Stockfish relevant for analysis is something to consider IMO.

Viceroy-Sam commented 4 years ago

If Stockfish 12 is not an imminent release, could some official statement (blog.stockfishchess.org) be released to explain what has happened with the NNUE merge over the last few days, maybe outlining how NNUE works and where development is at? A lot of people are using different flavors/branches of SF, and that is leading to some inaccuracies in communication.

vondele commented 4 years ago

@Viceroy-Sam we haven't been communicating much beyond the releases. I think the blog, Twitter, etc. are handled by @daylen. Note also we're still ironing out the wrinkles, and progress is being made very quickly. I'd expect another 10-20 Elo by the end of the weekend.

daylen commented 4 years ago

@Viceroy-Sam @vondele Yeah, I'm happy to make a blog post/twitter with the contents of this commit message (which I think is a pretty good summary!) https://github.com/official-stockfish/Stockfish/commit/84f3e867903f62480c33243dd0ecbffd342796fc

ssj100 commented 4 years ago

> In gameplay new patch is stronger but at the same time it is much more blind to spectacular sacrifices because the classic eval kicks in and the move is pruned into oblivion.
>
> For example: r1b1qr1k/2p3pp/4p3/1pb1PpN1/pn3N1P/8/PPP1QPP1/2KR3R w - - 0 1 Here Rd8 actually wins but Stockfish won't realize it now.

Yes, I've noticed many positions that "pure" SF NNUE solved quickly but that it has become totally blind to since the hybrid patch. It's a little disappointing for me, but then "Elo gain is Elo gain". The way I see it, with the hybrid patch SF finds moves faster in many other positions, which is "equally" important from an objective standpoint. Furthermore, if there is an objective overall Elo gain (which we are confident is true given that it passed SPRT bounds on Fishtest), that by definition means that SF (with the hybrid patch) now finds the "better" move in more positions than without the hybrid patch.

dorzechowski commented 4 years ago

Thinking about it more, it doesn't really make sense to add more UCI switches just to better analyze some positions. The situation is a bit similar to null move pruning and many other features: in general it brings Elo, but sometimes it hurts. Also, whatever we do, someone will complain that some other obscure option is missing. People analyzing games are free to use a previous build without this change, or even to maintain their own fork, which I'm sure a lot of them do.

If at some point pure NNUE becomes stronger, I'm sure someone will simplify it back.

Vizvezdenec commented 4 years ago

I think we need a huge search retuning, especially for heuristics that use static eval. Look at fishtest now - basically everything is passing/close to passing...

LouisZulli commented 4 years ago

If nothing else, having a UCI option named "Use NNUE" that, even when true, might in fact not use NNUE seems confusing now. It seems there are now two possible evaluations: classical and hybrid.

Maybe replace "Use NNUE" with either "Use Classical Evaluation" or "Use Hybrid Evaluation". One or the other, and simply true or false.

Sopel97 commented 4 years ago

Your understanding of the word "Use" is incorrect. Usage doesn't imply exclusivity.

LouisZulli commented 4 years ago

@Sopel97 "Use Classical Evaluation" is certainly unambiguous (and is true by default). Set it to false to enable whatever chimerical NNUE-hybrid is currently considered "best".

Or, maybe "Enable NNUE" instead of "Use NNUE"?

Lolligerhans commented 4 years ago

Should I even keep testing eval patches with non-NNUE settings? I would speculate that the optimal static eval (and the success of patches) is rather different for conventional SF compared to an NNUE version (where the static eval is only used to evaluate rather unbalanced positions).

vondele commented 4 years ago

Yes, I propose that patches to the classical evaluation are tested with 'Use NNUE=false'; the hybrid mode should not dictate, at least for now, how classical evaluation evolves.

Vizvezdenec commented 4 years ago

But we somewhat use classical evaluation as a speedup. Wouldn't it be logical to pass [-3;1] once on NNUE = true to not regress there?

vondele commented 4 years ago

I think I would like to avoid that for now, NNUE is evolving so quickly that this hardly matters, but this can be revised later.

zz4032 commented 4 years ago

The mixed evaluation approach complicates things and has "only" brought 11 Elo (further adjustments of the threshold value might gain a bit more on top). But wouldn't you like to keep the NNUE eval separate first, at least for a period of time, and retest the hybrid evaluation again later? With the high number of patches currently passing for NNUE, hybrid eval might be obsolete soon, and it interferes with the current rapid development of the NNUE eval. By "hybrid" I also mean the approach of many currently tested patches that try to replace part of the NNUE eval in certain types of positions with the classical eval. That looks like taking steps backwards.
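
To put the 11 Elo figure in perspective, here is a small sketch using the standard logistic Elo model, in which an 11 Elo edge corresponds to an expected score of roughly 51.6%. The function names are illustrative.

```python
import math


def expected_score(elo_diff):
    """Expected score (0..1) of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score):
    """Inverse of expected_score; score must lie strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)
```

The same conversion shows why small gains need many games to confirm: the gap between 50% and about 51.6% is easily swamped by noise in a short match.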

Sopel97 commented 4 years ago

If hybrid eval becomes obsolete it will be simplified.

TonHaver commented 4 years ago

Isn't it a problem that the weak points of the NNUE net can't be improved (or better: improvements can't be tested) when those weak points are being handed over to the classical eval?

NKONSTANTAKIS commented 4 years ago

There will be a struggle between the evolved hybrid eval and optimized search of one specific net architecture and other architectures with a potentially higher ceiling. Up to a point it can be left to the external NN trainers to come up with convincingly superior stuff, but resource-wise it's an asymmetric battle.

It would be nice for NN developers to have framework access for researching their potential. The 256halfkp selection might be like taking a 2 liter engine and evolving everything around it (chassis etc.) to suit it. A 4 liter engine will not fit and will require different stuff. So once our 2 liter car is very tuned, it will be very hard to justify the transition.

Obviously there is no easy solution to this local maximum architecture issue, one has to start from somewhere.

So one has to rely upon intuition on what would offer the highest long term potential and ride it all the way, a crucial decision.

Another idea is to initially offer parallel evolution of different architectures with comparable resources for a modest period and narrow down to the most promising one(s).

NKONSTANTAKIS commented 4 years ago

The other way is doable too but requires discipline: stop optimizing 256hybrid once Elo gains slow down, and transition to a different architecture by taking the same gear and altering it. This might be more efficient if the optimal gear is similar, but it will be emotionally hard to step down a few dozen Elo and allocate effort there.

But it's also not a given that "bigger is always better". This might be true for UCT, which needs increased eval accuracy to fill the generalised gaps, but for SF it may well be that the highest ceiling is offered by a challenging synthesis of roles, as certain stuff could be done more efficiently outside of a NN.

Much like an F1 car using a mix of automatic, semi-automatic and manual stuff. If the driver is good he can do some stuff better than automatic modules (or good enough, and profit from less weight).

vondele commented 4 years ago

@TonHaver right now training doesn't take hybrid into account. So if the weak point becomes better, we might be able to e.g. change the threshold or simplify it all away. I personally would be very curious to see what happens when one optimizes a net taking into account that its training data only needs to contain positions that it will later encounter in the hybrid approach. I suspect this criterion cuts out many really uninteresting positions (very unbalanced, essentially won anyway).
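
One way to picture that criterion is a filter over the training set that keeps only positions the net would actually be asked to evaluate. This is a sketch under assumptions: the `(position, classical_eval)` tuple layout, the function name, and the threshold value are all hypothetical.

```python
def filter_training_positions(positions, hybrid_threshold):
    """Keep positions whose classical eval is within the hybrid threshold.

    positions        -- iterable of (position, classical_eval) pairs
    hybrid_threshold -- above this absolute value the classical eval is
                        assumed to take over, so the net never sees it
    """
    return [
        (position, classical_eval)
        for position, classical_eval in positions
        if abs(classical_eval) <= hybrid_threshold
    ]
```

Whether dropping the lopsided positions actually helps the net on the remaining ones is exactly the open question raised above.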

gonzalezjo commented 4 years ago

> I think we need a huge search retuning, especially for heuristics that use static eval. Look at fishtest now - basically everything is passing/close to passing...

Another run of xoto’s searchconsttune would be nice.

ssj100 commented 4 years ago

Looks like incredible progress already: https://tests.stockfishchess.org/tests/view/5f2f0ff49081672066536b29

@vondele By the way, it might be useful to ensure the gains persist with SMP and scaling - there are some reports that scaling has been hurt badly since the hybrid patch. Any chance to run an 8-core RT?

nickolasreynolds commented 4 years ago

I wonder if a smaller architecture that's as fast as (or faster than) the classic eval has the potential to replace it in unbalanced positions, too.

vondele commented 4 years ago

@ssj100 those reports are very likely small samples only. Right now, it would be a waste of resources to run another SMP. There will hopefully be another wave of patches, and the next RT will be SMP. More importantly, we have an important issue to fix for SMP (https://github.com/official-stockfish/Stockfish/issues/2933), since right now, we (and all other NNUE branches) probably have wrong results (likely with little impact).

nguyenpham commented 4 years ago

From a previous post I understand that @vondele doesn't like the idea of having an option to turn hybrid mode on and off. In some forum posts, people are still discussing it and would love to have that option so they can analyze games and/or do some testing. I think that kind of use and testing may later help SF become stronger, since it produces more knowledge. Note that the majority don't know how to code or compile SF.

Perhaps we can help them by using some "hidden" options which are not listed in response to the "uci" command. Normal users wouldn't know about them or use them, but anyone who really wants to could find and use them. That way we help people and resolve the dilemma while still keeping the mainstream policy.
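
A minimal sketch of the "hidden option" idea, assuming a simple option registry: hidden options are accepted by `setoption` but are left out of the list printed in response to `uci`. The class, the functions, and the option name "Hybrid Eval" are hypothetical and do not reflect how Stockfish handles options.

```python
from dataclasses import dataclass


@dataclass
class CheckOption:
    name: str
    default: bool
    hidden: bool = False  # hidden options are accepted but never advertised


def uci_option_lines(options):
    """Lines printed in response to "uci"; hidden options are skipped."""
    return [
        f"option name {opt.name} type check default {str(opt.default).lower()}"
        for opt in options
        if not opt.hidden
    ]


def set_option(options, values, name, value):
    """Handle "setoption" for any known option, hidden or not."""
    for opt in options:
        if opt.name.lower() == name.lower():
            values[opt.name] = value
            return True
    return False


OPTIONS = [
    CheckOption("Use NNUE", True),
    CheckOption("Hybrid Eval", True, hidden=True),  # hypothetical option name
]
```

GUIs only build their settings dialogs from the advertised lines, so a hidden option would have to be sent by hand or through a GUI's custom-command feature.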

syzygy1 commented 4 years ago

I don't really understand what it is doing yet, but I noticed that rotate180() really rotates over 180 degrees (^ 0x3f), which seems rather unnatural for chess.

(This may have to do with castling rights still not being taken into account by the NN? Or am I mistaken there.)
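
To make the difference concrete, here is a small sketch comparing the two transformations, assuming the usual 0..63 square indexing with a1 = 0 and h8 = 63. Only `rotate180` is named in the thread; the other helpers are illustrative.

```python
def square(name):
    """Convert algebraic notation such as "e1" to an index with a1 = 0, h8 = 63."""
    file = ord(name[0]) - ord("a")
    rank = int(name[1]) - 1
    return rank * 8 + file


def square_name(sq):
    """Convert a 0..63 index back to algebraic notation."""
    return "abcdefgh"[sq & 7] + str((sq >> 3) + 1)


def rotate180(sq):
    """Point reflection through the board centre: flips both file and rank."""
    return sq ^ 0x3F


def flip_vertical(sq):
    """Mirror across the horizontal axis: flips the rank, keeps the file."""
    return sq ^ 0x38
```

With these definitions the white king's square e1 maps to d8 under `rotate180` but to e8 under `flip_vertical`, so only the vertical flip sends White's starting setup onto Black's.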

syzygy1 commented 4 years ago

On an entirely different note, how will patches to the classic eval now be tested? If in the NNUE/hybrid mode, then I would expect basically any patch that speeds up the classic eval (by removing features) to pass (and to actually gain Elo, at the cost of classic mode).

vondele commented 4 years ago

Classical eval patches will be tested with 'NNUE false', we'd like to keep it in good shape.

vondele commented 4 years ago

Castling rights are not yet taken into account, but there are extensions of the network architecture that do take them into account. Testing those architectures is for the (near?) future, once we have some experience with the current setup and it is stable.

I haven't checked the role of rotate180 yet.

ianfab commented 4 years ago

I suspect that the 180 degree rotation might come from the fact that shogi has a point symmetric starting position, so if there is no other reason for that choice I would agree that reflecting the axial symmetry of the chess starting position in that transformation would make sense.

jjoshua2 commented 4 years ago

Three LTC tests are passing with the new net: lowering the hybrid threshold, adding a new term, and raising the hybrid threshold. Raising the threshold should be good at VLTC, where the speedup isn't as important. With the new net it might not have the pawn blind spot anymore?

jjoshua2 commented 4 years ago

Can someone make a branch that loads two different nets, as a good base for these types of experiments?

vondele commented 4 years ago

The question is not quite clear to me.

Note that this patch https://github.com/Vizvezdenec/Stockfish/compare/add890a10b...5129aab83a almost certainly also increases the hybrid threshold on average. You can use dbg_mean_of(foo); in the code to see the average value of a term, during a bench.
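
For readers unfamiliar with that kind of debug helper, the idea is simply a running count and sum that is reported at the end of a run. The sketch below shows the concept in Python; the class and method names are hypothetical and this is not the Stockfish implementation.

```python
class MeanTracker:
    """Accumulates values during a run and reports their mean afterwards."""

    def __init__(self):
        self.count = 0
        self.total = 0

    def add(self, value):
        """Record one observation, e.g. the threshold used at one node."""
        self.count += 1
        self.total += value

    def mean(self):
        """Average of everything recorded so far (0.0 if nothing was recorded)."""
        return self.total / self.count if self.count else 0.0
```

Calling `add` at the point where the term is computed and printing `mean()` after a bench gives the average value of that term over all evaluated nodes.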