official-stockfish / Stockfish

A free and strong UCI chess engine
https://stockfishchess.org/
GNU General Public License v3.0

NNUE ideas and discussion (post-merge). #2915

Closed. vondele closed this issue 4 years ago.

vondele commented 4 years ago

I'll create this new issue to track new ideas and channel early questions.

syzygy1 commented 4 years ago

I'll prepare something which should either speed things up or show that I don't quite understand what is happening ;-)

syzygy1 commented 4 years ago

I created a simple patch that calls update_eval() at the beginning of do_move() and do_null_move(). This seems to eliminate all refreshes, but the speed-up seems to be rather small. Anyway, I have submitted a test.
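Roughly, the idea looks like this (sketch only, not the actual diff; the Eval::useNNUE guard and the exact call site/signature of update_eval() may differ in the real sources):

```cpp
// Force the accumulator of the current position to be computed before the move
// is made, so the child position can always be updated incrementally instead of
// being refreshed from scratch.
void Position::do_move(Move m, StateInfo& newSt, bool givesCheck) {

    if (Eval::useNNUE)
        Eval::NNUE::update_eval(*this);  // eager update of the parent accumulator

    // ... unchanged do_move() body ...
}
```

The same call would go at the top of do_null_move().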

Edit: https://tests.stockfishchess.org/tests/view/5f39a68ce98b6c64b3df4218

Viceroy-Sam commented 4 years ago

I noticed that the author of YaneuraOu, in this tweet and linked article, is questioning the optimal NNUE network size. Although aimed at shogi, the question is relevant to Stockfish too.

Article in Japanese. Article translated into English: "Is the default network size of the NNUE evaluation function optimal?"

vondele commented 4 years ago

Yes, I've mentioned elsewhere that I believe we're ready to start experimenting with different net sizes and input features.

I also think we should contribute to the learning repository (https://github.com/nodchip/Stockfish) to make the code robust and easy to use, so that more people can experiment with net building and training. A good place to start is the scripts by @sergiovieri (https://github.com/sergiovieri/Stockfish/tree/scripts/nnue), which however rely on slightly modified learner sources.

nodchip commented 4 years ago

Any pull requests for my repository are welcome.

vondele commented 4 years ago

Since nps is quite different between the beginning and end of the game, I wonder if we should allocate a little more time to the opening now. Maybe @protonspring has an intuition on how to change things...

sergiovieri commented 4 years ago

I will slowly port my changes to nodchip's repo, so that my scripts will be compatible. However, I'm very busy with other things right now, so it will take some time.

vondele commented 4 years ago

Not directed at you Sergio, but to the community as a whole... let me say why I'm pushing to support and help @nodchip with his branch. I believe we have received with NNUE a wonderful gift, and I feel it is important that we give back. This will be a beneficial journey for the full community.

syzygy1 commented 4 years ago

Is SF-dev not going to include the learning part?

vondele commented 4 years ago

Not for now at least; the decision was to maintain that in the nodchip repo. The merge would never have happened (on that short timescale) if the scope of the project had been too broad. It is a lot of effort already.

nodchip commented 4 years ago

Computer shogi developers have learned many things from Stockfish in the past, and we have also learned many things from the Stockfish NNUE project. This project is beneficial both for the Stockfish community and the computer shogi community.

Vizvezdenec commented 4 years ago

Actually, I think we can try to tune piece values. They are used in some heuristics in search and are added to static eval; now that we have NNUE static eval, they could be quite far off (?)

unaiic commented 4 years ago

I'd like to tell you some things about further NNUE improvements.

I contacted yaneuraou (the author of blogs about shogi and, more recently, NNUE and SF) via Twitter. He claimed that if some changes (ideas from shogi engines) were made to SF, it could be 200-400 Elo stronger. We've seen so far that this worked with NNUE, so I asked him for more details about those techniques, to see if they could be used in SF. He answered and said he'd make a new post about it, which he has now published: http://yaneuraou.yaneu.com/2020/08/21/3-technologies-in-shogi-ai-that-could-be-used-for-chess-ai/

The main ideas are:

1. Automatic optimization of engine parameters (hyperparameter tuning).
2. Subdividing or switching the evaluation network by game phase or piece count.
3. An automatically generated opening book built from engine search (Cerebellum-like).

He gives more details in the blog, and I think we should take a look at them and see if they are worth testing. NNUE worked for us; maybe these might as well...

syzygy1 commented 4 years ago

The third idea sounds like Brainfish.

Vizvezdenec commented 4 years ago

2nd idea is probably the most promising.

protonspring commented 4 years ago

For starters, here is a very easy way to proceed:

1. Train one NN only on the first 20 moves of games, or on all positions with more than 10 pawns (mg_value).
2. Train another NN only on later moves of games, or on all positions with 10 or fewer pawns (eg_value).

Then scale mg_value and eg_value as we did before.
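A minimal sketch of the blending step, assuming two hypothetical wrappers around the separately trained nets and a hypothetical game_phase() helper (none of these names exist in the current sources):

```cpp
#include <algorithm>

class Position;                             // Stockfish position type

int evaluate_mg_net(const Position& pos);   // hypothetical: net trained on early-game data
int evaluate_eg_net(const Position& pos);   // hypothetical: net trained on endgame data
int game_phase(const Position& pos);        // hypothetical: 128 at the start, 0 in bare endgames

// Blend the two evaluations with the same mg/eg tapering used by classical eval.
int tapered_nnue(const Position& pos) {
    int phase = std::clamp(game_phase(pos), 0, 128);
    return (evaluate_mg_net(pos) * phase + evaluate_eg_net(pos) * (128 - phase)) / 128;
}
```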

protonspring commented 4 years ago

Later, I'd expect to have different NNs for different endgames, but this is a bit far in the future.

MichaelB7 commented 4 years ago

My thought is that training should not be based on move number, but rather aligned with piece count: 32 to 24, 24 to 16; endgame training would cover piece counts 16 to 12 and 12 to 8, or something similar.

Vizvezdenec commented 4 years ago

I had the same idea as MichaelB7: (ideally) make an NN for every piece count from 3 (I don't think we need an NN for K vs K) up to 32. An even more "heavy but precise" approach would be to make an NN for every possible material configuration with a certain piece count. I know it's not an easy thing to do, but I'm talking about "ideally". For now, even something like a separate endgame NN would be a good thing, I guess? There is a lot of room for experimenting.

vondele commented 4 years ago

These things based on piece count are already being tried by Sergio and tttak, btw.

dorzechowski commented 4 years ago

It's not urgent by any means, but Brainfish-like book building is something that should be tackled at some point, imo. Regardless of whether it would be allowed in competitions or not, it's knowledge that an engine could use, especially since the opening phase has been very much neglected so far. Brainfish is an interesting project, but its learning platform is not open, and being dependent on one person doing updates is a dead end.

tzipproth commented 4 years ago

The third idea is indeed Cerebellum (BrainFish only being a vehicle for it). The only minor issue in the original post by yaneuraou is that the opening book tree cannot be recalculated (perfectly) with a pure minimax search, because it is not a "tree", but a graph with loops / repetitions.

tzipproth commented 4 years ago

@dorzechowski, I think it could be possible to make the Cerebellum platform open (I cannot decide this by myself alone). I had never thought about this, because using such a self-generated library in engine tournaments, competitions, etc. was usually widely rejected, except by Stefan Pohl. Maybe because it was never possible to distinguish such an automatically self-generated opening book library from a handcrafted book, given the public opening book format used.

noobpwnftw commented 4 years ago

Open implementation of idea number 3: https://github.com/noobpwnftw/chessdb

Full data is available, so you can work on your own back-propagation method: the data at the very leaves is simply a (fairly recent) SF depth 24 result, and the shape of the tree is developed by a depth 7 multi-PV search with a 200cp margin or a minimum of the top 5 moves.

unaiic commented 4 years ago

Regarding the first point, @nodchip told me about the implementation he used. I took his scripts and created a repo (https://github.com/unaiic/optimizer) where we can adapt them to SF and see how it goes. The scripts make use of Hyperopt, although we could also use Optuna; we should see what is best in this case. I'll also mention this in fishtest, as they'll surely have more expertise with this :)

tttak commented 4 years ago

Close to the 2nd idea, I previously implemented HalfKP_GamePly40x4 and HalfKP_PieceCount. They subdivide HalfKP into 4 (features: 41024 * 4 = 164096).

I'm not familiar with chess myself, so there may be a better way to implement it. I heard that Sergio trained HalfKP_PieceCount a bit, but it didn't work well.

The NNUE net can be split into two parts: feature_transformer and network. The above implementations switch the feature_transformer part [164096->256x2], but the network part [256x2-32-32] is the same in all phases. So it may be a little different from really switching 4 nets.
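A rough structural sketch of what is described above; the class and function names are illustrative stand-ins, not the actual NNUE sources:

```cpp
#include <array>
#include <cstdint>

class Position;                                    // Stockfish position type

// One feature transformer per bucket: 164096 features -> 256x2 accumulator.
struct FeatureTransformer {
    std::array<std::int16_t, 512> transform(const Position& pos) const;
};

// The network part, 256x2 -> 32 -> 32 -> 1, identical in all buckets.
struct Network {
    int propagate(const std::array<std::int16_t, 512>& accumulation) const;
};

int bucket_from_piece_count(const Position& pos);  // hypothetical 0..3 mapping

// HalfKP_PieceCount-style evaluation: the bucket selects one of four feature
// transformers, but a single Network is shared across all of them, so this is
// not quite the same as switching four complete nets.
int evaluate_bucketed(const Position& pos,
                      const std::array<FeatureTransformer, 4>& transformers,
                      const Network& network) {
    return network.propagate(transformers[bucket_from_piece_count(pos)].transform(pos));
}
```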

syzygy1 commented 4 years ago

Naive question, but how does training work currently? Just feed the net with a large number of positions and desired evaluations where the desired evaluations are determined by relatively shallow SF (SF-dev?) searches?

The more SF-dev fiddles with the NNUE scores (hybrid, multiply by 5/4, dampen with rule50_count()), the more problematic this would seem to get.
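For reference, a rough sketch of the kind of adjustments referred to; the exact constants and formulas live in evaluate.cpp and have changed over time, so the rule50 damping below is only an assumed shape:

```cpp
// Illustrative only: the raw network output is scaled and damped before it
// reaches search, so it no longer matches the targets the net was trained on.
Value adjusted_nnue(const Position& pos) {

    Value v = Eval::NNUE::evaluate(pos) * 5 / 4 + Tempo;  // 5/4 scaling plus tempo bonus

    v = v * (100 - pos.rule50_count()) / 100;             // assumed form of the rule50 damping

    return v;
}
```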

Serianol commented 4 years ago

To train, you need a large amount of data (around 1 billion positions is the norm right now) containing a FEN + eval + game result. It's usually depth 8 to 14 data. In training, the parameter "lambda" is used to give more importance to the eval or to the game result (1 is pure eval and 0 is pure results). This data can either be generated with the training binary or converted from any other source (there are PGN converters, for example). You can even use non-SF data. If you use the training binary to generate data, you have the choice of creating data from classical eval or from a net by switching the Use NNUE UCI option on and off. The latest source from nodchip uses hybrid when Use NNUE = true (not sure a lot of people have experimented with that yet).
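A minimal sketch of how lambda mixes the two signals, assuming a hypothetical eval_to_winrate() conversion with an arbitrary scale; the actual loss in the nodchip learner differs in detail:

```cpp
#include <cmath>

// Hypothetical conversion from a centipawn eval to a win probability in [0, 1].
double eval_to_winrate(double eval_cp) {
    return 1.0 / (1.0 + std::exp(-eval_cp / 600.0));  // sigmoid with an assumed scale
}

// Training target: lambda = 1 uses only the stored search eval,
// lambda = 0 uses only the game result (0 = loss, 0.5 = draw, 1 = win).
double training_target(double eval_cp, double game_result, double lambda) {
    return lambda * eval_to_winrate(eval_cp) + (1.0 - lambda) * game_result;
}
```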

crossbr commented 4 years ago

I understand the decision to keep the net-training code in the nodchip repo. However, it seems to me that the net-training process is presently not set up in the spirit of the Stockfish project. That's because, so far as I can tell, net training itself is currently done by individuals working individually, rather than by a community working together. This is not at all a criticism of the present net contributors. My hat is off to them in respect and gratitude for all their work and time, especially @sergiovieri.

Rather, it seems to me that an open-source project like SF would benefit from being transparent and collaborative in the net-training process too, as something done as a team in community. As a team we have people (e.g. DragonMist) who have chess knowledge about important chess databases. We have good connections with the Brainfish/Cerebellum folks. And noob's database is also a potential resource for net training. These skills, connections, and resources, among others, if all brought together in a shared collaborative project, could potentially improve the net-training process.

If our net-training process were collaborative and participatory, we could learn from each other what works and what doesn't, and build on these discoveries. That seems more in keeping with the spirit of the Stockfish project.

https://github.com/sergiovieri/Stockfish/commit/da601927a61d1855bafa08ba6d771727cd1b5b91#commitcomment-41720841

vondele commented 4 years ago

Yes, I agree that the process could be improved; in particular, I would love to see an end-to-end workflow documented based on the nodchip repo (i.e. without local hacks). Only when we have well-established workflows will it become possible to set up a framework like fishtest to do e.g. data generation. I do note that there is quite some online collaboration, but it is taking place on Discord: https://discord.gg/c4aBQt (should be a short-lived invite to Discord; I don't know how this goes otherwise).

nodchip commented 4 years ago

I don't have any strong opinions for now. I will follow a process once the community decides on it.

crossbr commented 4 years ago

Thanks @vondele. The SF discord is much more active than when I visited it a month ago, and I'm glad to see that. Perhaps we could think about a list of preliminary questions we would need answered in order to arrive at a well-established workflow for distributed development of nets. The SF discord crew would likely help conduct tests to answer those questions. And someone might even volunteer to be a designated point-person for the project of developing the needed workflow.

nguyenpham commented 4 years ago

Jouni did a test, and I confirmed the result, that NNUE without Syzygy can compete with Classical+Syzygy. No conclusion, since both tests were so small. Just an idea; I hope someone, or fishtest, can do some serious tests.

http://talkchess.com/forum3/viewtopic.php?f=2&t=74880

nodchip commented 4 years ago

I'm thinking how to make progress in my repository. My current procedure is:

  1. Implement ideas posted in the Issues in my repository.
  2. Merge pull requests from other developers.
  3. Create a branch to merge the official stockfish master.
  4. Post to fishtest to check if there are regressions in the engine part.
  5. Merge the branch to the master of my repository.
  6. Release a new binary set.

There are at least two problems in my procedure.

One is that we cannot avoid introducing bugs through changes in the engine part. Recently, some developers reported that the learn command does not work well. I guess that this is because of the introduction of hybrid eval, but I have not investigated. I think that similar regressions will happen in the future. We need concrete methodologies for testing the program to avoid these kinds of regressions.

The other is that it is hard to detect bugs in the training data generator and trainer. When I implement a new feature in the training data generator or trainer, I add a new option that disables the new feature by default. This avoids the new feature being implicitly enabled when someone uses a new machine learning binary, which would change the results of training data generation or training, and it also avoids users encountering new bugs in new features. But it cannot prevent bugs from being introduced in the new features themselves. We also need concrete methodologies for testing the program to avoid bugs in new features.

Are there any ideas?

mstembera commented 4 years ago

Re the learn command... Besides hybrid, we also use the 50-move rule to damp down the evals. I don't know if the 50-move counter is, or should be, encoded as an NNUE feature.

sf-x commented 4 years ago

The current parameter sets seem to have reached their ceiling, at least without extreme effort like SPSA on all parameters. And that's good, because it means a new net architecture can be evaluated in a few days. OTOH, if extreme efforts are spent on the current arch, there will not be a chance for a better arch to be proven as such.

I suggest trying to change the activation function to a clamped quadratic. This would allow the network to represent products of layer outputs (using the identity 4ab = (a+b)^2 - (a-b)^2).
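A minimal sketch of the suggested change, assuming "clamped quadratic" means squaring the clamped value; the real hidden layers work on clamped integers, so the float ranges here are purely illustrative:

```cpp
#include <algorithm>

// Current hidden-layer activation: clip to [0, 1] (clipped ReLU).
double clipped_relu(double x) {
    return std::clamp(x, 0.0, 1.0);
}

// Suggested alternative: clamp, then square. Two neurons fed a+b and a-b can
// then represent the product ab via 4ab = (a+b)^2 - (a-b)^2, as long as the
// inputs stay inside the clamping range.
double clamped_quadratic(double x) {
    double c = std::clamp(x, 0.0, 1.0);
    return c * c;
}
```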

vondele commented 4 years ago

"I'm thinking how to make progress in my repository. Are there any ideas?"

@nodchip this is indeed a difficult problem to solve. Some form of automated testing will be needed, be it in the form of unit tests for those parts that are suitable, or in the form of regression tests. I believe one needs to establish a few scripts (probably Python glue) that run all the steps of data generation and training (obviously on small amounts of data and a small number of training steps, i.e. minutes of run time) and that can be tested, checking whether basic properties of e.g. the optimization are in a reasonable range. While this can't capture everything, it will catch some things.

In my experience, adding options or optional features will often lead to more bugs, especially if not all options, and combinations of options, are tested carefully.

nodchip commented 4 years ago

@vondele Thank you for the advice. I will create a semi-automated test at first. The test will generate training data, generate validation data, train, and check whether the cross entropy decreases. But I think that the test will take a long time. I will think about whether we can make the test fully automatic.

I agree that adding options or optional features will often lead to more bugs. On the other hand, many options would be helpful for experiments. I will think about what we can do to avoid introducing bugs when we add options or optional features.

vondele commented 4 years ago

I think one of the challenges of the learning code is that lots of experimentation is needed to make progress, hence the options. Somehow it makes sense to encourage the experimentation while at the same time establishing the basic scheme that is known to work. For SF we have a rather clear procedure for when to modify master, which simplifies things tremendously when it comes to maintaining stability. That seems, at this point, still more difficult for the learning code. There must be some expertise concerning this question in the ML community; we're not the first to run into this problem.

noobpwnftw commented 4 years ago

A clear procedure is always a beneficial overhead, regardless of the number of moving parts involved. The more complicated the way things work together, the more reasons there are for something to go wrong, and the harder it is for someone else to reproduce the results. If people can precisely describe the steps they took to conduct their experiments, then they should have no trouble having others review their work and verify the results.

Meanwhile, tooling for performing ML has nothing to do with how one should conduct experiments; tools are only required to perform the work as described and have nothing to do with whether one produces a good net or not. These are completely different things.

People run into such problems when conducting ML because they fiddle with things here and there and no documentation or record is made for later reference, just like editing code with no version control and no backup. Due to the nature of ML, it may not outright fail to compile or crash, but just produce weird or not-good-enough results.

r2dev2 commented 4 years ago

Hello all, I have a dataset of 13 million chess positions and their evaluations lying around, if it would help train NNUE.

vondele commented 4 years ago

I'll close this issue. Please post new ideas in the forum, as an issue, or ideally as a patch or PR.