Stochastic optimization for parameters

unaiic commented 4 years ago

https://github.com/official-stockfish/Stockfish/issues/2915#issuecomment-678112885

Here some interesting methods were proposed. But regarding the first one, and after having mentioned to @nodchip, he told me about the implementation he used. I took his scripts and created a repo (https://github.com/unaiic/optimizer) where we can adapt them to SF and see how it goes. The scripts make use of Hyperopt, although we could also use Optune; we should see what is best in this case. I think you could help with this :)

nodchip commented 4 years ago

Let me explain what the scirpt does. The script executes the following steps:

Read search parameters from a config file.
- This config file is formated as a header file with definitions for search parameters.
- Each search parameter has the following data.
  - name
  - default value
  - min value
  - max value
Select a search parameter set.
- By the default settings, the first 20 search parameter sets are selected randomly, and the other parameter sets are selected with Tree of Parzen Estimators (TPE).
Write the selected search parameters to a header file.
Build an engine binary with the header file created in 3.
Measure the elo difference between a baseline binary and the binary compiled in 4.
Return the elo measured in 5 to Hyperopt.
Repeat 2-6 specific times.
Apply Gaussian Process to select the best search parameter set.

A point is the elo measurement doesn't need to be accurate. Because TPE assumes that the samples (the measured elo) contain noises. I measured elo with only 48 games.

Another point is that 2-6 can be done in parallel. Hyperopt supports parallel search using MongoDB or other DB. We could be able to distribute tasks between fishtest nodes.

There is one concern. If we measure elo with STC, the engine will get weak in LTC. This happened in computer shogi engine. We should use LTC to measure elo.

gonzalezjo commented 4 years ago

It’s what is used by Leela developers with good success. It generates very pretty diagrams.

unaiic commented 4 years ago

@gonzalezjo Good one. I think we could also try it, compare it with the others and we'll see how it goes...

unaiic commented 4 years ago

@gonzalezjo I took a look at it and it seems that it tunes UCI options. We should then convert the parameters (from search) to UCI options I guess, right? Then take the resulting values from the tune and test them to see if they're good enough. Please correct me if I'm wrong :)

gonzalezjo commented 4 years ago

@gonzalezjo I took a look at it and it seems that it tunes UCI options. We should then convert the parameters (from search) to UCI options I guess, right? Then take the resulting values from the tune and test them to see if they're good enough. Please correct me if I'm wrong :)

Correct. That said, it’s not as bad as it sounds! The “TUNE” macro used for SPSA tuning does that for you, so the code is technically already written.

unaiic commented 4 years ago

@gonzalezjo I tried a simple tune, just for test, but didn't work for me (white always disconnects). I tuned the first engine like this:

T1,510,460,560,5,0.0020 T2,223,210,236,1.3,0.0020

I have two executables (with the names engine1 (tuned) and engine2, both compiled with 'make build ARCH=x86-64-modern'), the network file and an opening book in the same folder. I finally have the simple_tune.json file. It looks like this:

 {
       "engines": [
           {
               "command": "engine1",
               "fixed_parameters": {
                   "Threads": 1
               }
           },
           {
               "command": "engine2",
               "fixed_parameters": {
                   "Threads": 1
               }
           }
       ],
       "parameter_ranges": {
           "T1": "Real(460, 560)",
           "T2": "Real(210, 236)"
       },
       "engine1_tc": "60+0.6",
       "engine2_tc": "60+0.6",
       "opening_file": "noob_3moves.pgn"
   }

I think this is okay, but when I run 'tune local -c simple_tune.json' I get the following 'out.pgn':

[Event "?"] [Site "?"] [Date "2020.08.23"] [Round "1"] [White "engine1"] [Black "engine2"] [Result "0-1"] [FEN "rn1qkbnr/ppp1p1pp/4b3/1P1p1p2/8/B7/P1PPPPPP/RN1QKBNR w KQkq - 0 1"] [PlyCount "0"] [SetUp "1"] [Termination "abandoned"] [TimeControl "60+0.6"]

{White disconnects} 0-1

[Event "?"] [Site "?"] [Date "2020.08.23"] [Round "1"] [White "engine2"] [Black "engine1"] [Result "0-1"] [FEN "rn1qkbnr/ppp1p1pp/4b3/1P1p1p2/8/B7/P1PPPPPP/RN1QKBNR w KQkq - 0 1"] [PlyCount "0"] [SetUp "1"] [Termination "abandoned"] [TimeControl "60+0.6"]

{White disconnects} 0-1

And so on...

What is wrong with this? I think it may be something related to the TUNE macro, but I'm not sure. Thanks :)

kiudee commented 4 years ago

@unaiic I found your issue here by accident and would like to help. I just recently converted my collection of scripts into a proper library and there might still be rough edges. Don’t hesitate to report bugs to the Issue Tracker. For the next release I am planning to allow more verbose output of cutechess-cli to be forwarded.

What happens if you run the following from the same directory?

cutechess-cli -engine conf=engine1 tc=1 -engine conf=engine2 tc=1 -debug

unaiic commented 4 years ago

@kiudee Fine, I'll report it. BTW, this is the output I get:

Warning: Unknown engine configuration: "engine1" Warning: Invalid value for option "-engine": "conf=engine1 tc=1"

kiudee commented 4 years ago

Hm, is there a engines.json file in the folder? That should be created automatically by the chess-tuning-tools.

unaiic commented 4 years ago

Yeah, my fault (I deleted them when I saw it didn't work). This is the actual output:

7 >engine1(0): setoption name Threads value 1
8 >engine1(0): setoption name T1 value 523.6574654907396
8 >engine1(0): setoption name T2 value 221.84373184658273
8 >engine1(0): uci
8 >engine2(1): setoption name Threads value 1
8 >engine2(1): uci
9 <engine1(0): Stockfish 220820 by the Stockfish developers (see AUTHORS file)
9 <engine1(0): T1,510,460,560,5,0.0020
9 <engine1(0): T2,223,210,236,1.3,0.0020
9 <engine2(1): Stockfish 220820 by the Stockfish developers (see AUTHORS file)
152 <engine2(1): id name Stockfish 220820
152 <engine2(1): id author the Stockfish developers (see AUTHORS file)
152 <engine2(1): option name Debug Log File type string default 
152 <engine2(1): option name Contempt type spin default 24 min -100 max 100
152 <engine2(1): option name Analysis Contempt type combo default Both var Off var White var Black var Both
152 <engine2(1): option name Threads type spin default 1 min 1 max 512
152 <engine2(1): option name Hash type spin default 16 min 1 max 33554432
152 <engine2(1): option name Clear Hash type button
153 <engine2(1): option name Ponder type check default false
153 <engine2(1): option name MultiPV type spin default 1 min 1 max 500
153 <engine2(1): option name Skill Level type spin default 20 min 0 max 20
153 <engine2(1): option name Move Overhead type spin default 10 min 0 max 5000
153 <engine2(1): option name Slow Mover type spin default 100 min 10 max 1000
153 <engine2(1): option name nodestime type spin default 0 min 0 max 10000
153 <engine2(1): option name UCI_Chess960 type check default false
153 <engine2(1): option name UCI_AnalyseMode type check default false
153 <engine2(1): option name UCI_LimitStrength type check default false
153 <engine2(1): option name UCI_Elo type spin default 1350 min 1350 max 2850
153 <engine2(1): option name UCI_ShowWDL type check default false
153 <engine2(1): option name SyzygyPath type string default <empty>
153 <engine2(1): option name SyzygyProbeDepth type spin default 1 min 1 max 100
153 <engine2(1): option name Syzygy50MoveRule type check default true
153 <engine2(1): option name SyzygyProbeLimit type spin default 7 min 0 max 7
153 <engine2(1): option name Use NNUE type check default true
153 <engine2(1): option name EvalFile type string default nn-82215d0fd0df.nnue
153 <engine2(1): uciok
153 >engine2(1): isready
153 <engine2(1): readyok
153 <engine1(0): id name Stockfish 220820
153 <engine1(0): id author the Stockfish developers (see AUTHORS file)
153 <engine1(0): option name Debug Log File type string default 
153 <engine1(0): option name Contempt type spin default 24 min -100 max 100
153 <engine1(0): option name Analysis Contempt type combo default Both var Off var White var Black var Both
153 <engine1(0): option name Threads type spin default 1 min 1 max 512
153 <engine1(0): option name Hash type spin default 16 min 1 max 33554432
153 <engine1(0): option name Clear Hash type button
153 <engine1(0): option name Ponder type check default false
153 <engine1(0): option name MultiPV type spin default 1 min 1 max 500
153 <engine1(0): option name Skill Level type spin default 20 min 0 max 20
153 <engine1(0): option name Move Overhead type spin default 10 min 0 max 5000
153 <engine1(0): option name Slow Mover type spin default 100 min 10 max 1000
153 <engine1(0): option name nodestime type spin default 0 min 0 max 10000
153 <engine1(0): option name UCI_Chess960 type check default false
153 <engine1(0): option name UCI_AnalyseMode type check default false
153 <engine1(0): option name UCI_LimitStrength type check default false
153 <engine1(0): option name UCI_Elo type spin default 1350 min 1350 max 2850
153 <engine1(0): option name UCI_ShowWDL type check default false
154 <engine1(0): option name SyzygyPath type string default <empty>
154 <engine1(0): option name SyzygyProbeDepth type spin default 1 min 1 max 100
154 <engine1(0): option name Syzygy50MoveRule type check default true
154 <engine1(0): option name SyzygyProbeLimit type spin default 7 min 0 max 7
154 <engine1(0): option name Use NNUE type check default true
154 <engine1(0): option name EvalFile type string default nn-82215d0fd0df.nnue
154 <engine1(0): option name T1 type spin default 510 min 460 max 560
154 <engine1(0): option name T2 type spin default 223 min 210 max 236
154 <engine1(0): uciok
154 >engine1(0): isready
154 <engine1(0): readyok
Started game 1 of 1 (engine1 vs engine2)
154 >engine1(0): ucinewgame
154 >engine1(0): setoption name Ponder value false
154 >engine1(0): position startpos
154 >engine2(1): ucinewgame
154 >engine2(1): setoption name Ponder value false
154 >engine2(1): position startpos
154 >engine1(0): isready
156 <engine1(0): readyok
156 >engine1(0): go wtime 1000 btime 1000
157 <engine1(0): info string ERROR: NNUE evaluation used, but the network file nn-82215d0fd0df.nnue was not loaded successfully.
157 <engine1(0): info string ERROR: The UCI option EvalFile might need to specify the full path, including the directory/folder name, to the file.
157 <engine1(0): info string ERROR: The default net can be downloaded from: https://tests.stockfishchess.org/api/nn/nn-82215d0fd0df.nnue
157 <engine1(0): info string ERROR: If the UCI option Use NNUE is set to true, network evaluation parameters compatible with the program must be available.
157 <engine1(0): info string ERROR: The engine will be terminated now.
Terminating process of engine engine1(0)
159 >engine2(1): isready
159 <engine2(1): readyok
Elo difference: -inf +/- nan
Finished match
Finished game 1 (engine1 vs engine2): 0-1 {White disconnects}
Score of engine1 vs engine2: 0 - 1 - 0  [0.000] 1
Elo difference: -inf +/- nan
Finished match
159 >engine2(1): quit

kiudee commented 4 years ago

Ok, there are a few things you can try.

Add "EvalFile": "nn-9931db908a9b.nnue" or the full absolute path to your simple_tune.json file.
Add "Use NNUE": "false" instead.

edit: I could try to make the library a bit more robust by allowing the user to set the working directory.

unaiic commented 4 years ago

Okay, now it seems to be playing games. I understand that this behaviour (from log.txt) is normal?

2020-08-23 16:47:44,956 INFO Starting iteration 0 2020-08-23 16:47:44,956 INFO Testing {'T1': 474.6329987400783, 'T2': 225.4754269784815} 2020-08-23 16:47:44,956 INFO Start experiment 2020-08-23 16:50:15,376 INFO Experiment finished (150.419198s elapsed). 2020-08-23 16:50:16,006 INFO Got score: -0.4660222762857485 +- 0.26871653930371303 2020-08-23 16:50:16,007 INFO Updating model 2020-08-23 16:50:16,007 INFO GP sampling finished (0.000237s) 2020-08-23 16:50:16,105 INFO Starting iteration 1 2020-08-23 16:50:16,106 INFO Testing {'T1': 499.14523211540876, 'T2': 210.65957941253217} 2020-08-23 16:50:16,107 INFO Start experiment

kiudee commented 4 years ago

Yes, that is the expected output.

unaiic commented 4 years ago

@kiudee Okay, thank you so much. I guess there is no need to report any issue then (it only needed the path to the net to be specified). We'll continue to test things out and see how it goes :)

unaiic commented 4 years ago

@nodchip @gonzalezjo I suppose we should also consider this option and see its results. It's a simple option with an easy setup to work with. IMHO this is a good enough reason to test this out. Of course we also have @nodchip's implementation to work on and adapt it to SF. What are your thoughts on this?

nodchip commented 4 years ago

How about to compare three methods, SPSA, Hyperopt and chess-tuning-tools. The point will be computer resources vs elo improvements.

unaiic commented 4 years ago

@nodchip True. Computer resources are limited. The simple tune I started ~45mins ago has just started the 14th iteration (and I changed it to be TC 5'+0.05s), which shows that it's rather difficult to be able to compare them without external help.

gonzalezjo commented 4 years ago

I would be interested in hearing @kiudee’s thoughts on his approach vs. Hyperopt and vs. SPSA, if he has the time and interest to share.

kiudee commented 4 years ago

@gonzalezjo I can write down a few general considerations which hold without having done extensive experiments of all three approaches.

The biggest problem we have in optimizing several parameters at the same time is the curse of dimensionality. It is clear that the amount of points we have to collect blows up exponentially in the number of (effective) parameters (we basically have to build a "wall" for the optimum). The noise resulting from chess games inflates the number of iterations in addition by a constant factor.

Now there are different ways to deal with this problem. With very strong regularity assumptions, you are able to optimize many parameters at the same time, but if those assumptions are violated you might only be able to find a local optimum (e.g. SPSA). Hyperopt and chess-tuning-tools use asymptotically non-parametric methods. That means that in principle if you run them for long enough and keep exploring, they will model the optimization landscape perfectly and be able to find the true global optimum. Hyperopt uses a tree of parzen estimators as the model which has problems accurately extrapolating the function and the uncertainty to unknown regions. chess-tuning-tools uses the Gaussian process implementation of bayes-skopt, which allows it to accurately quantify the uncertainty in each point. In addition we can use the properties of the Gaussian process to employ advanced optimization algorithms such as max-value entropy search (default in chess-tuning-tools) and predictive variance reduction search.

What are the downsides? Since Hyperopt and chess-tuning-tools use a very flexible model, they eventually also suffer from the curse of dimensionality. If you want to optimize more than say 10 parameters, it will be difficult to model the target function in few iterations (< 2000). For Gaussian processes there are methods which are able to fit a lower dimensional subspace, allowing them to generalize with less number of iterations, but they are not yet implemented. Another consideration is running time: The Gaussian process implementation is optimized to be accurate with few iterations, but its complexity is cubic in the number of iterations. Thus if you collect over 2000 iterations the overhead per iteration could be several minutes. That’s why I would recommend to increase the number of games per iteration to make each datapoint less noisy and to scale the approach that way.

What would be my recommendation?

For large numbers of parameters (>20): I don’t expect any algorithm to work well. I think it might even be hard to beat random search. SPSA could already work adequately.
For small numbers of parameters (<7): I would definitely use something like chess-tuning-tools here, since that’s what it’s optimized for. A compromise could also be to start with SPSA to optimize a large number of parameters and then switch to chess-tuning-tools for finetuning.
For intermediate numbers of parameters (≥7, ≤20): I really don’t know. More experimentation is needed.

unaiic commented 4 years ago

Could we say then that there are two groups: 1) SPSA and 2) Hyperopt and chess-tuning-tools? If both methods from the second group behaved similarly and if Hyperopt didn't add any irrelevant advantages (@nodchip might have more insights on this), we could try to implement chess-tuning-tools (as it is by far the easiest one) and compare it to SPSA. And taking into account what @kiudee said, we could even try to mix SPSA and chess-tuning-tools for big tunes and see how it goes. IMO we should try to implement it right on fishtest; otherwise we won't have enough resources to test this things out.

gonzalezjo commented 4 years ago

Thanks a lot for the explanation, kiudee!

kiudee commented 4 years ago

Also let me know if you need any specific functionality implemented.

vondele commented 4 years ago

BTW, let me plug an advertisement for a framework I wrote a while ago, which allows for methods in the nevergrad suite of optimization methods to be used. Right now hardwired to TBPSA

https://github.com/vondele/nevergrad4sf

I can't say it was hugely successful, the resources needed to fine tune parameters in SF is just very large.

unaiic commented 4 years ago

@kiudee @nodchip What are your thoughts on this option (compared to the others)?

nodchip commented 4 years ago

I think that the best way is to try each method one by one, and compare the results.

kiudee commented 4 years ago

I agree with nodchip. There is no data yet to definitively favor one method over another.

unaiic commented 4 years ago

Okay, then I guess we should implement them into fishtest; otherwise we won't have the required resources to test them.

kiudee commented 4 years ago

I want to add that usually people use chess-tuning-tools with 10-100 games per iteration. If you distribute each tested configuration on fishtest you can get a very accurate score for the optimizer and converge much faster. I would use slightly different settings than the default settings in that case.

ppigazzini commented 4 years ago

Okay, then I guess we should implement them into fishtest; otherwise we won't have the required resources to test them.

More realistic is to find some easy tunable parameters, set the starting points a bit far from the optima, and make a comparative test at very short time control (e.g. 1+0.01 or even 0.5+0.005). With SPSA worked fine.

unaiic commented 4 years ago

@ppigazzini Search parameters however don't fall within that definition (most times), as they're very TC sensitive. We could of course try it (also with classical eval terms; they may be more TC independent). Anyway, that's a good starting point :)

gonzalezjo commented 4 years ago

@AndyGrant just released an incredibly cool, relevant, and readable document: https://pdfhost.io/v/Q~Yrnl8kj_Tuningpdf.pdf (From http://talkchess.com/forum3/viewtopic.php?p=857371)

Given a dataset, it can tune all of Ethereal's evaluation parameters at once, including the non-linear ones, on pretty much any hardware configuration. In the time it takes to complete 1-2 iterations of kiudee's tuner, it can completely tune and achieve convergence on all of Ethereal's evaluation parameters.

There are a bunch of reasons this is cool, but my favorite of them is that anyone adding anything to classical eval can tune the entire eval for their feature. Should make it way easier to get an eval change to pass.

Another tuner is needed for search, though.

ppigazzini commented 4 years ago

@unaiic "parameters very TC sensitive" could be good candidates to be "easy tunable parameters" in the scope of a comparative test, ie with starting values a bit off the optima (TC dependent) the parameters will converge to the optima with all the optimization algorithms. In this way we could compare the algorithms wrt some performance indicators:

time to convergence
accuracy and precision

I don't know the Stockfish code, so I can't help with the choice of the parameters, unfortunately.

unaiic commented 4 years ago

@ppigazzini Okay. 1+0.01 seems a good TC to test then. We should specify the number of games/iterations and all the other conditions. Testing SPSA is easy with fishtest, and testing chess-tuning-tools could be done locally at that TC. The other two options, @nodchip's and @vondele's, might require more time to implement, although the latter one may be ready, IDK.

joergoster commented 4 years ago

Just stumbled over this very interesting discussion accidentally.

Just in case you don't already know, @fsmosca implemented Optuna here https://github.com/fsmosca/Optuna-Game-Parameter-Tuner

However, all these methods have one big disadvantage, they don't deal with the very noisy evaluations if we pass game results. Although nowhere explicitly stated, to my knowledge, they rely on deterministic results of the evaluated objective function.

One way to deal with this could be to return the averaged values of the x best evaluated points instead of simply returning the single best one. This looks like a very crude way, though ...

Another interesting way I found here https://facebookresearch.github.io/nevergrad/optimizers_ref.html#nevergrad.optimization.optimizerlib.ParametrizedOnePlusOne Allowing to re-evaluate the best point so far regularly, seems like a very interesting and promising way to me ...

kiudee commented 4 years ago

@joergoster both Optuna and chess-tuning-tools explicitly model the noise of the objective function (it was even the main motivation why I forked off bayes-skopt to begin with). Could you clarify what you mean by that?

edit (To give more context): In chess-tuning-tools the match results are used in a Dirichlet-Pentanomial model to estimate the expected Elo score and also the error of that estimate. That error is given to the model, which in addition will estimate the residual noise using a Gaussian distribution.

joergoster commented 4 years ago

@kiudee A match result can widely vary as we all know. Even when playing hundreds of games per match. Yet all optimizers I know simply return the best evaluated point which simply didn't really work for me.

Return a solution: either the point evaluated with the largest f (x), or the point with the largest posterior mean.

which you can find on top of page 3 in this paper: https://arxiv.org/abs/1807.02811#

Note 1, I didn't try your chess-tuning-tools by now. Note 2, I'm only a layman, yet I think I have some experience as I tried many different tuning methods over the last years.

kiudee commented 4 years ago

Ah okay, I see what you mean. Historically, Bayesian optimization was used mainly for computer experiments, where it is possible to set the random seeds such that each experiment is deterministic. In such a setting it makes sense to return the point which received the best score so far. I totally agree that this does not make any sense for chess (well, unless you plan to run thousands of games per iteration).

I don’t exactly know what the tuning tool based on Optuna is doing, but chess-tuning-tools is returning the global optimum of the mean Gaussian process. In addition, for the output plots it is also showing the "pessimistic optimum" which takes the current uncertainty into account (see here for more details).

joergoster commented 4 years ago

@kiudee Thank you, very interesting. So chess-tuning-tools is next on my list now. ;-)

fsmosca commented 4 years ago

I don’t exactly know what the tuning tool based on Optuna is doing, but chess-tuning-tools is returning the global optimum of the mean Gaussian process. In addition, for the output plots it is also showing the "pessimistic optimum" which takes the current uncertainty into account (see here for more details).

@kiudee

The default surrogate model or sampler in optuna is TPE. See optimize.

Other samplers including the grid and random.

The TPE sampler has some interesting parameters and methods too. The gaussian prior is enabled by default.

kiudee commented 4 years ago

@fsmosca do you know how Optuna selects the final point? Is it the best one tried so far or the optimum of the surrogate model.

fsmosca commented 4 years ago

It looks like the param returned is the one that performed best in the trial.

kiudee commented 4 years ago

Ok, then you should be very careful with that. The point which performed best in our setting could just be a very lucky point (especially the fewer games you run per point). Another problem is that the longer you run the algorithm, more points will share the same match result. Maybe you can find a way to extract the best predicted point from the model.

joergoster commented 4 years ago

As you can see, this already happens with only 200 trials and 160 games per trial, as reported by me here.

fsmosca commented 4 years ago

There are actually 2 methods of determining the best param and best value currently implemented on my tuner.

The setup is test engine vs base engine. At trial 0 the base engine will use the default param and test engine will always take the param from the optimizer.

Whenever the test engine defeats the base engine, the param of optimizer used by test engine will become the best param. This best param will be used by base engine in the next trial and the test engine will take new param from the optimizer. I revised the actual result of the match before sending it to the optimizer as the base engine is changing its param. Few discussions at optuna.
I put up a way of guarding against a lucky win by optimizer param. If the test engine defeats the base engine by only 0.54 or 54%, I don't update the best param and use it with base engine. What I need is for the test engine to defeat the base engine by a score of 0.6 (this is settable) or more and with that the best param will be updated and will be used by the base engine in the next trial. With this in place one can also set a lower games per trial.
Whenever the test engine defeats the base engine, update the best param, but the base engine will always use the initial or default param and test engine will use new param from optimizer. The param that performs well against the base engine will be the best param.

Looks like method 2 needs a higher games per trial than method 1 as the base engine is constantly using the default param. Method 1 is dynamic, the param that wins might be lucky on certain trial, but it might get corrected in the next trial.

We all know that these optimizations need more tuning games to be more reliable.

joergoster commented 4 years ago

It looks like there is no easy way to re-evaluate points with Optuna. Too bad.

@vondele Tried to get nevergrad4sf to work under Windows and failed. Way too complicated, imho.

@kiudee Trying chess-tuning-tools next.

vondele commented 4 years ago

@joergoster I assume you refer the to mpi aspect of it? That's unfortunate, but the best I could come up with to allow distributed optimization (like fishtest, but without writing a new fishtest).

The real problem with tuning (at least stockfish level engines) is that one needs 100000s of games, no matter how smart the optimizer is, and these games need to be at the relevant TC. The Elo differences are usually just too small to measure otherwise.

joergoster commented 4 years ago

@vondele I do understand this, yet the average user doesn't do distributed optimization but simply local on one computer. Maybe you find some time to strip off the mpi stuff? I would also like to define the tuneable parameters instead of relying them to be read from the executable.

kiudee commented 4 years ago

@joergoster chess-tuning-tools does not support manual testing of points yet. This is on my to do list though.

vondele commented 4 years ago

untested, this might work (replace MPIPoolExecutor by ProcessPoolExecutor):

diff --git a/nevergrad4sf.py b/nevergrad4sf.py
index b5e6813..5bbefa7 100644
--- a/nevergrad4sf.py
+++ b/nevergrad4sf.py
@@ -19,9 +19,8 @@ import textwrap
 import nevergrad as ng
 from subprocess import Popen, PIPE
 from cutechess_batches import CutechessExecutorBatch, calc_stats
-from mpi4py import MPI
-from mpi4py.futures import MPIPoolExecutor
 from concurrent.futures import ThreadPoolExecutor
+from concurrent.futures import ProcessPoolExecutor

 def get_sf_parameters(stockfish_exe):
@@ -76,18 +75,6 @@ def ng4sf(
       games per batch, cutechess concurrency, and evaluation batch concurrency
     """

-    # ready to run with mpi
-    size = MPI.COMM_WORLD.Get_size()
-    print()
-    if size > 1:
-        print(
-            "Launched ... with %d mpi ranks (1 master, %d workers)." % (size, size - 1)
-        )
-        print(flush=True)
-    else:
-        sys.stderr.write("ng4sf needs to run under mpi with at least 2 MPI ranks.\n")
-        sys.exit(1)
-
     # print summary
     print("stockfish binary                          : ", stockfish)
     print("stockfish reference binary                : ", stockfishRef)
@@ -128,7 +115,7 @@ def ng4sf(
         rounds=((games_per_batch + 1) // 2 + mpi_subbatches - 1) // mpi_subbatches,
         concurrency=cutechess_concurrency,
         batches=mpi_subbatches,
-        executor=MPIPoolExecutor(),
+        executor=ProcessPoolExecutor(),
     )
     restartFileName = "ng_restart.pkl"

joergoster commented 4 years ago

@kiudee Got it up and running! Reading the configuration from a json file is a very nice solution.

Here is the result of a very quick 1st testrun (5 rounds only).

2020-09-26 17:00:56,535 INFO     Saving a plot to plots\20200926-170032-50.png.
2020-09-26 17:00:56,544 INFO     Testing {'KingAttackWeights[2]': 66, 'KingAttackWeights[3]': 200, 'KingAttackWeights[4]': 14, 'KingAttackWeights[5]': 9}
2020-09-26 17:00:56,545 INFO     Start experiment
2020-09-26 17:01:17,978 INFO     Experiment finished (21.431533s elapsed).
2020-09-26 17:01:18,420 INFO     Got score: 0.7043650362227257 +- 0.7975918092189588
2020-09-26 17:01:18,420 INFO     Updating model
2020-09-26 17:02:26,737 INFO     GP sampling finished (68.315932s)
2020-09-26 17:02:26,742 INFO     Starting iteration 51
2020-09-26 17:02:35,888 INFO     Current optimum:
{'KingAttackWeights[2]': 14, 'KingAttackWeights[3]': 8, 'KingAttackWeights[4]': 0, 'KingAttackWeights[5]': 8}
2020-09-26 17:02:35,889 INFO     Estimated value: -0.914 +- 0.6213
2020-09-26 17:02:35,889 INFO     90.0% confidence interval of the value: (-1.9359, 0.108)
2020-09-26 17:02:36,055 INFO     90.0% confidence intervals of the parameters:
Parameter             Lower bound  Upper bound
----------------------------------------------
KingAttackWeights[2]            2          128
KingAttackWeights[3]            3          189
KingAttackWeights[4]            0          113
KingAttackWeights[5]            0          120

official-stockfish / fishtest

Stochastic optimization for parameters #774