official-stockfish / fishtest

The Stockfish testing framework
https://tests.stockfishchess.org/tests

SPSA improvements [RFC] #535

Open ppigazzini opened 4 years ago

ppigazzini commented 4 years ago

Issue opened to collect info about possible future SPSA improvements.

SPSA references

SPSA is a fairly simple algorithm to be used for local optimization (not global optimization). The wiki now has simple documentation explaining the SPSA implementation in fishtest. Here is some other documentation:

SPSA implementation problems/improvements

SPSA testing process (aka Time Control)


EDIT: this paragraph is outdated; I kept it to avoid disrupting the chain of posts:

I suggest this process to optimize the developer time and the framework CPU.

I took an SPSA test from fishtest and ran it locally, changing only the TC; the results are similar:

20+0.2

2+0.02

1+0.01

0.5+0.01

MJZ1977 commented 4 years ago

From my experience with SPSA, the main problem is the high level of noise in the results. If any proposition reduces this noise, I agree with it :-) You said:

"one iteration should be set to a 2 games for match, but our worker code cannot support this, so we set one iteration to a 2*N cores gamer for match"

Can we choose the number N? Increase it, especially. I think that below 100 games the result can be completely wrong and lead to bad convergence.

ppigazzini commented 4 years ago

@MJZ1977 the companion code of the seminal paper asks for the number of averaged SP gradients to be used per iteration. List updated, thank you :)

ppigazzini commented 4 years ago

The experimental options "careful clipping" and "randomized rounding" don't seem to have a first-order effect, so we could keep only one method to clip and one to round.

[test result charts: c, r, cr]

MJZ1977 commented 4 years ago

@ppigazzini: what are the effects of these options? Did they change the number N of games before updating parameters?

ppigazzini commented 4 years ago

@MJZ1977 "careful clipping" https://github.com/glinscott/fishtest/commit/7eebda7e6d1f47f2672aefe46db35baee7cb5b1f and randomized rounding https://github.com/glinscott/fishtest/commit/5f63500db3f40569ea406a8b8b4b987f054ee79f are theoretical improvements with little/no effect on SPSA convergence wrt other parameters. People stuck to default, so the GUI was simplified dropping the possibility to chose them. I will do some other tests and then I will simplify the code dropping the options not useful.

https://github.com/glinscott/fishtest/blob/5b07986dab3e638292cd04d6cf95d89d9959faeb/fishtest/fishtest/rundb.py#L599-L625

linrock commented 4 years ago

From what I'm finding online, alpha is usually 0.602, gamma at 0.101 is OK, and A is ~10% of the number of iterations. Would these be good defaults for the SPSA fields?

Sources: https://hackage.haskell.org/package/spsa-0.2.0.0/docs/Math-Optimization-SPSA.html https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4769712/ https://www.chessprogramming.org/SPSA https://www.jhuapl.edu/SPSA/PDF-SPSA/Spall_Implementation_of_the_Simultaneous.PDF

vondele commented 4 years ago

@linrock it definitely makes sense to have defaults for the fields (actually, I was thinking they had defaults...). Also @ppigazzini suggests having A depend on the number of games. Shouldn't we call the field 'A [in %]' and give it a default of 10%, so that the field doesn't need to be adjusted when the number of games is changed?
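
For illustration, a minimal sketch (names hypothetical, not fishtest code) of how such a default could be derived from the number of games:

    def default_A(num_games, percent=10):
        # hypothetical helper: one SPSA iteration = one game pair (2 games),
        # and A is set to ~10% of the expected number of iterations
        iterations = num_games // 2
        return percent / 100 * iterations  # e.g. 60000 games -> A = 3000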

linrock commented 4 years ago

Ah yeah, I removed the SPSA defaults in the "create new test" redesign PR, when all that should've been removed was the list of hard-coded params in the SPSA parameter list.

A as a percentage of # games makes sense. From what I'm reading, A is typically less than or equal to 10% of the expected # of iterations (2 games per iteration). So maybe it could be either:

xoto10 commented 4 years ago

Haha, in all this time I never realised that A was (/ should be) related to the number of games! :)

Regarding SPSA at very low tc, does that stress the server a lot because workers are continually returning small batches of data?

ppigazzini commented 4 years ago

@xoto10 SPSA at very low TC can also be done locally :)

vondele commented 4 years ago

@linrock either percentage seems fine to me. Probably games, since we specify #games for SPSA and not number of iterations. In the future, I could imagine that an iteration contains more than 2 games (i.e. batching for SPSA, @vdbergh?), to reduce server load, and because it presumably makes sense (but I don't know the SPSA details).

vdbergh commented 4 years ago

@vondele I am working on a small PR to allow the server to set a batch_size. It is mainly for sprt, but it will also work for spsa and fixed games, although for those one may consider leaving it to the worker. We can see.

MJZ1977 commented 4 years ago

@ppigazzini: I am trying to understand how the SPSA code works, and my knowledge is very weak. Never mind, I am trying. In the file rundb.py, I find the following:

    # Generate the next set of tuning parameters
    iter_local = spsa['iter'] + 1  # assume at least one completed,
                                   # and avoid division by zero
    for param in spsa['params']:
      c = param['c'] / iter_local ** spsa['gamma']
      flip = 1 if random.getrandbits(1) else -1
      result['w_params'].append({
        'name': param['name'],
        'value': self.spsa_param_clip_round(param, c * flip,
                                            spsa['clipping'], spsa['rounding']),
        'R': param['a'] / (spsa['A'] + iter_local) ** spsa['alpha'] / c ** 2,
        'c': c,
        'flip': flip,
      })
      result['b_params'].append({
        'name': param['name'],
        'value': self.spsa_param_clip_round(param, -c * flip, spsa['clipping'], spsa['rounding']),
      })
    # Update the current theta based on the results from the worker
    # Worker wins/losses are always in terms of w_params
    result = spsa_results['wins'] - spsa_results['losses']
    summary = []
    w_params = self.get_params(run['_id'], worker)
    for idx, param in enumerate(spsa['params']):
      R = w_params[idx]['R']
      c = w_params[idx]['c']
      flip = w_params[idx]['flip']
      param['theta'] = self.spsa_param_clip_round(param, R * c * result * flip,
                                                  spsa['clipping'],
                                                  'deterministic')
      if grow_summary:
        summary.append({
          'theta': param['theta'],
          'R': R,
          'c': c,
        })
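
For reference, the constants in the snippet above unwind as follows (a sketch in the notation of the code, not fishtest code itself; k stands for iter_local):

    def theta_step(a, A, alpha, c, gamma, k, result, flip):
        # a_k and c_k are the standard SPSA gain sequences
        a_k = a / (A + k) ** alpha
        c_k = c / k ** gamma
        # R * c_k * result * flip with R = a_k / c_k**2 simplifies to:
        return a_k * result * flip / c_k

i.e. the textbook update theta += a_k * ghat, with the gradient estimate ghat = (wins - losses) * flip / c_k (up to the factor 2 discussed later in the thread).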

My questions are:

And sorry for these "technical questions" ...

Update: latest version of code

vondele commented 4 years ago

@MJZ1977 I think it is great somebody is looking at the implementation of SPSA. I'm still puzzled why our tuning attempts have such a low success rate (@linrock's recent experience). I do think we need a very large number of games, as the Elo differences we're looking for are so small, and the parameters of SPSA are not obvious or automatic, but I also think we need to critically audit the actual implementation, just in case.

tomtor commented 4 years ago

@MJZ1977 You should also look at the worker code to get the complete picture, at and below this line https://github.com/glinscott/fishtest/blob/db94846a0db8788fe8a8724678798dcc91d201e8/worker/games.py#L386

See https://github.com/zamar/spsa for the original implementation

tomtor commented 4 years ago

"Are the results corresponding to a specified number of games for a worker?"

@MJZ1977 A worker plays batches of 2*N_cores games (white/black alternating) and requests a parameter update from the server after every batch.

ppigazzini commented 4 years ago

@vondele SPSA claims to minimize the number of function evaluations. Classic SPSA evaluates the function only at "variables_values_k+delta; variables_values_k-delta" for the gradient estimation, so SPSA obviously diverges with a wrong delta. This is why I suggest testing the SPSA parameters locally at USTC before submitting to fishtest.

The one-sided SPSA computes the gradient with "variables_values_k+delta; variables_values_k", so, having a CPU-cost-free function evaluation at variables_values_k, it's possible to implement:

Neither policy can guarantee convergence with a bad delta, though. SPSA (like all gradient descent algorithms) works only to refine the starting values within the starting basin; to find better local maxima we should switch to global optimization algorithms based on function evaluations (Nelder-Mead, genetic, etc.) to explore the variable space. https://en.wikipedia.org/wiki/Global_optimization
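
For concreteness, a minimal sketch of the two estimators being contrasted (f is a hypothetical noisy match-score function; this is not fishtest code):

    import random

    def gradient_two_sided(f, theta, c):
        # classic SPSA: two evaluations, at theta + c*delta and theta - c*delta
        delta = [random.choice((-1, 1)) for _ in theta]
        f_plus = f([t + c * d for t, d in zip(theta, delta)])
        f_minus = f([t - c * d for t, d in zip(theta, delta)])
        return [(f_plus - f_minus) / (2 * c * d) for d in delta]

    def gradient_one_sided(f, theta, c, f_theta):
        # one-sided SPSA: reuses a CPU-cost-free evaluation f_theta at theta
        delta = [random.choice((-1, 1)) for _ in theta]
        f_plus = f([t + c * d for t, d in zip(theta, delta)])
        return [(f_plus - f_theta) / (c * d) for d in delta]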

MJZ1977 commented 4 years ago

@tomtor: thank you for the links! Update: removed

vondele commented 4 years ago

@ppigazzini concerning Nelder-Mead, I did work on interfacing cutechess games to the nevergrad suite of optimizers: https://github.com/vondele/nevergrad4sf and picked TBPSA, which seems to be the recommended optimizer for noisy functions. I found it robust if given enough games (millions, literally). Unfortunately, the optimized parameters seem very good at the TC at which they were optimized (VSTC), but are not transferable. Since I can't optimize at STC or LTC, it would need to be integrated in fishtest... but I'm not able to do that (time and experience with the framework lacking atm)... if somebody wants to pick it up, I would be happy to help.

MJZ1977 commented 4 years ago

After making some tests, I think that one of the principal problems is that the random parameter "flip" only takes the values +1 or -1 (please correct me if I am wrong). So basically, fishtest always tries to change all variables at the same time. One improvement could be to take flip values from [+1, +0.1, -0.1, -1], for example. It corresponds to a random division by 10. In this case, we will have some tests with only 1 or 2 variables changing. I think it is also easy to implement, even if I don't have the knowledge to do it myself!
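
A minimal sketch of that proposal (hypothetical, not fishtest code). Note that SPSA's convergence conditions require the perturbation components to have mean zero and bounded inverse moments, which [+1, +0.1, -0.1, -1] satisfies, though it rescales the gradient estimate:

    import random

    def draw_flips(num_params, choices=(1.0, 0.1, -0.1, -1.0)):
        # each parameter gets an independent flip; with this distribution only
        # about half the parameters receive a full-size perturbation per iteration
        return [random.choice(choices) for _ in range(num_params)]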

xoto10 commented 4 years ago

If we want to tune 1 constant, it would be nice if tuning could simply test the start value and N values either side (3? 5?) and then display a bar chart of the resulting performance. That might give us an easy-to-read clue as to whether there's a trend in which values are better. We tend to do this manually atm, but it seems easy for the tuner to do?

ppigazzini commented 4 years ago

@MJZ1977

So basically, fishtest always tries to change all variables at the same time.

SPSA = Simultaneous perturbation stochastic approximation

One improvement could be to take flip values from [+1, +0.1, -0.1, -1], for example. It corresponds to a random division by 10. In this case, we will have some tests with only 1 or 2 variables changing. I think it is also easy to implement, even if I don't have the knowledge to do it myself!

Random [+1, -1] is the Rademacher distribution. You can use other distributions, but IMO the result will not change: we can get good fishtest gains from SPSA only for badly tuned parameters, or when SPSA serendipitously finds a new local maximum.

SPSA, like other gradient algorithms, is a local optimization, useful to refine the starting values "without hopping from the starting basin".

@xoto10 you are talking about a global optimization algorithm; take a look at @vondele's work.

vondele commented 4 years ago

While TBPSA might also work for global optimization (that's always hard), I don't think we're typically stuck in local minima. At least, I have never seen evidence of that. TBPSA just seems to be rather good at doing the right thing in the presence of noise, also in (a relatively small) number of dimensions. @xoto10 the bar chart will tell almost nothing in most cases, unless we do on the order of 240000 games per point (that's roughly 1 Elo error, i.e. the typical gain from a tune).

I once did a scan of one of the search parameters, and the graph is somewhere in a thread on GitHub, which I can't find right now; it looks like this: [graph: elo_stat]

ppigazzini commented 4 years ago

I don't think we're typically stuck in local minima. At least, I have never seen evidence of that.

In that case a (properly implemented) SPSA should be able to find a better value, but in my first post I collected all my doubts about our SPSA implementation.

A simple proof is to set a blatantly wrong value for a parameter (e.g. Queen = 0.1 pawn; sorry, I'm not an SF developer :) and see if our SPSA is able to recover a good value.

MJZ1977 commented 4 years ago

I have made some tests since yesterday and came to the conclusion that SPSA is currently not working well because of too much noise in the individual results. As an example to explain my thinking, take this simple case: SPSA beginning with KnightSafeCheck = 590 https://tests.stockfishchess.org/tests/view/5ea9b5c469c5cb4e2aeb82fd SPRT master vs KnightSafeCheck = 590 https://tests.stockfishchess.org/tests/view/5eaa93b769c5cb4e2aeb8370 The best value should be KnightSafeCheck = ~790, as in master. SPSA oscillates, even if it seems to be increasing at the end. I use only 1 variable to avoid any bias.

The only solution to this is to make iterations of at least 200 games instead of 2N games. For example, if the results are 60-40-100, it gives +20 to multiply by the same gradient. This is very different from multiplying "60" and "-40" by different gradients, which clearly increases the noise. This is my opinion, but I cannot be sure without making tests, which are impossible now.

An improvement could be to add an SPSA parameter = minimum number of games per iteration, instead of the default 2N.

vdbergh commented 4 years ago

I think one cannot say anything without first measuring the Elo difference between 590 and 790. If it takes many games to just detect the difference one cannot expect spsa to magically wander from 590 to 790.

ppigazzini commented 4 years ago

@MJZ1977 try with a blatantly wrong value, e.g. KnightSafeCheck = 790000. If the SPSA is not able to recover a value that makes sense (e.g. 2000) then:

vondele commented 4 years ago

that might not be such a good test... this could be so far off that e.g. the local gradient is zero, so progress won't be made. Maybe it could be started from 0. But I agree it is good to have an estimate of the Elo importance of the term as well, and probably picking a term with e.g. ~10Elo impact makes sense. (Maybe scaling initiative would be a candidate?)

MJZ1977 commented 4 years ago

I launched a test beginning with a value of 10. So, you confirm that SPSA can't handle a +/-5 Elo difference. It will not be easy to find a +5 Elo patch :-).

ppigazzini commented 4 years ago

that might not be such a good test... this could be so far off that e.g. the local gradient is zero, so progress won't be made.

@vondele you know exactly how much you are off, so, setting the proper "c_k_end", the gradient should not be 0 (if that parameter has an effect at all). And here is one (of many) problems with our SPSA implementation: we ask for "c_k_end" and not for "c_k_start", making it very hard for developers to control the SPSA behaviour.

MJZ1977 commented 4 years ago

I give up on the KnightSafeCheck tests because the sensitivity to this parameter is not obvious. I took another parameter to test: the weak-squares multiplier in the king danger formula, which I will call "Coef1". The master value is 185, and I begin with a low value of 20. SPRT finishes quickly with 850 wins (18.4%), 1064 losses (23%); broad estimate -15 Elo. https://tests.stockfishchess.org/tests/view/5eaafe2209d25e8e5058167e SPSA with default values, 60k games: Coef1 = 30 https://tests.stockfishchess.org/tests/view/5eab049c09d25e8e505816a0 SPSA with higher "c" and "a", 60k games: Coef1 = 40 https://tests.stockfishchess.org/tests/view/5eab049c09d25e8e505816a0 SPSA with higher "c" and "a" and a closer initial value of 140 stalled quickly ?! https://tests.stockfishchess.org/tests/view/5eab146409d25e8e505816fe

So, as a first conclusion, it seems that SPSA is going in the right direction, but convergence is slow and not linear (many oscillations). I don't think there is a major problem with our SPSA implementation, but I hope we can improve it to get finer-tuned parameters. I will repeat my improvement suggestions (which are the topic of this thread): 1- Increase the size of game batches to at least ~200 games: this will decrease the random numeric dispersion. 2- Use another distribution than [+1, -1] (the Rademacher distribution); [+1, +0.1, -0.1, -1] could be a good candidate: it will mainly help separate the parameters' sensitivities and give a finer multi-parameter tuning.

Nobody can say how much these would improve SPSA without testing them on chess games. I hope this helps :-)

vdbergh commented 4 years ago

I have looked a bit at the spsa implementation in fishtest and I see no obvious problems with it. Of course the devil is sometimes in the details.

To a first approximation one can use simulation to study spsa behaviour. One can even extract the spsa code from Fishtest to make sure one runs exactly the same code (in particular the batching behaviour).

The difficulty is to have a realistic loss function. One could start with the one provided by @vondele https://github.com/glinscott/fishtest/issues/535#issuecomment-621363937.

vdbergh commented 4 years ago

Some time ago I started thinking about SPSA and wrote this document

support_ornstein_uhlenbeck.pdf

My initial hope was to get information on good choices of hyperparameters, but nothing obvious came out. So the above document is incomplete, and in addition it says nothing about batching.

vondele commented 4 years ago

BTW, I agree with doing simulation to study spsa behavior. I'll report later today some up-to-date results for the coef1 variable of @MJZ1977, i.e. what the Elo estimates are for various values, so that we can in principle make an accurate loss function. I guess doing that correctly (i.e. so that the noise is realistic as well) actually needs @vdbergh's input. I'll also use it as a test for nevergrad4sf so we can compare.

ppigazzini commented 4 years ago

@vdbergh take a look at my corrections here: https://github.com/glinscott/fishtest/compare/master...ppigazzini:spsa_fix_clean_up

  1. Get rid of R=a/c**2, which makes it very difficult to set a starting value. The seminal SPSA paper suggests using a=0.16, and this is working fine in my tests (I'm using a=0.2).
  2. Our implementation uses function_value=games_result/2 (?) but I'm using function_value=games_result*2*c: in this way, for the same wins-losses difference, the gradient is independent of c (as a derivative IMO should be); see the sketch below.
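
A side-by-side sketch of the two conventions (my reading of the diff, with result = wins - losses over a game pair):

    def step_current(a_k, c_k, result, flip):
        # fishtest today: R = a / c**2, step = R * c * result * flip
        R = a_k / c_k ** 2
        return R * c_k * result * flip  # step depends on c_k

    def step_proposed(a_k, c_k, result, flip):
        # proposal: f(p+c) - f(p-c) = 2*c*(wins - losses), so f' = wins - losses
        return a_k * result * flip      # step independent of c_k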
vondele commented 4 years ago

So, if we want to construct a loss function for experimenting: coef1 from above, measured with 10k STC games per point, gives me this:

   evaluated:  {'coef1': 7}    score    :   47.515 +-    0.642    Elo      :  -17.285 +-    4.475 
   evaluated:  {'coef1': 11}    score    :   47.607 +-    0.644    Elo      :  -16.642 +-    4.484 
   evaluated:  {'coef1': 14}    score    :   47.437 +-    0.638    Elo      :  -17.826 +-    4.447 
   evaluated:  {'coef1': 20}    score    :   48.233 +-    0.639    Elo      :  -12.283 +-    4.449 
   evaluated:  {'coef1': 25}    score    :   48.252 +-    0.640    Elo      :  -12.148 +-    4.455 
   evaluated:  {'coef1': 28}    score    :   48.932 +-    0.640    Elo      :   -7.422 +-    4.452 
   evaluated:  {'coef1': 33}    score    :   48.306 +-    0.641    Elo      :  -11.777 +-    4.460 
   evaluated:  {'coef1': 38}    score    :   48.718 +-    0.635    Elo      :   -8.907 +-    4.418 
   evaluated:  {'coef1': 43}    score    :   48.748 +-    0.633    Elo      :   -8.705 +-    4.401 
   evaluated:  {'coef1': 46}    score    :   48.155 +-    0.637    Elo      :  -12.824 +-    4.434 
   evaluated:  {'coef1': 46}    score    :   48.820 +-    0.630    Elo      :   -8.198 +-    4.383 
   evaluated:  {'coef1': 47}    score    :   48.621 +-    0.631    Elo      :   -9.582 +-    4.388 
   evaluated:  {'coef1': 51}    score    :   49.214 +-    0.635    Elo      :   -5.465 +-    4.413 
   evaluated:  {'coef1': 54}    score    :   49.461 +-    0.632    Elo      :   -3.744 +-    4.393 
   evaluated:  {'coef1': 56}    score    :   49.228 +-    0.632    Elo      :   -5.364 +-    4.392 
   evaluated:  {'coef1': 61}    score    :   49.049 +-    0.634    Elo      :   -6.612 +-    4.408 
   evaluated:  {'coef1': 65}    score    :   49.044 +-    0.627    Elo      :   -6.646 +-    4.360 
   evaluated:  {'coef1': 67}    score    :   48.922 +-    0.626    Elo      :   -7.490 +-    4.355 
   evaluated:  {'coef1': 83}    score    :   49.670 +-    0.622    Elo      :   -2.294 +-    4.319 
   evaluated:  {'coef1': 83}    score    :   49.922 +-    0.622    Elo      :   -0.540 +-    4.323 
   evaluated:  {'coef1': 89}    score    :   49.345 +-    0.633    Elo      :   -4.554 +-    4.398 
   evaluated:  {'coef1': 91}    score    :   48.951 +-    0.629    Elo      :   -7.287 +-    4.376 
   evaluated:  {'coef1': 92}    score    :   50.126 +-    0.623    Elo      :    0.877 +-    4.331 
   evaluated:  {'coef1': 93}    score    :   49.204 +-    0.626    Elo      :   -5.532 +-    4.353 
   evaluated:  {'coef1': 98}    score    :   49.840 +-    0.630    Elo      :   -1.113 +-    4.380 
   evaluated:  {'coef1': 102}    score    :   50.019 +-    0.627    Elo      :    0.135 +-    4.357 
   evaluated:  {'coef1': 102}    score    :   50.078 +-    0.623    Elo      :    0.540 +-    4.331 
   evaluated:  {'coef1': 103}    score    :   49.956 +-    0.628    Elo      :   -0.304 +-    4.362 
   evaluated:  {'coef1': 103}    score    :   50.189 +-    0.619    Elo      :    1.316 +-    4.304 
   evaluated:  {'coef1': 105}    score    :   49.995 +-    0.632    Elo      :   -0.034 +-    4.390 
   evaluated:  {'coef1': 107}    score    :   50.039 +-    0.624    Elo      :    0.270 +-    4.334 
   evaluated:  {'coef1': 107}    score    :   50.228 +-    0.621    Elo      :    1.585 +-    4.317 
   evaluated:  {'coef1': 109}    score    :   50.257 +-    0.628    Elo      :    1.788 +-    4.361 
   evaluated:  {'coef1': 115}    score    :   49.558 +-    0.610    Elo      :   -3.070 +-    4.242 
   evaluated:  {'coef1': 115}    score    :   49.709 +-    0.621    Elo      :   -2.024 +-    4.314 
   evaluated:  {'coef1': 115}    score    :   50.194 +-    0.618    Elo      :    1.349 +-    4.292 
   evaluated:  {'coef1': 116}    score    :   50.485 +-    0.625    Elo      :    3.373 +-    4.343 
   evaluated:  {'coef1': 117}    score    :   50.087 +-    0.619    Elo      :    0.607 +-    4.301 
   evaluated:  {'coef1': 121}    score    :   49.816 +-    0.614    Elo      :   -1.282 +-    4.266 
   evaluated:  {'coef1': 122}    score    :   50.413 +-    0.622    Elo      :    2.867 +-    4.320 
   evaluated:  {'coef1': 124}    score    :   50.170 +-    0.620    Elo      :    1.181 +-    4.306 
   evaluated:  {'coef1': 125}    score    :   50.039 +-    0.618    Elo      :    0.270 +-    4.293 
   evaluated:  {'coef1': 126}    score    :   50.282 +-    0.616    Elo      :    1.956 +-    4.281 
   evaluated:  {'coef1': 130}    score    :   50.252 +-    0.610    Elo      :    1.754 +-    4.238 
   evaluated:  {'coef1': 131}    score    :   50.272 +-    0.614    Elo      :    1.889 +-    4.265 
   evaluated:  {'coef1': 135}    score    :   49.558 +-    0.623    Elo      :   -3.070 +-    4.332 
   evaluated:  {'coef1': 137}    score    :   50.126 +-    0.618    Elo      :    0.877 +-    4.298 
   evaluated:  {'coef1': 138}    score    :   50.000 +-    0.618    Elo      :   -0.000 +-    4.297 
   evaluated:  {'coef1': 139}    score    :   50.597 +-    0.620    Elo      :    4.149 +-    4.312 
   evaluated:  {'coef1': 139}    score    :   50.660 +-    0.614    Elo      :    4.588 +-    4.265 
   evaluated:  {'coef1': 141}    score    :   50.073 +-    0.617    Elo      :    0.506 +-    4.284 
   evaluated:  {'coef1': 143}    score    :   49.835 +-    0.627    Elo      :   -1.147 +-    4.357 
   evaluated:  {'coef1': 144}    score    :   49.874 +-    0.621    Elo      :   -0.877 +-    4.313 
   evaluated:  {'coef1': 145}    score    :   50.587 +-    0.617    Elo      :    4.082 +-    4.289 
   evaluated:  {'coef1': 147}    score    :   50.131 +-    0.608    Elo      :    0.911 +-    4.222 
   evaluated:  {'coef1': 150}    score    :   49.830 +-    0.616    Elo      :   -1.181 +-    4.279 
   evaluated:  {'coef1': 150}    score    :   50.112 +-    0.619    Elo      :    0.776 +-    4.298 
   evaluated:  {'coef1': 156}    score    :   49.539 +-    0.621    Elo      :   -3.205 +-    4.319 
   evaluated:  {'coef1': 158}    score    :   50.607 +-    0.616    Elo      :    4.217 +-    4.280 
   evaluated:  {'coef1': 159}    score    :   49.587 +-    0.620    Elo      :   -2.867 +-    4.307 
   evaluated:  {'coef1': 160}    score    :   49.709 +-    0.612    Elo      :   -2.024 +-    4.253 
   evaluated:  {'coef1': 160}    score    :   50.539 +-    0.622    Elo      :    3.744 +-    4.326 
   evaluated:  {'coef1': 162}    score    :   49.893 +-    0.615    Elo      :   -0.742 +-    4.275 
   evaluated:  {'coef1': 165}    score    :   50.083 +-    0.621    Elo      :    0.573 +-    4.313 
   evaluated:  {'coef1': 168}    score    :   50.374 +-    0.616    Elo      :    2.597 +-    4.282 
   evaluated:  {'coef1': 168}    score    :   50.607 +-    0.621    Elo      :    4.217 +-    4.317 
   evaluated:  {'coef1': 168}    score    :   50.772 +-    0.622    Elo      :    5.364 +-    4.324 
   evaluated:  {'coef1': 169}    score    :   50.485 +-    0.623    Elo      :    3.373 +-    4.327 
   evaluated:  {'coef1': 170}    score    :   49.583 +-    0.616    Elo      :   -2.901 +-    4.280 
   evaluated:  {'coef1': 171}    score    :   50.583 +-    0.616    Elo      :    4.048 +-    4.280 
   evaluated:  {'coef1': 172}    score    :   50.233 +-    0.619    Elo      :    1.619 +-    4.303 
   evaluated:  {'coef1': 172}    score    :   50.286 +-    0.618    Elo      :    1.990 +-    4.292 
   evaluated:  {'coef1': 173}    score    :   49.680 +-    0.617    Elo      :   -2.226 +-    4.286 
   evaluated:  {'coef1': 173}    score    :   50.189 +-    0.616    Elo      :    1.316 +-    4.282 
   evaluated:  {'coef1': 173}    score    :   50.320 +-    0.617    Elo      :    2.226 +-    4.291 
   evaluated:  {'coef1': 174}    score    :   49.743 +-    0.616    Elo      :   -1.788 +-    4.280 
   evaluated:  {'coef1': 177}    score    :   50.058 +-    0.621    Elo      :    0.405 +-    4.318 
   evaluated:  {'coef1': 178}    score    :   49.767 +-    0.618    Elo      :   -1.619 +-    4.297 
   evaluated:  {'coef1': 178}    score    :   50.471 +-    0.622    Elo      :    3.272 +-    4.326 
   evaluated:  {'coef1': 178}    score    :   50.602 +-    0.619    Elo      :    4.183 +-    4.304 
   evaluated:  {'coef1': 179}    score    :   50.495 +-    0.624    Elo      :    3.441 +-    4.334 
   evaluated:  {'coef1': 181}    score    :   50.024 +-    0.616    Elo      :    0.169 +-    4.279 
   evaluated:  {'coef1': 183}    score    :   49.879 +-    0.610    Elo      :   -0.843 +-    4.241 
   evaluated:  {'coef1': 183}    score    :   50.476 +-    0.623    Elo      :    3.306 +-    4.327 
   evaluated:  {'coef1': 186}    score    :   50.262 +-    0.620    Elo      :    1.822 +-    4.311 
   evaluated:  {'coef1': 186}    score    :   50.714 +-    0.620    Elo      :    4.959 +-    4.311 
   evaluated:  {'coef1': 191}    score    :   50.175 +-    0.621    Elo      :    1.214 +-    4.315 
   evaluated:  {'coef1': 192}    score    :   50.267 +-    0.616    Elo      :    1.855 +-    4.280 
   evaluated:  {'coef1': 199}    score    :   49.786 +-    0.613    Elo      :   -1.484 +-    4.260 
   evaluated:  {'coef1': 203}    score    :   49.927 +-    0.620    Elo      :   -0.506 +-    4.312 
   evaluated:  {'coef1': 204}    score    :   49.694 +-    0.624    Elo      :   -2.125 +-    4.339 
   evaluated:  {'coef1': 205}    score    :   49.617 +-    0.613    Elo      :   -2.665 +-    4.263 
   evaluated:  {'coef1': 208}    score    :   49.767 +-    0.622    Elo      :   -1.619 +-    4.322 
   evaluated:  {'coef1': 212}    score    :   49.709 +-    0.621    Elo      :   -2.024 +-    4.315 
   evaluated:  {'coef1': 228}    score    :   49.529 +-    0.622    Elo      :   -3.272 +-    4.324 
   evaluated:  {'coef1': 230}    score    :   49.262 +-    0.621    Elo      :   -5.128 +-    4.316 
   evaluated:  {'coef1': 232}    score    :   49.228 +-    0.623    Elo      :   -5.364 +-    4.333 
   evaluated:  {'coef1': 242}    score    :   48.903 +-    0.621    Elo      :   -7.625 +-    4.316 
   evaluated:  {'coef1': 252}    score    :   48.922 +-    0.626    Elo      :   -7.490 +-    4.354 
   evaluated:  {'coef1': 275}    score    :   48.248 +-    0.631    Elo      :  -12.182 +-    4.388 
   evaluated:  {'coef1': 391}    score    :   42.015 +-    0.652    Elo      :  -55.968 +-    4.647 

For the nevergrad4sf optimizer this yields the following convergence:

optimal at iter 1 after 1 evaluation and 10300 games : {'coef1': 20}
optimal at iter 2 after 5 evaluations and 51500 games : {'coef1': 56}
optimal at iter 3 after 10 evaluations and 103000 games : {'coef1': 103}
optimal at iter 4 after 15 evaluations and 154500 games : {'coef1': 115}
optimal at iter 5 after 20 evaluations and 206000 games : {'coef1': 92}
optimal at iter 6 after 25 evaluations and 257500 games : {'coef1': 168}
optimal at iter 7 after 30 evaluations and 309000 games : {'coef1': 107}
optimal at iter 8 after 35 evaluations and 360500 games : {'coef1': 115}
optimal at iter 9 after 40 evaluations and 412000 games : {'coef1': 173}
optimal at iter 10 after 45 evaluations and 463500 games : {'coef1': 169}
optimal at iter 11 after 53 evaluations and 545900 games : {'coef1': 176}
optimal at iter 12 after 61 evaluations and 628300 games : {'coef1': 143}
optimal at iter 13 after 69 evaluations and 710700 games : {'coef1': 169}
optimal at iter 14 after 77 evaluations and 793100 games : {'coef1': 184}
optimal at iter 15 after 85 evaluations and 875500 games : {'coef1': 153}
optimal at iter 16 after 101 evaluations and 1040300 games : {'coef1': 155}

I'll try to update as I get more data (Edit: update 2).

To verify if 115 could be a good parameter, I've launched a test here: https://tests.stockfishchess.org/tests/view/5eac0b636ffeed51f6e321f4 idem for 153: https://tests.stockfishchess.org/tests/view/5eac1d2c6ffeed51f6e321fe

vdbergh commented 4 years ago

BTW, I agree with doing simulation to study spsa behavior. I'll report later today some up-to-date results for the coef1 variable of @MJZ1977, i.e. what the Elo estimates are for various values, so that we can in principle make an accurate loss function. I guess doing that correctly (i.e. so that the noise is realistic as well) actually needs @vdbergh's input. I'll also use it as a test for nevergrad4sf so we can compare.

@vondele Probably you have already moved on.

In any case, for simulation the only thing we have to do is supply a realistic function

(params)-->Elo

With such a function one can simulate the outcome of the games that are used as input to spsa.

To do this one needs an Elo model to translate Elo differences into w,d,l. To a first approximation, a fixed draw ratio and no opening-book bias would do, I think.

More advanced would be to use the BayesElo model. But then one has to do the translation (Elo,draw_ratio,bias)-->(BayesElo,draw_elo,advantage). The SPRT simulator https://github.com/vdbergh/simul does this, but the code is in C. I can extract it, but not immediately.
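
As a sketch of the first approximation above (fixed draw ratio, no book bias; the draw ratio value is illustrative):

    import random

    def simulate_pair(elo_diff, draw_ratio=0.61):
        # returns wins - losses over a pair of games between the two perturbed
        # versions, seen from the first side (a value in {-2, ..., +2})
        score = 1 / (1 + 10 ** (-elo_diff / 400))  # expected score
        p_win = max(score - draw_ratio / 2, 0.0)   # since score = p_win + draw/2
        result = 0
        for _ in range(2):
            r = random.random()
            if r < p_win:
                result += 1
            elif r >= p_win + draw_ratio:
                result -= 1
        return result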

vondele commented 4 years ago

I haven't moved on.... you're the expert for Elo -> game result simulation :-) Also, I don't know SPSA at all, so I'd be more than happy for you, @ppigazzini or @MJZ1977 to look into this..

vondele commented 4 years ago

[graph: kdtweak]

So, that's the model from the latest data points. It is quite accurately quadratic (i.e. cubic and quartic fits were equivalent): Elo(x) = 1.49643 - 1./2 * ((x - 151.148) / 23.6133) ** 2

From the two tests mentioned above:

                          test 1    test 2
    DrawElo (BayesElo)    250.93    248.71
    RMS bias (Elo)        31.670    -0.000 (?!)
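
For simulation purposes, the fit can be dropped in directly as the (params)-->Elo function @vdbergh asked for above:

    def elo_of_coef1(x):
        # quadratic model fitted to the 10k-games-per-point STC data above
        return 1.49643 - 0.5 * ((x - 151.148) / 23.6133) ** 2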

ppigazzini commented 4 years ago

I have looked a bit at the spsa implementation in fishtest and I see no obvious problems with it. Of course the devil is sometimes in the details.

IMO you described the devil well in your paper, p(k+1)=p(k)+a/c:

IMO it is more correct to use p(k+1)=p(k)+a

vdbergh commented 4 years ago

@vondele The statistical measurement of RMS bias needs a lot of games (and even then there are outliers). But for noob_3moves it is safe to take 30 (*) (I have been observing it for a long time).

Running simul ./simul --elo 0 --bias 30 --draw_ratio 0.61 gives

draw_elo   = 250.3990
advantage  =  48.3115

When using paired games it should be safe to consider advantage as the advantage for white in the BayesElo model (although in reality it will not be).

For converting Elo to BayesElo I would multiply by the scale factor (de=draw_elo).

def scale(de):
    # scale factor between logistic Elo and BayesElo at draw_elo = de
    return (4*10**(-de/400))/(1+10**(-de/400))**2
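
For example, with the draw_elo measured above:

    scale(250.399)  # ~0.619 at this draw rate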

(*) Actually for a placebo parameter it will be much higher. I do not know if this is relevant or not.

vdbergh commented 4 years ago

@vondele Actually I now realize the bias/advantage is irrelevant. spsa only uses scores. So one can just set the bias to zero.

vondele commented 4 years ago

One more data drop. I wanted to see what the effect of using VSTC (2+0.02) in this case was, so very similar data:

   evaluated:  {'coef1': 1}    score    :   48.694 +-    0.761    Elo      :   -9.076 +-    5.295 
   evaluated:  {'coef1': 22}    score    :   49.107 +-    0.766    Elo      :   -6.207 +-    5.322 
   evaluated:  {'coef1': 24}    score    :   49.544 +-    0.761    Elo      :   -3.171 +-    5.287 
   evaluated:  {'coef1': 27}    score    :   48.752 +-    0.765    Elo      :   -8.671 +-    5.322 
   evaluated:  {'coef1': 31}    score    :   49.481 +-    0.758    Elo      :   -3.609 +-    5.268 
   evaluated:  {'coef1': 31}    score    :   49.961 +-    0.765    Elo      :   -0.270 +-    5.315 
   evaluated:  {'coef1': 40}    score    :   49.893 +-    0.764    Elo      :   -0.742 +-    5.312 
   evaluated:  {'coef1': 45}    score    :   49.233 +-    0.761    Elo      :   -5.330 +-    5.290 
   evaluated:  {'coef1': 49}    score    :   49.451 +-    0.763    Elo      :   -3.812 +-    5.302 
   evaluated:  {'coef1': 53}    score    :   49.252 +-    0.756    Elo      :   -5.195 +-    5.253 
   evaluated:  {'coef1': 53}    score    :   49.451 +-    0.764    Elo      :   -3.812 +-    5.308 
   evaluated:  {'coef1': 61}    score    :   49.806 +-    0.761    Elo      :   -1.349 +-    5.292 
   evaluated:  {'coef1': 63}    score    :   49.956 +-    0.762    Elo      :   -0.304 +-    5.295 
   evaluated:  {'coef1': 68}    score    :   50.403 +-    0.765    Elo      :    2.800 +-    5.316 
   evaluated:  {'coef1': 69}    score    :   50.587 +-    0.756    Elo      :    4.082 +-    5.255 
   evaluated:  {'coef1': 71}    score    :   49.723 +-    0.762    Elo      :   -1.923 +-    5.299 
   evaluated:  {'coef1': 82}    score    :   50.417 +-    0.752    Elo      :    2.901 +-    5.227 
   evaluated:  {'coef1': 108}    score    :   50.311 +-    0.757    Elo      :    2.159 +-    5.260 
   evaluated:  {'coef1': 110}    score    :   50.238 +-    0.756    Elo      :    1.653 +-    5.251 
   evaluated:  {'coef1': 114}    score    :   50.544 +-    0.754    Elo      :    3.778 +-    5.243 
   evaluated:  {'coef1': 114}    score    :   50.587 +-    0.753    Elo      :    4.082 +-    5.237 
   evaluated:  {'coef1': 118}    score    :   50.058 +-    0.753    Elo      :    0.405 +-    5.234 
   evaluated:  {'coef1': 121}    score    :   51.053 +-    0.751    Elo      :    7.321 +-    5.219 
   evaluated:  {'coef1': 123}    score    :   50.442 +-    0.755    Elo      :    3.070 +-    5.248 
   evaluated:  {'coef1': 124}    score    :   50.000 +-    0.754    Elo      :   -0.000 +-    5.240 
   evaluated:  {'coef1': 125}    score    :   50.524 +-    0.750    Elo      :    3.643 +-    5.210 
   evaluated:  {'coef1': 125}    score    :   50.854 +-    0.757    Elo      :    5.937 +-    5.265 
   evaluated:  {'coef1': 126}    score    :   50.505 +-    0.757    Elo      :    3.508 +-    5.261 
   evaluated:  {'coef1': 126}    score    :   51.024 +-    0.753    Elo      :    7.118 +-    5.233 
   evaluated:  {'coef1': 127}    score    :   49.354 +-    0.754    Elo      :   -4.487 +-    5.242 
   evaluated:  {'coef1': 128}    score    :   50.087 +-    0.756    Elo      :    0.607 +-    5.254 
   evaluated:  {'coef1': 128}    score    :   50.714 +-    0.752    Elo      :    4.959 +-    5.224 
   evaluated:  {'coef1': 128}    score    :   51.184 +-    0.752    Elo      :    8.232 +-    5.232 
   evaluated:  {'coef1': 129}    score    :   49.461 +-    0.758    Elo      :   -3.744 +-    5.268 
   evaluated:  {'coef1': 129}    score    :   50.083 +-    0.751    Elo      :    0.573 +-    5.219 
   evaluated:  {'coef1': 129}    score    :   50.772 +-    0.755    Elo      :    5.364 +-    5.249 
   evaluated:  {'coef1': 129}    score    :   51.350 +-    0.750    Elo      :    9.380 +-    5.219 
   evaluated:  {'coef1': 130}    score    :   50.262 +-    0.752    Elo      :    1.822 +-    5.226 
   evaluated:  {'coef1': 130}    score    :   50.296 +-    0.759    Elo      :    2.058 +-    5.272 
   evaluated:  {'coef1': 130}    score    :   50.563 +-    0.754    Elo      :    3.913 +-    5.242 
   evaluated:  {'coef1': 130}    score    :   50.718 +-    0.754    Elo      :    4.993 +-    5.239 
   evaluated:  {'coef1': 130}    score    :   51.083 +-    0.754    Elo      :    7.523 +-    5.242 
   evaluated:  {'coef1': 131}    score    :   50.282 +-    0.752    Elo      :    1.956 +-    5.225 
   evaluated:  {'coef1': 131}    score    :   50.364 +-    0.754    Elo      :    2.530 +-    5.239 
   evaluated:  {'coef1': 131}    score    :   50.874 +-    0.755    Elo      :    6.072 +-    5.252 
   evaluated:  {'coef1': 131}    score    :   50.995 +-    0.760    Elo      :    6.916 +-    5.281 
   evaluated:  {'coef1': 131}    score    :   51.403 +-    0.753    Elo      :    9.751 +-    5.237 
   evaluated:  {'coef1': 132}    score    :   50.330 +-    0.759    Elo      :    2.294 +-    5.274 
   evaluated:  {'coef1': 132}    score    :   50.515 +-    0.751    Elo      :    3.576 +-    5.217 
   evaluated:  {'coef1': 132}    score    :   50.621 +-    0.755    Elo      :    4.318 +-    5.250 
   evaluated:  {'coef1': 132}    score    :   50.801 +-    0.756    Elo      :    5.566 +-    5.256 
   evaluated:  {'coef1': 132}    score    :   51.233 +-    0.757    Elo      :    8.570 +-    5.261 
   evaluated:  {'coef1': 133}    score    :   50.005 +-    0.756    Elo      :    0.034 +-    5.256 
   evaluated:  {'coef1': 133}    score    :   50.024 +-    0.751    Elo      :    0.169 +-    5.222 
   evaluated:  {'coef1': 133}    score    :   50.092 +-    0.749    Elo      :    0.641 +-    5.207 
   evaluated:  {'coef1': 133}    score    :   50.403 +-    0.757    Elo      :    2.800 +-    5.259 
   evaluated:  {'coef1': 133}    score    :   50.485 +-    0.752    Elo      :    3.373 +-    5.224 
   evaluated:  {'coef1': 133}    score    :   50.893 +-    0.751    Elo      :    6.207 +-    5.223 
   evaluated:  {'coef1': 134}    score    :   49.869 +-    0.751    Elo      :   -0.911 +-    5.222 
   evaluated:  {'coef1': 134}    score    :   50.160 +-    0.750    Elo      :    1.113 +-    5.213 
   evaluated:  {'coef1': 134}    score    :   50.272 +-    0.755    Elo      :    1.889 +-    5.244 
   evaluated:  {'coef1': 134}    score    :   50.277 +-    0.757    Elo      :    1.923 +-    5.264 
   evaluated:  {'coef1': 134}    score    :   50.282 +-    0.754    Elo      :    1.956 +-    5.243 
   evaluated:  {'coef1': 134}    score    :   51.092 +-    0.754    Elo      :    7.591 +-    5.244 
   evaluated:  {'coef1': 135}    score    :   50.141 +-    0.755    Elo      :    0.978 +-    5.245 
   evaluated:  {'coef1': 135}    score    :   50.296 +-    0.754    Elo      :    2.058 +-    5.238 
   evaluated:  {'coef1': 135}    score    :   50.340 +-    0.755    Elo      :    2.361 +-    5.247 
   evaluated:  {'coef1': 135}    score    :   50.519 +-    0.753    Elo      :    3.609 +-    5.232 
   evaluated:  {'coef1': 135}    score    :   51.447 +-    0.746    Elo      :   10.055 +-    5.186 
   evaluated:  {'coef1': 136}    score    :   48.971 +-    0.749    Elo      :   -7.152 +-    5.209 
   evaluated:  {'coef1': 136}    score    :   50.267 +-    0.754    Elo      :    1.855 +-    5.241 
   evaluated:  {'coef1': 136}    score    :   50.316 +-    0.751    Elo      :    2.193 +-    5.218 
   evaluated:  {'coef1': 136}    score    :   50.466 +-    0.752    Elo      :    3.238 +-    5.228 
   evaluated:  {'coef1': 136}    score    :   50.665 +-    0.749    Elo      :    4.622 +-    5.203 
   evaluated:  {'coef1': 136}    score    :   50.835 +-    0.750    Elo      :    5.802 +-    5.212 
   evaluated:  {'coef1': 136}    score    :   51.097 +-    0.749    Elo      :    7.625 +-    5.209 
   evaluated:  {'coef1': 137}    score    :   50.083 +-    0.753    Elo      :    0.573 +-    5.236 
   evaluated:  {'coef1': 137}    score    :   50.505 +-    0.756    Elo      :    3.508 +-    5.253 
   evaluated:  {'coef1': 138}    score    :   50.286 +-    0.757    Elo      :    1.990 +-    5.260 
   evaluated:  {'coef1': 138}    score    :   50.291 +-    0.752    Elo      :    2.024 +-    5.225 
   evaluated:  {'coef1': 138}    score    :   50.369 +-    0.752    Elo      :    2.564 +-    5.227 
   evaluated:  {'coef1': 138}    score    :   50.607 +-    0.752    Elo      :    4.217 +-    5.225 
   evaluated:  {'coef1': 138}    score    :   50.961 +-    0.757    Elo      :    6.680 +-    5.265 
   evaluated:  {'coef1': 139}    score    :   50.612 +-    0.755    Elo      :    4.250 +-    5.249 
   evaluated:  {'coef1': 139}    score    :   50.650 +-    0.750    Elo      :    4.520 +-    5.210 
   evaluated:  {'coef1': 139}    score    :   50.699 +-    0.752    Elo      :    4.858 +-    5.228 
   evaluated:  {'coef1': 140}    score    :   50.000 +-    0.754    Elo      :   -0.000 +-    5.243 
   evaluated:  {'coef1': 141}    score    :   50.578 +-    0.753    Elo      :    4.014 +-    5.230 
   evaluated:  {'coef1': 142}    score    :   50.447 +-    0.752    Elo      :    3.103 +-    5.230 
   evaluated:  {'coef1': 143}    score    :   49.840 +-    0.755    Elo      :   -1.113 +-    5.247 
   evaluated:  {'coef1': 143}    score    :   50.083 +-    0.757    Elo      :    0.573 +-    5.259 
   evaluated:  {'coef1': 143}    score    :   50.330 +-    0.751    Elo      :    2.294 +-    5.216 
   evaluated:  {'coef1': 144}    score    :   50.461 +-    0.750    Elo      :    3.205 +-    5.211 
   evaluated:  {'coef1': 144}    score    :   50.820 +-    0.759    Elo      :    5.701 +-    5.275 
   evaluated:  {'coef1': 145}    score    :   50.388 +-    0.753    Elo      :    2.699 +-    5.233 
   evaluated:  {'coef1': 145}    score    :   50.748 +-    0.760    Elo      :    5.195 +-    5.282 
   evaluated:  {'coef1': 146}    score    :   50.034 +-    0.755    Elo      :    0.236 +-    5.247 
   evaluated:  {'coef1': 146}    score    :   50.937 +-    0.752    Elo      :    6.511 +-    5.227 
   evaluated:  {'coef1': 148}    score    :   50.718 +-    0.752    Elo      :    4.993 +-    5.227 
   evaluated:  {'coef1': 149}    score    :   49.699 +-    0.750    Elo      :   -2.091 +-    5.214 
   evaluated:  {'coef1': 149}    score    :   50.000 +-    0.751    Elo      :   -0.000 +-    5.219 
   evaluated:  {'coef1': 149}    score    :   50.587 +-    0.751    Elo      :    4.082 +-    5.217 
   evaluated:  {'coef1': 162}    score    :   50.631 +-    0.753    Elo      :    4.385 +-    5.236 
   evaluated:  {'coef1': 174}    score    :   50.233 +-    0.752    Elo      :    1.619 +-    5.225 
   evaluated:  {'coef1': 178}    score    :   50.340 +-    0.755    Elo      :    2.361 +-    5.248 
   evaluated:  {'coef1': 196}    score    :   50.029 +-    0.752    Elo      :    0.202 +-    5.225 
   evaluated:  {'coef1': 240}    score    :   49.476 +-    0.754    Elo      :   -3.643 +-    5.240 
   evaluated:  {'coef1': 332}    score    :   45.500 +-    0.761    Elo      :  -31.354 +-    5.329 

Convergence of nevergrad4sf:

optimal at iter 1 after 1 evaluation and 10300 games : {'coef1': 20}
optimal at iter 2 after 5 evaluations and 51500 games : {'coef1': 69}
optimal at iter 3 after 10 evaluations and 103000 games : {'coef1': 82}
optimal at iter 4 after 15 evaluations and 154500 games : {'coef1': 114}
optimal at iter 5 after 20 evaluations and 206000 games : {'coef1': 130}
optimal at iter 6 after 28 evaluations and 288400 games : {'coef1': 140}
optimal at iter 7 after 36 evaluations and 370800 games : {'coef1': 129}
optimal at iter 8 after 44 evaluations and 453200 games : {'coef1': 140}
optimal at iter 9 after 52 evaluations and 535600 games : {'coef1': 128}
optimal at iter 10 after 60 evaluations and 618000 games : {'coef1': 133}
optimal at iter 11 after 76 evaluations and 782800 games : {'coef1': 134}
optimal at iter 12 after 92 evaluations and 947600 games : {'coef1': 133}
optimal at iter 13 after 108 evaluations and 1112400 games : {'coef1': 135}

and the updated graph showing both data sets: [graph: kdtweak]

Edit: a VSTC SPRT test run on fishtest shows that 135 is indeed a better value at that TC: https://tests.stockfishchess.org/tests/view/5eac30ca6ffeed51f6e32208

vdbergh commented 4 years ago

@ppigazzini Well I find it hard to read what I wrote myself (it was just a quick draft).

However the second display on page 2 seems to suggest that the Fishtest implementation is correct. Compare with (6) in https://www.jhuapl.edu/SPSA/PDF-SPSA/Spall_Stochastic_Optimization.PDF

We want a to be the learning rate. Of course there is a rather obscure extra factor u_1 which can however be explicitly computed (see Example 1.1).

EDIT: this is assuming that the batching doesn't do any harm.

ppigazzini commented 4 years ago

@vdbergh SPSA is a simple gradient descent. The problem is the function we want to optimize and how we compute the derivative.

Fishtest now uses this function and derivative:

  1. f(p+c)-f(p-c)=wins-losses
  2. f'=(wins-losses)/c

The problem is that 1. has values (for a pair of games) bounded in [-2; +2]: a derivative should not depend on the delta used to compute it. A minor problem is that 2. lacks a division by 2.

I propose to use this function and derivative:

  1. f(p+c)-f(p-c)=2c(wins-losses)
  2. f'=(wins-losses)
vdbergh commented 4 years ago

@ppigazzini We want to do gradient ascent for the function f: params -> Elo. On average, wins - losses will be proportional to c*f'. So we really are computing the derivative of f. Stochastically.
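
A back-of-envelope sketch of why (writing s(x) for the expected score at Elo difference x, with no book bias, and f the params -> Elo function):

    E[wins - losses] per pair = 2 * (2*s(d) - 1)    with d = f(p+c) - f(p-c)
                              ≈ 4 * s'(0) * d
                              ≈ 8 * s'(0) * c * f'(p),   s'(0) = ln(10)/1600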

MJZ1977 commented 4 years ago

Some time ago I started thinking about SPSA and wrote this document

support_ornstein_uhlenbeck.pdf

My initial hope was to get information on good choices of hyper parameters but nothing obvious came out. So the above document is incomplete and in addition it says nothing about batching.

@vdbergh: I tried to understand your paper, and I hope I caught the ideas. In the end, the perturbed system converges to something equal to "unperturbed" + "integral of perturbations". The integral of perturbations is not necessarily equal to zero, because a(t) and c(t) are varying (that is what I have called numerical dispersion). But it is clear that if you decrease the perturbation, this integral decreases also. That is what can happen if we increase the batch.

@ppigazzini: I understand what you wrote, but I think that our SPSA is just asking "should I increase or decrease the parameter?" and then moving by a/c. In the end, what matters most is the direction and the parameter a/c.

MJZ1977 commented 4 years ago

@vondele: impressive curves! I don't know how you did all this in a few hours.

vdbergh commented 4 years ago

Batching is OK, it seems. But I wonder if it would be worthwhile to consider the batch as a single iteration and to normalise the result (dividing by games/2). Basically, we would then be doing a more accurate measurement of the gradient of the Elo(params) function. I think batching is discussed in one of the spsa papers, but I can't find it now.

vdbergh commented 4 years ago

What I am proposing is called gradient averaging and it is discussed a bit in this paper.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.19.4562&rep=rep1&type=pdf

See Table 1. For very noisy observations, as in our case, gradient averaging is apparently advantageous.
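
A minimal sketch of gradient averaging on top of the two-sided estimator sketched earlier in the thread (gradient_two_sided):

    def averaged_gradient(f, theta, c, n):
        # n independent simultaneous-perturbation estimates at the same theta,
        # averaged component-wise; per Table 1 this pays off for very noisy f
        grads = [gradient_two_sided(f, theta, c) for _ in range(n)]
        return [sum(parts) / n for parts in zip(*grads)]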