kiudee / chess-tuning-tools

A collection of scripts aimed at efficiently tuning chess engine parameters.
https://chess-tuning-tools.readthedocs.io/en/latest/

Divide by zero error when using input warping #93

Closed (AlexisOlson closed this 3 years ago)

AlexisOlson commented 4 years ago

Description

I'm tuning three parameters, but turning on warp_inputs throws a divide-by-zero error on every iteration, produces a strange warping for one of the parameters, and yields odd confidence bounds for all of them. (Everything looks normal without warping.)

Error:

/root/miniconda/envs/tuning/lib/python3.8/site-packages/sklearn/gaussian_process/kernels.py:255:
RuntimeWarning: divide by zero encountered in log
  return np.log(np.hstack(theta))

Without warping:

[plot: 20200909-182503-280]

With warping:

[plot: 20200909-182930-280]

Warp functions:

Input warping was applied using the following parameters for the beta distributions:
{'ScaleQ': (0.191, 0.849), 'Cpuct': (0.471, 1.083), 'FpuValue': (0.635, 1.421)}

[plot of the fitted warping functions for ScaleQ, Cpuct and FpuValue]

This warping for ScaleQ doesn't make sense, but I suspect the model simply doesn't know what it's doing yet because the effect of this parameter is still poorly determined. Its effect isn't nearly as strong as that of the other parameters, and there is an expected interaction among all of them. I'll keep running non-warped ts and vr sampling for a while and try again once the model understands the landscape better. A small sketch of what the reported warp does to the inputs follows below.
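To make the warp concrete: this sketch assumes each (normalized) input is passed through the CDF of a beta distribution, as in the input-warping paper; the parameter values are the ones from the log above.

  import numpy as np
  from scipy.stats import beta

  # Warping parameters reported in the log above: (alpha, beta) per input.
  params = {
      "ScaleQ": (0.191, 0.849),
      "Cpuct": (0.471, 1.083),
      "FpuValue": (0.635, 1.421),
  }
  x = np.linspace(0.0, 1.0, 6)  # inputs normalized to [0, 1]
  for name, (a, b) in params.items():
      # Identity warping corresponds to alpha = beta = 1;
      # anything else bends the input space.
      print(name, np.round(beta.cdf(x, a, b), 3))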

Files to reproduce: DataAndConfig.zip

What I Did

Changing "warp_inputs": false to "warp_inputs": true in the config.json results in the runtime warning:

2020-09-09 18:34:09,833 INFO     Got score: 5.6995265465242655 +- 1.9955240492535908
2020-09-09 18:34:09,834 INFO     Updating model
/root/miniconda/envs/tuning/lib/python3.8/site-packages/sklearn/gaussian_process/kernels.py:255: RuntimeWarning: divide by zero encountered in log
  return np.log(np.hstack(theta))
2020-09-09 18:34:28,397 INFO     GP sampling finished (18.563089s)
2020-09-09 18:34:28,398 DEBUG    GP kernel: 1.54**2 * Matern(length_scale=[0.522, 0.362, 0.276], nu=2.5) + WhiteKernel(noise_level=3.88e-06)
2020-09-09 18:34:28,399 DEBUG    Input warping was applied using the following parameters for the beta distributions:
{'ScaleQ': (0.191, 0.849), 'Cpuct': (0.471, 1.083), 'FpuValue': (0.635, 1.421)}
2020-09-09 18:34:28,408 INFO     Starting iteration 281
2020-09-09 18:34:41,482 INFO     Current optimum:
{'ScaleQ': 0.3991959134189197, 'Cpuct': 4.700855456794591, 'FpuValue': 1.4722686789734403}
2020-09-09 18:34:41,482 INFO     Estimated value: -0.5733 +- 0.3404
2020-09-09 18:34:41,482 INFO     90.0% confidence interval of the value: (-1.1332, -0.0133)
2020-09-09 18:34:41,648 INFO     90.0% confidence intervals of the parameters:
Parameter          Lower bound          Upper bound
---------------------------------------------------
ScaleQ      11.758489497804575   11.758489497804575
Cpuct       0.9963961100717935   0.9963961100717935
FpuValue   -0.9812151210716575  -0.9812151210716575
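For reference, the change amounts to nothing more than flipping that one flag; a minimal sketch of doing it programmatically (file name as in the attached archive):

  import json

  # Flip the input-warping flag in the tuner configuration;
  # everything else in the attached config.json stays untouched.
  with open("config.json") as f:
      config = json.load(f)
  config["warp_inputs"] = True  # was false
  with open("config.json", "w") as f:
      json.dump(config, f, indent=2)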

Related to #92.

kiudee commented 4 years ago

Very interesting, thank you for the detailed report! We may have to make the prior over the warping parameters even stricter. In the paper they used a very flexible logNormal(loc=0, scale=0.89) prior, which turned out to be too unstable here: the likelihood quickly favored extreme warping functions that overfit the data. I think it makes sense to strongly encourage the warping to stay close to the identity function and to let it deviate only given significant data.

edit: Currently, I apply a logNormal(0, 0.5) distribution as prior.
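To illustrate the effect of the scale, here is a small sketch comparing how much prior mass each choice places near the identity warp (which corresponds to alpha = beta = 1); scipy's parametrization is lognorm(s=sigma, scale=exp(mu)):

  import numpy as np
  from scipy.stats import lognorm

  paper_prior = lognorm(s=0.89, scale=np.exp(0))    # prior used in the paper
  current_prior = lognorm(s=0.5, scale=np.exp(0))   # stricter prior used now

  # Mass assigned to warping parameters in [0.5, 2], i.e. "close to identity".
  # The tighter scale concentrates noticeably more mass in this range.
  for label, prior in [("scale=0.89", paper_prior), ("scale=0.5", current_prior)]:
      print(label, round(prior.cdf(2.0) - prior.cdf(0.5), 3))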

kiudee commented 4 years ago

With the new version of bask==0.9.3 does this problem still occur?

AlexisOlson commented 4 years ago

With the new version of bask==0.9.3 does this problem still occur?

It's still a problem with bask 0.10.0.

kiudee commented 4 years ago

Ok, I will investigate further.

kiudee commented 4 years ago

The error message:

lib\site-packages\sklearn\gaussian_process\kernels.py:255:
RuntimeWarning: divide by zero encountered in log
  return np.log(np.hstack(theta))

happens during the initial maximum (marginal) likelihood step, where the Gaussian process is fit with the identity warping function. During the optimization of the likelihood it can happen that the noise is driven to essentially 0 (the final estimate was 1e-5). That is possible because we already supply a noise estimate per datapoint (the standard error of the Elo estimate), so 0 noise here means 0 residual variation. That causes a 0 to be passed into the log function. Even though I think this is not problematic for us, I will add additional checks and filter the warning.
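A minimal sketch of the mechanism, using only sklearn's kernel machinery (not the tuner itself): a noise level of exactly 0 makes the log-transform of the hyperparameter vector evaluate log(0), which triggers exactly this RuntimeWarning; filtering it once the case is known to be benign is then straightforward.

  import warnings
  from sklearn.gaussian_process.kernels import WhiteKernel

  kernel = WhiteKernel(noise_level=0.0)
  theta = kernel.theta  # log(0) -> -inf, emits the RuntimeWarning above
  print(theta)

  # Suppressing the warning for this known-benign case:
  with warnings.catch_warnings():
      warnings.simplefilter("ignore", category=RuntimeWarning)
      theta = kernel.theta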

Regarding the strong warping, I will debug some more.

kiudee commented 4 years ago

Fixed this particular warning in bask 0.10.2.