optimagic-dev / optimagic

optimagic is a Python package for numerical optimization. It is a unified interface to optimizers from SciPy, NLopt and other packages. optimagic's minimize function works just like SciPy's, so you don't have to adjust your code. You simply get more optimizers for free. On top, you get diagnostic tools, parallel numerical derivatives and more.
https://optimagic.readthedocs.io/
MIT License

Noise preparation and tuning #434

Closed (janosg closed this 1 year ago)

janosg commented 1 year ago

This PR prepares the ground for noise handling and drastically improves the performance of tranquilo in the non-noisy case. Here is a summary of the changes:

Factor out acceptance decision

Acceptance decisions used to be hardcoded and are now an interchangeable component with a standardized interface. An acceptance decision is made at the end of each iteration, after the trustregion subproblem has been solved. The decider can do one or multiple function evaluations and decides which point is accepted, i.e. becomes the center of the trustregion for the next iteration. Usually this is the argmin of the subproblem, but acceptance deciders could be more creative. In the noisy case our default acceptance decider will determine sample sizes based on a power analysis.
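To make the standardized interface concrete, here is a minimal sketch of what an acceptance decider could look like; the names and signature (`AcceptanceResult`, `accept_naive`) are hypothetical and not the actual tranquilo API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class AcceptanceResult:
    """Illustrative container for the outcome of an acceptance decision."""

    accepted: bool
    new_center: np.ndarray
    rho: float  # actual over expected improvement


def accept_naive(criterion, current_center, current_fval, candidate, expected_improvement):
    """Hypothetical acceptance decider that does one evaluation at the candidate.

    A noise-aware decider could instead evaluate the candidate several times,
    with the sample size chosen via a power analysis, and compare averages.
    """
    candidate_fval = criterion(candidate)
    actual_improvement = current_fval - candidate_fval
    rho = actual_improvement / max(expected_improvement, 1e-16)
    accepted = actual_improvement > 0
    new_center = candidate if accepted else current_center
    return AcceptanceResult(accepted=accepted, new_center=new_center, rho=rho)
```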

Add functions for variance estimation

We now have functions to estimate the variance of the noise terms. We can separately estimate noise variances for least squares residuals and scalar function values. In the least squares case we allow for correlation of noise terms.
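As an illustration of the idea rather than the actual optimagic functions, noise variances can be estimated from repeated evaluations at the same point(s):

```python
import numpy as np


def estimate_scalar_noise_variance(samples):
    """Pooled sample variance of repeated scalar criterion evaluations.

    `samples` is a list of 1d arrays, each holding repeated evaluations at one
    point. Points with a single evaluation carry no information about the noise.
    Illustrative sketch, not the actual optimagic function.
    """
    num = sum((len(s) - 1) * np.var(s, ddof=1) for s in samples if len(s) > 1)
    denom = sum(len(s) - 1 for s in samples if len(s) > 1)
    return num / denom


def estimate_residual_noise_cov(residual_samples):
    """Sample covariance of least squares residual vectors evaluated repeatedly
    at one point; off-diagonal entries capture correlation between noise terms."""
    return np.cov(np.asarray(residual_samples), rowvar=False, ddof=1)
```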

Switch from sampling more points to evaluating more often at a smaller number of points

In the underdetermined case (which we rarely leave for scalar criterion functions) the model interpolates all criterion evaluations unless we evaluate multiple times at the same point, so no averaging takes place. Forcing the model not to interpolate would require penalized fitting with a correctly chosen penalty level, and our experiments show that it is hard to set the correct penalty terms in a ridge regression. We thus switch to evaluating multiple times at one point (as dfols and pybobyqa do).

This required a complete rewrite of the history class but makes variance estimation much easier.
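A toy version of a history that supports repeated evaluations at the same point might look like this (hypothetical names; the real rewritten history class is organized differently):

```python
import numpy as np


class RepeatedEvaluationHistory:
    """Toy history that stores multiple evaluations per point and averages them."""

    def __init__(self):
        self.xs = []      # one entry per distinct point
        self.fvals = []   # list of lists: all evaluations of the matching point

    def add_eval(self, x, fval):
        """Append an evaluation, reusing the slot of an already known point."""
        for i, existing in enumerate(self.xs):
            if np.array_equal(existing, x):
                self.fvals[i].append(fval)
                return
        self.xs.append(np.asarray(x, dtype=float))
        self.fvals.append([fval])

    def get_averages(self):
        """Averaged function values, i.e. the data the surrogate model is fit to."""
        return np.array(self.xs), np.array([np.mean(f) for f in self.fvals])

    def get_n_evals(self):
        """Number of evaluations per point, needed for variance estimation."""
        return np.array([len(f) for f in self.fvals])
```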

Improve tools for working with models

Our model classes (VectorModel and ScalarModel) now have predict methods, and there are new functions for working with models.
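Assuming a quadratic surrogate with intercept, linear and square terms, a predict method essentially evaluates the model at a point expressed relative to the trustregion center; the sketch below is illustrative, not the actual method:

```python
import numpy as np


def predict_quadratic(intercept, linear_terms, square_terms, x):
    """Evaluate a quadratic surrogate f(x) ~ c + g'x + 0.5 * x'Hx, where x is
    expressed relative to the trustregion center. Illustrative only."""
    return intercept + linear_terms @ x + 0.5 * x @ square_terms @ x
```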

Refactor fitting

The fitting methods now expect uncentered data and automatically center it on the trustregion. The region becomes part of the model information.

We add the option to residualize the data, i.e. to regress out the prediction of the model from the previous iteration. As a result, the implicit penalty acts not on parameter values but on changes in parameter values between two iterations.
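A stylized picture of residualizing (all helper names hypothetical, and details such as re-centering between trustregions are ignored): fit the new model to the targets minus the old model's predictions and add the old coefficients back, so the implicit penalty acts on the coefficient change.

```python
import numpy as np


def fit_residualized(fit_func, xs, targets, old_predict, old_coef):
    """Fit to the residuals of the previous model's predictions and add the
    old coefficients back. `fit_func`, `old_predict` and the coefficient
    layout are hypothetical."""
    residual_targets = targets - np.array([old_predict(x) for x in xs])
    delta_coef = fit_func(xs, residual_targets)
    return old_coef + delta_coef
```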

We implement a wrapper around np.linalg.lstsq that rescales the data such that the norm minimized in the underdetermined case puts different weights on the intercept, first order parameters and second order parameters.
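The rescaling trick can be illustrated as follows (a sketch, not the actual wrapper): dividing the design matrix columns by weights before calling np.linalg.lstsq and mapping the solution back means that the minimum-norm solution returned in the underdetermined case minimizes the weighted norm of the coefficients.

```python
import numpy as np


def weighted_minimum_norm_lstsq(design, targets, weights):
    """Least squares via np.linalg.lstsq where, in the underdetermined case,
    the implicitly minimized quantity is sum((weights * coef) ** 2) instead
    of sum(coef ** 2).

    Different weights for intercept, linear and square terms thus translate
    into different penalties. Illustrative sketch only.
    """
    weights = np.asarray(weights, dtype=float)
    scaled_design = design / weights                      # divide column j by weights[j]
    scaled_coef, *_ = np.linalg.lstsq(scaled_design, targets, rcond=None)
    return scaled_coef / weights                          # map back to the original scale
```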

Residualizing and the new fitting method together give us a large boost in performance.

New initial trustregion radius

The initial trustregion radius is now determined based on the size of the parameter values. This is exactly as in pybobyqa.
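If I read the pybobyqa default correctly, it is 0.1 * max(||x0||_inf, 1); a one-line sketch of that rule (not necessarily the exact optimagic implementation):

```python
import numpy as np


def initial_trustregion_radius(x0):
    """Initial radius based on the size of the start parameters, following the
    pybobyqa-style rule 0.1 * max(||x0||_inf, 1). Illustrative sketch."""
    return 0.1 * max(np.max(np.abs(x0)), 1.0)
```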

This gives another huge boost in performance.

Simulate rho noise

We add a function to simulate which rho (i.e. the ratio of actual over expected improvement) we would obtain if there were only noise and no approximation error.
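A Monte Carlo sketch of the idea (hypothetical function, simplified noise model): assume the model is exact, so the true improvement equals the expected improvement, and perturb the measured improvement with evaluation noise at the center and the candidate.

```python
import numpy as np


def simulate_rho_noise(expected_improvement, noise_variance, n_center_evals,
                       n_candidate_evals, n_draws=1000, rng=None):
    """Simulate the distribution of rho under pure noise and zero approximation error.

    The averaged evaluations at the center and candidate are perturbed by
    independent Gaussian noise with the given variance. Illustrative only.
    """
    rng = np.random.default_rng() if rng is None else rng
    center_noise = rng.normal(0, np.sqrt(noise_variance / n_center_evals), n_draws)
    candidate_noise = rng.normal(0, np.sqrt(noise_variance / n_candidate_evals), n_draws)
    actual_improvement = expected_improvement + center_noise - candidate_noise
    return actual_improvement / expected_improvement
```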

Change how we determine which points are used for the model

Our old approach was to keep as many points as possible as long as they were closer than a multiple (e.g. 5x) of the trustregion radius. Alternatives we tried that did not work were filtering those points based on geometric conditions or clustering.

The problem with this approach is that distant points can be very influential and prevent progress, i.e. the new model's solution stays very close to the old model's solution.

We now tackle this problem directly: if the new argmin is close to the old argmin, we drop points until we get down to the target sample size, first dropping points that are far away and then points that are close to other points. If the argmin is still close to the previous argmin, we start replacing points.
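A toy version of the dropping logic described above (hypothetical helper, not the actual filter):

```python
import numpy as np


def drop_excess_points(xs, center, radius, target_size):
    """Discard points beyond the radius starting with the farthest, then points
    that are closest to another remaining point, until only `target_size`
    points are left. Toy thresholds, illustrative only."""
    xs = np.asarray(xs, dtype=float)
    keep = sorted(range(len(xs)), key=lambda i: np.linalg.norm(xs[i] - center))
    # 1. drop far-away points (farthest first), but only those outside the radius
    while len(keep) > target_size and np.linalg.norm(xs[keep[-1]] - center) > radius:
        keep.pop()
    # 2. drop points that are very close to another remaining point
    while len(keep) > target_size:
        min_dist = {
            i: min(np.linalg.norm(xs[i] - xs[j]) for j in keep if j != i)
            for i in keep
        }
        keep.remove(min(min_dist, key=min_dist.get))
    return xs[keep]
```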

This gives a huge boost in performance.

codecov[bot] commented 1 year ago

Codecov Report

Merging #434 (1c4163a) into main (01a4f9c) will decrease coverage by 0.01%. The diff coverage is 92.52%.

@@            Coverage Diff             @@
##             main     #434      +/-   ##
==========================================
- Coverage   92.87%   92.86%   -0.01%     
==========================================
  Files         234      242       +8     
  Lines       17596    18472     +876     
==========================================
+ Hits        16342    17154     +812     
- Misses       1254     1318      +64     
| Impacted Files | Coverage Δ |
| --- | --- |
| ...timagic/optimization/tranquilo/aggregate_models.py | 87.71% <ø> (ø) |
| ...magic/optimization/tranquilo/wrapped_subsolvers.py | 19.51% <19.51%> (ø) |
| src/estimagic/optimization/nag_optimizers.py | 82.65% <33.33%> (-3.22%) ↓ |
| ...rc/estimagic/optimization/tranquilo/new_history.py | 83.21% <83.21%> (ø) |
| .../estimagic/optimization/tranquilo/filter_points.py | 78.86% <85.29%> (-4.29%) ↓ |
| ...imagic/optimization/tranquilo/tranquilo_history.py | 76.53% <87.50%> (-2.26%) ↓ |
| src/estimagic/optimization/tranquilo/fit_models.py | 87.05% <91.89%> (+2.12%) ↑ |
| ...estimagic/optimization/tranquilo/wrap_criterion.py | 95.23% <92.85%> (-4.77%) ↓ |
| src/estimagic/optimization/tranquilo/models.py | 94.32% <93.87%> (-1.76%) ↓ |
| ...c/optimization/tranquilo/acceptance_sample_size.py | 95.45% <95.45%> (ø) |

... and 29 more
