pints-team / pints

Probabilistic Inference on Noisy Time Series
http://pints.readthedocs.io

Add more optimisers #684

Closed (MichaelClerx closed this issue 1 month ago)

MichaelClerx commented 5 years ago

Creating a big ticket using the list from/for the paper, closing smaller issues.

Ultimately, this should be closed in favour of much more specific tickets ("implement method x")

Optimisers

Semi-organised list of optimisation methods.

Italicised bits are my attempt at a one-line summary

References should be given if possible.

Things of interest:

Derivative-free Direct methods (aka Search methods)

*Completely ignore the idea of f having a derivative*

Brute-force exploration

*Look at the landscape and pick the lowest point*

Random search

Coordinate search (aka Coordinate descent, aka Compass search)

*Evaluate f at nearby points at distance d in each coordinate direction; if a better point is found, move there and end the iteration; if no better point is found, decrease d*
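
A rough sketch of the idea in Python (the objective, starting point and step-size schedule below are just illustrative, not anything from PINTS):

```python
import numpy as np

def coordinate_search(f, x0, d=1.0, d_min=1e-8, max_iter=10000):
    """Compass search: poll +/- d along each coordinate; shrink d on failure."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for step in (d, -d):
                y = x.copy()
                y[i] += step
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
                    break
            if improved:
                break
        if not improved:
            d *= 0.5           # no better neighbour: shrink the step size
            if d < d_min:
                break
    return x, fx

# Example: minimise a simple quadratic
print(coordinate_search(lambda v: float(np.sum((v - 3.0) ** 2)), [0.0, 0.0]))
```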

Search & poll / Pattern search

Simplex methods

Tabu search

*Perform a local search (e.g. random or coordinate search), but allow moves to a worse f when at an optimum; remember previously visited points and disallow them (mark them taboo).*
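
A toy sketch of the tabu idea; rounding visited points to make them hashable, and the step size and iteration count, are assumptions made purely for illustration:

```python
import numpy as np

def tabu_search(f, x0, step=0.5, n_iter=200):
    """Coordinate-style local moves, but previously visited points are taboo."""
    x = np.asarray(x0, dtype=float)
    best_x, best_f = x.copy(), f(x)
    tabu = {tuple(np.round(x, 6))}
    eye = np.eye(len(x))
    for _ in range(n_iter):
        # Candidate moves: +/- step along each coordinate, excluding taboo points
        moves = [x + s * eye[i] for i in range(len(x)) for s in (step, -step)]
        moves = [m for m in moves if tuple(np.round(m, 6)) not in tabu]
        if not moves:
            break
        # Accept the best non-taboo neighbour, even if it is worse than x
        x = min(moves, key=f)
        tabu.add(tuple(np.round(x, 6)))
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x.copy(), fx
    return best_x, best_f

print(tabu_search(lambda v: float(np.sum(v ** 2)), [3.0, -2.0]))
```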

Dividing rectangles algorithm (DIRECT)

*Partitions the (bounded) subspace into rectangles and works out where the optimum could be for any value of the Lipschitz constant (which says how fast a function can vary with its input), then subdivides all rectangles where it could possibly be*
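
A full DIRECT implementation is fairly involved, but recent SciPy ships one as `scipy.optimize.direct`; a minimal usage sketch on a toy objective:

```python
import numpy as np
from scipy.optimize import direct

# Toy objective; DIRECT only needs bounds, no gradients or starting point
def f(x):
    return float(np.sum((x - 0.7) ** 2))

res = direct(f, bounds=[(-2.0, 2.0), (-2.0, 2.0)])
print(res.x, res.fun)
```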

Powell's conjugate direction method

Multi-start methods

*Start from several points to avoid local optima*

Multi-level single-linkage (MLSL)

*Multi-starting from points placed uniformly and densely in the search space should find all optima, but is expensive, so use clustering methods to try and identify groups of starting points that share the same attractor*

Basin-hopping

*Repeatedly run a local search, then perturb its result to get the next starting point*
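
SciPy exposes this as `scipy.optimize.basinhopping`; a minimal usage sketch (objective and settings are just placeholders):

```python
import numpy as np
from scipy.optimize import basinhopping

# A bumpy 1D objective with many local minima
def f(x):
    return float(np.cos(14.5 * x[0] - 0.3) + (x[0] + 0.2) * x[0])

# Alternates a local minimisation with random perturbations of the result
res = basinhopping(f, x0=[1.0], niter=100, stepsize=0.5)
print(res.x, res.fun)
```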

Derivative-free evolutionary methods and metaheuristics

*Maintain some population of points that gets updated at each iteration*

Genetic algorithms (GA)

*Generate new points using crossover and mutation, estimate the fitness of all points in the population, and remove the points with the worst scores*
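
A bare-bones real-valued GA sketch; the population size, mutation scale and crossover scheme are arbitrary illustrative choices:

```python
import numpy as np

def genetic_algorithm(f, dim, pop_size=40, n_gen=200, mut_scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5, 5, size=(pop_size, dim))
    for _ in range(n_gen):
        # Crossover: mix coordinates of two randomly chosen parents
        parents = pop[rng.integers(pop_size, size=(pop_size, 2))]
        mask = rng.random((pop_size, dim)) < 0.5
        children = np.where(mask, parents[:, 0, :], parents[:, 1, :])
        # Mutation: small Gaussian perturbation
        children += rng.normal(scale=mut_scale, size=children.shape)
        # Selection: keep the best pop_size points of parents + children
        combined = np.vstack([pop, children])
        scores = np.array([f(x) for x in combined])
        pop = combined[np.argsort(scores)[:pop_size]]
    best = min(pop, key=f)
    return best, f(best)

print(genetic_algorithm(lambda x: float(np.sum(x ** 2)), dim=3))
```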

Differential evolution

*Loop over each point in a population, generate a new proposal for each point (as in GA), and replace the point with that proposal if its f is lower*
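
A minimal DE/rand/1/bin sketch (SciPy also has a full implementation in `scipy.optimize.differential_evolution`); the F, CR and population-size values are illustrative defaults:

```python
import numpy as np

def differential_evolution(f, dim, pop_size=30, n_gen=300, F=0.8, CR=0.9, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5, 5, size=(pop_size, dim))
    fitness = np.array([f(x) for x in pop])
    for _ in range(n_gen):
        for i in range(pop_size):
            # Pick three distinct points other than i and build a mutant vector
            a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)]
            mutant = a + F * (b - c)
            # Binomial crossover between the current point and the mutant
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            # Greedy replacement: keep the trial only if it is better
            f_trial = f(trial)
            if f_trial < fitness[i]:
                pop[i], fitness[i] = trial, f_trial
    best = int(np.argmin(fitness))
    return pop[best], fitness[best]

print(differential_evolution(lambda x: float(np.sum(x ** 2)), dim=3))
```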

Evolution strategies

*Like GA, but mutate by adding a normally distributed random value to each individual; no crossover*
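
A minimal (mu, lambda) evolution-strategy sketch with a fixed, isotropic mutation step; practical ES variants such as CMA-ES (already in PINTS) adapt the step size and covariance:

```python
import numpy as np

def evolution_strategy(f, x0, sigma=0.3, n_parents=5, n_children=20, n_gen=200, seed=0):
    rng = np.random.default_rng(seed)
    parents = np.tile(np.asarray(x0, dtype=float), (n_parents, 1))
    for _ in range(n_gen):
        # Each child is a randomly chosen parent plus Gaussian noise (no crossover)
        idx = rng.integers(n_parents, size=n_children)
        children = parents[idx] + rng.normal(scale=sigma, size=(n_children, len(x0)))
        # (mu, lambda) selection: the best children become the next parents
        scores = np.array([f(c) for c in children])
        parents = children[np.argsort(scores)[:n_parents]]
    best = min(parents, key=f)
    return best, f(best)

print(evolution_strategy(lambda x: float(np.sum((x - 1.0) ** 2)), [0.0, 0.0, 0.0]))
```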

Controlled random search (CRS)

*Evolve a random population using a heuristic that is a bit like a randomised Nelder-Mead iteration*

Swarm algorithms

Metaheuristics

Bayesian optimisation

*Treat f as an unknown function, define a prior over it, evaluate f at a few points, combine the evaluations with the prior to form a posterior and use it to choose the next points; repeat.*
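
A compact sketch using a Gaussian-process surrogate from scikit-learn and a lower-confidence-bound acquisition rule; the library choice, kernel and the `kappa` trade-off parameter are all assumptions made for illustration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def bayesian_optimisation(f, bounds, n_init=5, n_iter=25, kappa=2.0, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    X = rng.uniform(lo, hi, size=(n_init, len(lo)))
    y = np.array([f(x) for x in X])
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)                                   # posterior over f given data so far
        cand = rng.uniform(lo, hi, size=(2000, len(lo)))
        mu, std = gp.predict(cand, return_std=True)
        x_next = cand[np.argmin(mu - kappa * std)]     # lower confidence bound
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    best = int(np.argmin(y))
    return X[best], y[best]

print(bayesian_optimisation(lambda x: float(np.sum((x - 0.3) ** 2)), [(-2, 2), (-2, 2)]))
```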

Gradient-estimating methods

*Assume f' exists, and then approximate it*

Finite difference methods

*Approximate f' with finite differences, and find its root*
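
E.g. a forward-difference gradient estimate that can then be fed into any gradient-based optimiser (the step size `h` is an illustrative choice):

```python
import numpy as np

def fd_gradient(f, x, h=1e-6):
    """Approximate f'(x) with forward differences, one coordinate at a time."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    grad = np.empty_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - fx) / h
    return grad

# Example: gradient of sum(x^2) at [1, 2] is approximately [2, 4]
print(fd_gradient(lambda v: float(np.sum(v ** 2)), [1.0, 2.0]))
```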

Simplex gradient methods / Implicit filtering

"Line-search using the simplex gradient"

Natural evolution strategies (NES)

*Use a search population to estimate the "natural gradient", a gradient which takes different scaling of the coordinates into account*
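
A rough sketch of a separable-NES style update (PINTS already has XNES/SNES; the learning rates and rank-based utility weights here are simplified assumptions):

```python
import numpy as np

def snes(f, x0, sigma0=1.0, pop_size=20, n_gen=300, eta_mu=1.0, eta_sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    mu = np.asarray(x0, dtype=float)
    sigma = np.full_like(mu, sigma0)
    for _ in range(n_gen):
        s = rng.normal(size=(pop_size, len(mu)))          # samples in normalised coordinates
        x = mu + sigma * s
        fs = np.array([f(xi) for xi in x])
        # Rank-based utilities: best sample gets the largest (zero-mean) weight
        order = np.argsort(fs)
        u = np.zeros(pop_size)
        u[order] = np.linspace(1.0, -1.0, pop_size)
        u /= pop_size
        # Natural-gradient style updates of the search distribution
        mu = mu + eta_mu * sigma * (u @ s)
        sigma = sigma * np.exp(0.5 * eta_sigma * (u @ (s ** 2 - 1.0)))
    return mu, f(mu)

print(snes(lambda v: float(np.sum((v - 2.0) ** 2)), [0.0, 0.0]))
```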

Surrogate-model methods

*Replace f by some function g for which g' is easy to find*

Trust-region methods

*Select a region of search space, approximate f within it (typically with a quadratic function), and move to the approximation's minimum*
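
A tiny 1D trust-region sketch with a finite-difference quadratic model; the radius-update thresholds are conventional but chosen here just for illustration:

```python
import numpy as np

def trust_region_1d(f, x0, delta=1.0, n_iter=50, h=1e-5):
    x, fx = float(x0), f(x0)
    for _ in range(n_iter):
        # Quadratic model m(p) ~ f(x + p) from finite-difference derivatives
        g = (f(x + h) - f(x - h)) / (2 * h)
        H = (f(x + h) - 2 * fx + f(x - h)) / h ** 2
        # Minimise the model inside the trust region [-delta, delta]
        p = -g / H if H > 0 else -np.sign(g) * delta
        p = float(np.clip(p, -delta, delta))
        predicted = -(g * p + 0.5 * H * p ** 2)
        actual = fx - f(x + p)
        rho = actual / predicted if predicted > 0 else -1.0
        if rho > 0.1:                      # model was good enough: accept the step
            x, fx = x + p, f(x + p)
        delta = delta * 2 if rho > 0.75 else (delta / 2 if rho < 0.25 else delta)
    return x, fx

print(trust_region_1d(lambda t: (t - 3.0) ** 2 + np.sin(5 * t), 0.0))
```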

Data-based Online Nonlinear Extremum-seeker (DONE)

*Fit a Fourier-based surrogate model, then optimise it with L-BFGS*

Methods requiring the 1st-order gradient

Root finding methods

*Find a point where f'(x) = 0 (and hope there's only one)*

Gradient descent (aka Steepest descent)
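
A minimal fixed-step sketch (real implementations would add a line search or adaptive step sizes):

```python
import numpy as np

def gradient_descent(grad_f, x0, lr=0.1, n_iter=1000, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:        # f'(x) ~ 0: (local) optimum reached
            break
        x = x - lr * g                     # step in the direction of steepest descent
    return x

# Example: minimise sum((x - 1)^2), whose gradient is 2 * (x - 1)
print(gradient_descent(lambda v: 2 * (v - 1.0), [5.0, -3.0]))
```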

Continuation

*Define a class of functions F(x, k) such that f'(x) = F(x, 1) and a root of F(x, 0) is known, then 'continue' k from 0 to 1 to find the x where f'(x) = 0*
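
A toy 1D homotopy sketch, assuming F(x, k) = (1 - k)(x - x0) + k f'(x): the root at k = 0 is x0, and the root at k = 1 is a stationary point of f. The step count and the inner Newton iteration are illustrative choices:

```python
import numpy as np

def continuation_1d(fprime, x0, n_steps=50, newton_iters=20, h=1e-6):
    """Track the root of F(x, k) = (1 - k)(x - x0) + k f'(x) as k goes 0 -> 1."""
    x = float(x0)
    for k in np.linspace(0.0, 1.0, n_steps + 1)[1:]:
        F = lambda t: (1 - k) * (t - x0) + k * fprime(t)
        for _ in range(newton_iters):      # Newton steps, warm-started at the previous root
            dF = (F(x + h) - F(x - h)) / (2 * h)
            x = x - F(x) / dF
    return x

# Example: find a stationary point of f with f'(x) = 2(x - 2) + 0.5 cos(3x)
print(continuation_1d(lambda t: 2 * (t - 2.0) + 0.5 * np.cos(3 * t), 0.0))
```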

Others

Methods requiring the 2nd-order gradient

?

Good resources for optimisers:

MichaelClerx commented 4 years ago

This one sounds a bit different: https://link.springer.com/article/10.1007%2Fs10957-013-0354-0 But even the authors themselves didn't implement it...

MichaelClerx commented 4 years ago

Here's a book by those authors. Seems mostly classical again, although chapter 2 is on GAs which is encouraging https://www.amazon.com/Derivative-Free-Optimization-Operations-Financial-Engineering/dp/3319689126/ref=sr_1_1?keywords=derivative-free+and+black+box+optimization&qid=1567851546&s=gateway&sr=8-1

MichaelClerx commented 4 years ago

@martinjrobins says one of the global optimisers in scipy is quite good:

"Simplicial homology global optimization", or SHGO

martinjrobins commented 4 years ago

SHGO explanation and pseudocode here:

https://stefan-endres.github.io/shgo/files/shgo_defense.pdf

Says convergence to the global minimum is guaranteed for Lipschitz-smooth functions; looks very mathsy!

MichaelClerx commented 4 years ago

Looks interesting though!

MichaelClerx commented 3 years ago

DIRECT sounds worth a look: https://link.springer.com/article/10.1007/s10898-020-00952-6

MichaelClerx commented 3 years ago

Some papers about global ones: https://link.springer.com/search?query=&search-within=Journal&facet-journal-id=10898

Not many entries in 2021 on our type of problem

MichaelClerx commented 2 years ago

Good place to look: https://numbbo.github.io/data-archive/

In particular,

MichaelClerx commented 2 years ago

PSO with proofs: https://arxiv.org/abs/2205.04880

https://github.com/akashspace/Consensus-based-opmization
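
For reference, the basic PSO update these build on (PINTS already has a PSO implementation; the inertia and acceleration constants below are common textbook values, not anything tuned):

```python
import numpy as np

def pso(f, bounds, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    x = rng.uniform(lo, hi, size=(n_particles, len(lo)))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([f(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)]
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # Velocity update: inertia + pull towards personal and global bests
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        fx = np.array([f(p) for p in x])
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        gbest = pbest[np.argmin(pbest_f)]
    return gbest, float(np.min(pbest_f))

print(pso(lambda p: float(np.sum((p - 1.0) ** 2)), [(-5, 5), (-5, 5)]))
```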