Use stochastic root-finding algorithm

kalekundert commented 2 years ago

After implementing this project, I realized that I have a root-finding problem and not an optimization problem. Briefly, my goal is for a given quantile of the atom displacement distribution to be equal to a given value. I expect the quantile to increase monotonically as the restraint weight decreases, and to at some point cross the desired value.

Currently I'm using a stochastic optimization algorithm (scikit-optimize). To turn the zero into a minimum, I take the absolute value of the objective function. However, I think this is a bad idea for a few reasons:

Taking the absolute value loses information (i.e. the sign) which could be used to help the algorithm find the root.
Taking the absolute value gives the objective function a more complicated shape (i.e. curved instead of increasing monotonically). Presumably this makes the function harder to learn, and therefore requires more evaluations to predict accurately.

I looked into this briefly. This stack overflow post has two relevant references. The first is a review of stochastic root-finding algorithms [Pasupathy2011]. The second is a dissertation (which appears to be published later [Frazier2016]) on one particular algorithm [Waeber2013]. I don't think I'm going to implement either of these algorithms now, because I already have a reasonable result and I need to move on, but in the future I think it would be good to use an algorithm designed for the specific task I have.

Note also that I have a 1D root-finding problem, which is definitely simpler than higher-dimensional versions of the problem. Ideally, I'd find an algorithm tuned to the 1D case.

kalekundert commented 2 years ago

Here's an implementation of the probabilistic bisection algorithm (PBA):

https://gist.github.com/fgregg/54ee54b90b8a4fb938fc9b008ec95bd6

This algorithm was first described by Horstein in 1963 and is the focus of [Waeber2013]. It doesn't seem to be mentioned by [Pasupathy2011]. Note however that PBA (and the implementation linked above) requires that the probability of a correct response from the function (i.e. in terms of being on the right side of the root) is fixed and known. I don't think this information is available to me, though.

kalekundert commented 2 years ago

Noisyopt contains a simple stochastic bisection algorithm. It basically just evaluates the function a pre-determined number of times at each point, and uses the average to decide which direction to continue bisecting. This doesn't seem to account for uncertainty in a very sophisticated way, e.g. [Waeber2013] building up a posterior distribution of the location of the root.

kalekundert commented 2 years ago

I spent time today reading about three kinds of algorithms. Here are my takeaways:

Stochastic approximation (SA):
- Seems to be the fastest way to find the root.
- Doesn't provide any information on how accurate the root is.
Surrogate model, e.g. Gaussian process (GP)
- Can use deterministic root finding algorithms
- Most surrogate models are not guaranteed to have only 1 root
- The distribution of the root estimate depends strongly on the smoothness of the model.
Probabilistic bisection algorithm (PBA):
- Provides error estimate/confidence interval
- Learning p(x) becomes asymptotically unfeasible as x→x*
- x*: the true root
- x: the point currently being sampled
- p(x): the probability that x > x*

I really value knowing how accurate the root is, so I'm most drawn to PBA. But none of these algorithms seem to have implementations available in python. I might be able to request implementations from the authors, but it would probably be best to just wait and see what becomes available over time.

kalekundert / rosetta_relax_b

Use stochastic root-finding algorithm #1