pytorch / botorch

Bayesian optimization in PyTorch
https://botorch.org/
MIT License

[Feature Request] Safe optimization #2238

Closed roussel-ryan closed 7 months ago

roussel-ryan commented 7 months ago

šŸš€ Feature Request

I'm looking to perform constrained optimization in a "safer" context where frequent violations of the constraint cannot be tolerated. To do this, I would like to be able to specify a constraint on optimizing the acquisition function so that points are only proposed if the constraint GP model predicts an X% probability of satisfying the constraint (for example, a 90% chance of satisfying the constraint). Any ideas on how to best implement this?

Motivation

Is your feature request related to a problem? Please describe.
We are attempting to perform BO in particle accelerator control problems where violating the constraint can potentially lead to equipment damage, so we need to be 50-99% confident that constraints are satisfied (while being able to trade safety for optimization speed in non-critical situations).

This would be an alternative to the SafeOpt approach described here https://arxiv.org/abs/1902.03229, which limits the acquisition function domain along a 1D subspace.

Pitch

Describe the solution you'd like
It seems like the best way to do this is to add a constraint to optimize_acqf with a helper method that defines the constraint function based on a calculation of the probability of violating the constraint.

Describe alternatives you've considered
None

Are you willing to open a pull request? (See CONTRIBUTING)
Yes, just looking for some guidance.

Balandat commented 7 months ago

I take it in your setting the value of the constraint function is observable? I.e. it's something like g(x) \leq 0 where g(x) is black box but can be measured (rather than only observing the binary outcome 1{g(x) \leq 0})?

It seems like the best way to do this is to add a constraint to optimize_acqf with a helper method that defines the constraint function based on a calculation of the probability of violating the constraint.

Yeah, that's the obvious thing to try. You could use nonlinear_inequality_constraints and pass in a function h of the form h(x) = gamma - P(w(x) > 0), where gamma is the probability of constraint violation you're willing to tolerate and w is a GP model of g (conveniently, P(w(x) > 0) is something you can compute analytically from the posterior of w at x). That seems like a fine approach (though applying the constraint on a log scale probably makes sense numerically, i.e. h(x) = log(gamma) - log P(w(x) > 0)). I don't have a lot of experience with this setup myself, but it seems like the natural thing to try. One thing this acquisition function doesn't have, though, is any component that would incentivize it to improve the model w and reduce its uncertainty - which seems like it could very well lead to some pathologies, but I haven't thought too much about this.

One thing to note is that scipy's default nonlinear optimization algorithms aren't that great and this can get pretty slow, especially if the constraint function is complicated and there are a lot of observed data points. The other thing is that - depending on the problem - the feasible region may be pretty nasty and potentially disconnected, so finding good feasible initial conditions will be important. Also make sure to set the batch_limit to 1 as discussed in the docstring: https://github.com/pytorch/botorch/blob/520468e8d803a3b9ff8eb42c572054e097e4ce78/botorch/optim/optimize.py#L471-L485. If the nonlinear numerical solver is the bottleneck/issue, you could also try some other options such as (cy)ipopt - this is something that would be cool to see and something I'd love to support more generally.
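To make this concrete, here is a minimal sketch of the setup described above (not an official BoTorch recipe; the names acqf, bounds, constraint_model, and feasible_ics are assumptions for illustration, and the constraint-list format may differ across BoTorch versions):

```python
# Sketch: constrain optimize_acqf so that candidates satisfy
# P(violation of g(x) <= 0) <= gamma, using the posterior of a GP
# `constraint_model` fit to observations of g (the model "w" above).
import torch
from botorch.optim import optimize_acqf

gamma = 0.1  # tolerated probability of violating g(x) <= 0

def prob_constraint(x: torch.Tensor) -> torch.Tensor:
    # Intra-point constraint: x is a 1-d tensor of shape (d,);
    # optimize_acqf treats the point as feasible when the return value is >= 0.
    posterior = constraint_model.posterior(x.reshape(1, -1))
    mean = posterior.mean.reshape(-1)[0]
    sigma = posterior.variance.clamp_min(1e-12).sqrt().reshape(-1)[0]
    # P(w(x) > 0) under the GP posterior, i.e. the probability of violation
    prob_violation = 1.0 - torch.distributions.Normal(mean, sigma).cdf(torch.zeros_like(mean))
    return gamma - prob_violation  # >= 0  iff  P(violation) <= gamma

candidate, value = optimize_acqf(
    acq_function=acqf,
    bounds=bounds,  # 2 x d tensor of box bounds
    q=1,
    num_restarts=feasible_ics.shape[0],
    # Recent BoTorch versions may instead expect (callable, is_intrapoint) tuples here.
    nonlinear_inequality_constraints=[prob_constraint],
    batch_initial_conditions=feasible_ics,  # num_restarts x 1 x d, known-feasible starting points
    options={"batch_limit": 1, "maxiter": 200},
)
```

The log-scale variant mentioned above would simply compare log probabilities inside prob_constraint instead.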

Also, there is this recent relevant paper (which I haven't read in detail yet, but @jduerholt may have some thoughts on :)) and references therein: https://arxiv.org/abs/2402.07692

jduerholt commented 7 months ago

A few thoughts from my side:

I take it in your setting the value of the constraint function is observable? I.e. it's something like g(x) \leq 0 where g(x) is black box but can be measured (rather than only observing the binary outcome 1{g(x) \leq 0})?

In the case that the black box can be measured, the standard approach in botorch would be to have it as an output constraint in which the \leq is approximated by a sigmoid. Let's say that in reality your constraint is fulfilled if g(x) < t; to be safer, you could set up the output constraint as g(x) < t - f, where f is a positive float that serves as an additional safety margin. The larger f is, the higher the probability that your constraint is fulfilled. This would be an ad-hoc solution using the standard way of handling black box constraints in botorch. It amounts to a horizontal translation of the sigmoid toward a higher probability threshold relative to the original one centered at t.
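As a minimal sketch of this ad-hoc margin approach (the names model, best_feasible_objective, t, and margin are assumptions; model is taken to be a two-output model, e.g. a ModelListGP, with output 0 the objective and output 1 the measured constraint g):

```python
from botorch.acquisition import qLogExpectedImprovement
from botorch.acquisition.objective import GenericMCObjective

t = 0.0        # original feasibility threshold: g(x) <= t
margin = 0.5   # the positive float f above; larger values are more conservative

acqf = qLogExpectedImprovement(
    model=model,
    best_f=best_feasible_objective,  # best observed objective among feasible points
    objective=GenericMCObjective(lambda Z, X=None: Z[..., 0]),  # output 0 = objective
    constraints=[
        # MC constraint convention: values <= 0 count as feasible, so shifting
        # the threshold from t to t - margin keeps candidates away from the boundary.
        lambda Z: Z[..., 1] - (t - margin)
    ],
    eta=1e-3,  # controls the sharpness of the smoothed (sigmoid-like) feasibility indicator
)
```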

In the paper mentioned by @Balandat we tackled the problem of a non-measurable black box constraint by setting it up as a nonlinear_inequality_constraint, in principle in the same way as described by Max. By modifying gamma you can make it more or less safe. We also found that having good feasible initial conditions in this setup is key to success.

Regarding cyipopt: this is also on my long list for things that I want to try in botorch.

Balandat commented 7 months ago

In the case that the black box can be measured, the standard approach in botorch would be to have it as an output constraint in which the \leq is approximated by a sigmoid

That is indeed the general approach; one thing it doesn't do, though, is keep the probability of violation during the exploration controlled in a specific way. This is because the target is a probability-weighted acquisition function; e.g. in the case of EI you'd optimize EI(f(x)) * P(g(x) \leq 0), so if EI(x) is very large for some x it might be that P(g(x) \leq 0) is actually rather small. This is generally fine in cases where violating the constraint during the optimization isn't an issue, but if it is, then the explicit constraint on that probability in the setup I suggested above makes sense. You can also think of being more conservative initially until you learn more about the system, i.e. start off with smaller values of gamma until you have collected more data and are more confident about your model (though fundamentally there will always be a tension in that staying away from a constraint boundary reduces the value of those observations for learning where that boundary is).
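One way to realize the "be more conservative initially" idea is a simple schedule for gamma; the helper below is purely illustrative and not from the discussion, with gamma_min, gamma_max, and n_ramp as assumed knobs:

```python
def gamma_schedule(n_obs: int, gamma_min: float = 0.01, gamma_max: float = 0.1, n_ramp: int = 20) -> float:
    # Linearly relax the tolerated violation probability from gamma_min to
    # gamma_max over the first n_ramp observations, then hold it constant.
    frac = min(n_obs / n_ramp, 1.0)
    return gamma_min + frac * (gamma_max - gamma_min)
```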

roussel-ryan commented 7 months ago

Thanks to both of you for the discussion of this problem. It's correct that the black box constraint is an observable. I'll take a look at the different methods you discussed and see which would work best for our use case. While the ad hoc method of adding an offset to the output constraint is appealing due to its simplicity, it would be nice to specify the trade-off between exploration/optimization and constraint violations more rigorously.

Balandat commented 7 months ago

I'm going to convert this into a discussion item