pytorch / botorch

Bayesian optimization in PyTorch
https://botorch.org/
MIT License

[Feature Request] Batch method for non-analytic acquisition functions w.r.t. fully-Bayesian GPs #1892

Open fusionby2030 opened 1 year ago

fusionby2030 commented 1 year ago

🚀 Feature Request

A batch method for evaluating non-analytic acquisition functions when using fully-Bayesian-treated GPRs.

Motivation

In the fully-Bayesian setting, instead of fitting the GPR hyperparameters by point-wise maximization of the (log) likelihood (MAP or MLE), one marginalizes them out. The acquisition function is then averaged under the posterior probability of the GP hyperparameters [see Section 2.2 of De Ath et al., 2021]. As far as I can tell, the analytic acquisition functions in BoTorch support this out of the box (through batch mode). I have tested this using SaasFullyBayesianSingleTaskGP and the UCB acquisition function.
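
For concreteness, a minimal sketch of that analytic path (the toy objective, shapes, and sample counts here are illustrative, not from the issue):

```python
import torch
from botorch.models import SaasFullyBayesianSingleTaskGP
from botorch.fit import fit_fully_bayesian_model_nuts
from botorch.acquisition import UpperConfidenceBound
from botorch.optim import optimize_acqf

train_X = torch.rand(10, 3, dtype=torch.double)
train_Y = train_X.sin().sum(dim=-1, keepdim=True)

# NUTS draws a batch of hyperparameter samples; the model is then treated
# as "batched" over those draws.
gp = SaasFullyBayesianSingleTaskGP(train_X, train_Y)
fit_fully_bayesian_model_nuts(gp, warmup_steps=256, num_samples=128, thinning=16)

# The analytic UCB is evaluated per hyperparameter draw and averaged over draws.
ucb = UpperConfidenceBound(gp, beta=0.2)
bounds = torch.stack([torch.zeros(3), torch.ones(3)]).to(torch.double)
candidate, value = optimize_acqf(ucb, bounds=bounds, q=1, num_restarts=8, raw_samples=64)
```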

Understandably, this batch mode has not yet been implemented for many non-analytic acquisition functions.

For non-analytic acquisition functions, batch mode with respect to fully-Bayesian models is not so straightforward to me in terms of memory usage. For example, if I have a fully-Bayesian model with 8000 posterior draws (2000 draws from each of 4 chains) and I try to use an MC sampler on top, I believe I would very quickly run out of memory (see the rough estimate sketched below). Additionally, in the current implementation of MaxValueEntropy, it seems that the posterior is computed many times, inducing high memory overhead for a large number of draws.
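
To make the overhead concrete, a back-of-envelope estimate (the candidate-set size is my assumption, purely illustrative): storing a joint posterior covariance over N discrete points for each of D hyperparameter draws already takes on the order of D * N^2 floats, before any MC sampling on top:

```python
D = 8000   # posterior draws (2000 draws x 4 chains, as above)
N = 1000   # discrete points used to sample the max value (illustrative)
bytes_per_float = 8  # float64
cov_bytes = D * N**2 * bytes_per_float
print(f"{cov_bytes / 1e9:.0f} GB")  # ~64 GB for the joint covariances alone
```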

Describe the solution you'd like

A batch acquisition implementation of MaxValueEntropy (or other non-analytic acquisition functions) for use with fully-Bayesian models.

Describe alternatives you've considered

A very crude implementation I orchestrated loops through the posterior draws of the GP and calculates the MVE for each draw (sketched below). However, this of course does not scale well at all as the number of posterior draws grows.
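
Roughly, the loop looks like the following sketch. Note that `single_draw_models` is a hypothetical list of per-draw GPs; BoTorch has no public helper for slicing one MCMC draw out of a SaasFullyBayesianSingleTaskGP, so that extraction step is left abstract here:

```python
import torch
from botorch.acquisition.max_value_entropy_search import qMaxValueEntropy

def mve_averaged_over_draws(single_draw_models, candidate_set, X):
    # Evaluate MVE under each hyperparameter draw separately, then average.
    values = [
        qMaxValueEntropy(model, candidate_set=candidate_set)(X)
        for model in single_draw_models
    ]
    return torch.stack(values).mean(dim=0)
```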

Are you willing to open a pull request?

I would like to help, but the MVE class is fairly arcane to me at the moment. As I step through the code I will try to update this issue, but I am eager to hear if anyone has ideas or has faced similar problems.

Balandat commented 1 year ago

> Understandably, this batch mode has not yet been implemented for many non-analytic acquisition functions.

So we do support batch MC acquisition functions in conjunction with a "fully Bayesian model" (as you have noticed, the way we have implemented this is by drawing a relatively small number of samples from the hyperparameter posterior via the MCMC chain and then treating those as a "batched" model). See e.g. the SAASBO tutorial, where we use qExpectedImprovement in conjunction with a SaasFullyBayesianSingleTaskGP. While this results in much higher memory usage than for an MLE-estimated model, it usually works OK if the number of hyperparameter samples is small.
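
For reference, a condensed sketch of that pattern (toy data and sample counts are illustrative; see the SAASBO tutorial for the full version):

```python
import torch
from botorch.models import SaasFullyBayesianSingleTaskGP
from botorch.fit import fit_fully_bayesian_model_nuts
from botorch.acquisition import qExpectedImprovement
from botorch.sampling import SobolQMCNormalSampler

train_X = torch.rand(10, 3, dtype=torch.double)
train_Y = train_X.sum(dim=-1, keepdim=True)

gp = SaasFullyBayesianSingleTaskGP(train_X, train_Y)
# num_samples=64 with thinning=8 retains only 8 hyperparameter draws, which
# is what keeps the batched MC evaluation manageable in memory.
fit_fully_bayesian_model_nuts(gp, warmup_steps=128, num_samples=64, thinning=8)

# MC samples are drawn per retained hyperparameter draw, so memory scales
# with (retained draws) x (MC samples) x q.
sampler = SobolQMCNormalSampler(sample_shape=torch.Size([64]))
qEI = qExpectedImprovement(model=gp, best_f=train_Y.max(), sampler=sampler)
```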

I think the challenge with doing the same for the Max-Value Entropy method(s) is that, by default, we currently use discrete sampling to sample the max value. This requires computing joint posterior distributions across many points (the num_samples discrete points here: https://github.com/pytorch/botorch/blob/main/botorch/acquisition/max_value_entropy_search.py#L255-L297), which can blow up the memory use for each of the hyperparameter samples. Note though that using the Gumbel softmax trick (use_gumbel=True) should avoid computing those joint distributions, so that's something to try (again with a reasonably small number of hyperparameter samples).
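
A sketch of what that would look like (untested against a fully-Bayesian model; whether it runs without the memory blowup is exactly what would need checking, and the candidate set here is an illustrative assumption):

```python
import torch
from botorch.acquisition.max_value_entropy_search import qMaxValueEntropy

# `gp` is a fitted fully-Bayesian GP as in the sketches above; the candidate
# set is an illustrative 1000-point discretization of the 3d unit cube.
candidate_set = torch.rand(1000, 3, dtype=torch.double)
mes = qMaxValueEntropy(
    model=gp,
    candidate_set=candidate_set,
    num_mv_samples=10,
    use_gumbel=True,  # sample max values via the Gumbel approximation,
                      # avoiding joint posteriors over the candidate set
)
```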

cc @dme65, @saitcakmak for additional insights.