pytorch / botorch

Bayesian optimization in PyTorch
https://botorch.org/
MIT License

[Feature Request] Tutorials that integrate the full cycle of BO #465

Closed xinleipan closed 4 years ago

xinleipan commented 4 years ago

In the tutorials you give in this repo, many only involve one cycle of BO: sample some data, optimize the acquisition function, and get the best point. But in reality, BO should involve multiple cycles: sample data, optimize the acquisition function, get new suggested candidates, and obtain values for those candidates from the ground-truth black-box function. Could you provide some examples of this? Thanks.

qingfeng10 commented 4 years ago

@xinleipan Thanks for your interest! You might find this tutorial useful. In cell 6, after calling generate_initial_data to get random samples, it runs N_BATCH iterations (i.e., cycles); each cycle fits the model, optimizes the acquisition function, gets a new observation, and updates the data.
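For reference, a minimal sketch of that kind of loop (the `black_box` function, sizes, and hyperparameters below are illustrative placeholders, not taken from the tutorial):

```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll  # fit_gpytorch_model in older botorch versions
from botorch.acquisition import qExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

def black_box(x):
    # placeholder for the ground-truth black-box function
    return -(x ** 2).sum(dim=-1, keepdim=True)

bounds = torch.stack([-torch.ones(2), torch.ones(2)])
train_x = torch.rand(5, 2) * 2 - 1  # initial random design
train_y = black_box(train_x)

N_BATCH = 20
for _ in range(N_BATCH):
    # 1. fit the model on all data collected so far
    model = SingleTaskGP(train_x, train_y)
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    fit_gpytorch_mll(mll)
    # 2. optimize the acquisition function to get a new candidate
    acqf = qExpectedImprovement(model, best_f=train_y.max())
    candidate, _ = optimize_acqf(
        acqf, bounds=bounds, q=1, num_restarts=10, raw_samples=128
    )
    # 3. evaluate the black box at the candidate and update the data
    train_x = torch.cat([train_x, candidate])
    train_y = torch.cat([train_y, black_box(candidate)])
```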

Please let me know if you have more questions about this.

xinleipan commented 4 years ago

Hello @qingfeng10, thanks for your suggestion! The problem with that script is that memory usage tends to increase over time if you run it for a large number of iterations. Is there any chance it can use a constant amount of memory?

Balandat commented 4 years ago

How many iterations are we talking here? The model will use all the collected observations, so naturally the model size will grow. Space complexity is primarily driven by the size of the kernel matrix, which is O(N^2) with N the number of observations (e.g., at N = 10,000 a dense float64 kernel matrix alone takes about 800 MB).

Balandat commented 4 years ago

It's possible to bring that down by using scalable GP techniques that use interpolation or variational inference, but we don't have simple out-of-the-box support for these (we should write a couple of tutorials / models, though).

xinleipan commented 4 years ago

@Balandat Thanks for the reply! I am aiming for more than several hundred iterations for high-dimensional optimization (hundreds of dimensions). I think there might be a way to separate the data into mini-batches when training the model (e.g., fitting a Gaussian likelihood model), so you don't need to keep all the data in memory (the data could be saved locally and loaded only when needed). That way the memory usage could be reduced, but I'm not sure if botorch allows this customization.

xinleipan commented 4 years ago

Also, is it possible to run this on multiple GPUs? Does data parallel apply to this package?

Balandat commented 4 years ago

I think there might be a way to separate the data into mini-batches when training

To do this you'll need to use stochastic variational inference, e.g. a GP model as in https://github.com/cornellius-gp/gpytorch/blob/master/examples/04_Variational_and_Approximate_GPs/SVGP_Regression_CUDA.ipynb. We don't have one packaged with botorch since the uncertainty quantification in these models is not always great and can cause issues in a BayesOpt setting. But you can hook this into the botorch Model API quite easily by subclassing the model from GPyTorchModel and running your own fitting loop.
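A rough sketch of what that could look like (class name, kernel choice, and training hyperparameters are illustrative; this is not a packaged botorch model):

```python
import torch
from botorch.models.gpytorch import GPyTorchModel
from gpytorch.distributions import MultivariateNormal
from gpytorch.kernels import MaternKernel, ScaleKernel
from gpytorch.likelihoods import GaussianLikelihood
from gpytorch.means import ConstantMean
from gpytorch.mlls import VariationalELBO
from gpytorch.models import ApproximateGP
from gpytorch.variational import CholeskyVariationalDistribution, VariationalStrategy
from torch.utils.data import DataLoader, TensorDataset

class SVGPModel(ApproximateGP, GPyTorchModel):
    _num_outputs = 1  # informs the botorch Model API of the output dimension

    def __init__(self, inducing_points):
        var_dist = CholeskyVariationalDistribution(inducing_points.size(0))
        var_strat = VariationalStrategy(
            self, inducing_points, var_dist, learn_inducing_locations=True
        )
        super().__init__(var_strat)
        self.mean_module = ConstantMean()
        self.covar_module = ScaleKernel(
            MaternKernel(ard_num_dims=inducing_points.size(-1))
        )
        self.likelihood = GaussianLikelihood()

    def forward(self, x):
        return MultivariateNormal(self.mean_module(x), self.covar_module(x))

def fit_svgp(model, train_x, train_y, epochs=50, batch_size=256, lr=0.01):
    # custom mini-batch fitting loop; note train_y is 1-d here, and the
    # batches could just as well be streamed from disk instead of memory
    model.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    mll = VariationalELBO(model.likelihood, model, num_data=train_y.size(0))
    loader = DataLoader(TensorDataset(train_x, train_y), batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for x_b, y_b in loader:
            optimizer.zero_grad()
            loss = -mll(model(x_b), y_b)
            loss.backward()
            optimizer.step()
    return model.eval()
```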

I am aiming for more than several hundred iterations for high-dimensional optimization (hundreds of dimensions)

This dimensionality is a very challenging setting for standard GP models (including the SVGP above, which works well if the number of training points is large, but doesn't scale equally well with the dimension). You may want to look either at semi-local approaches a la https://arxiv.org/abs/1910.01739 (we have an implementation that we need to clean up and make a PR for at some point), or at dimensionality reduction techniques such as random embeddings (http://proceedings.mlr.press/v97/nayebi19a/nayebi19a.pdf and references therein).
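As a rough illustration of the random-embedding idea (this is not the HeSBO algorithm itself; the dimensions and clamping rule are illustrative):

```python
import torch

D, d = 200, 10         # ambient and embedding dimensions (illustrative)
A = torch.randn(D, d)  # fixed random projection matrix

def up_project(z):
    # map a low-dimensional candidate z in [-1, 1]^d into the full
    # [-1, 1]^D box, clamping as in REMBO-style embeddings
    return torch.clamp(z @ A.t(), -1.0, 1.0)

# BO then runs entirely in the d-dimensional space, and the black-box
# function is evaluated at up_project(z)
```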

Also, is it possible to run this on multiple GPUs? Does data parallel apply to this package?

You can take a look at https://github.com/cornellius-gp/gpytorch/blob/master/examples/02_Scalable_Exact_GPs/Simple_MultiGPU_GP_Regression.ipynb

xinleipan commented 4 years ago

Thanks @Balandat. I think it might also make sense to add some exploration when sampling new data. From the tutorial you give in https://github.com/pytorch/botorch/blob/master/tutorials/closed_loop_botorch_only.ipynb it seems like there is no exploration in the BO loop with qEI. There is some exploration happening with qNoisyEI, but I think that is intended for coping with observation noise?

Balandat commented 4 years ago

Not sure what you mean; qEI itself encourages exploration in areas with high posterior uncertainty. There is no explicit mechanistic exploration (e.g., throwing in random evaluations), so this will depend on having a reasonable model.

xinleipan commented 4 years ago

I see. I'm experimenting with qUCB, though; is there a parameter I can change to trade off exploration vs. exploitation?

Balandat commented 4 years ago

For qEI you mean? If you artificially lower your best_f, this will result in more exploration.
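A minimal illustration, reusing the `model` and `train_y` names from the loop sketch above (the offset is arbitrary):

```python
from botorch.acquisition import qExpectedImprovement

# crediting improvement against a lower bar than the observed best
# makes qEI more exploratory
acqf = qExpectedImprovement(model, best_f=train_y.max() - 0.5)  # offset is illustrative
```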

xinleipan commented 4 years ago

Thanks, that's good to know! What about qUCB? I thought changing beta would change the performance, but it didn't. Since the problem is high-dimensional, I also increased the number of restart points and raw samples for the optimize_acqf function, but that doesn't seem to help much.
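(For reference, beta is the exploration weight in qUCB; larger values put more weight on posterior uncertainty. A hedged usage sketch, reusing the `model` name from the loop sketch above:)

```python
from botorch.acquisition import qUpperConfidenceBound

# larger beta puts more weight on posterior uncertainty, i.e. more
# exploration; the value here is illustrative
acqf = qUpperConfidenceBound(model, beta=9.0)
```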