@xinleipan Thanks for your interest! You might find this tutorial useful. In cell 6, after calling generate_initial_data to get random samples, it runs N_BATCH cycles; each cycle fits the model, optimizes the acquisition function to get a new observation, and updates the data.
Please let me know if you have more questions about this.
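For reference, the loop in that cell boils down to roughly the sketch below. Here `objective`, `bounds`, and `generate_initial_data` stand in for the tutorial's own definitions, and the model/acquisition choices and batch count are just illustrative:

```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_model
from gpytorch.mlls import ExactMarginalLogLikelihood
from botorch.acquisition import qExpectedImprovement
from botorch.optim import optimize_acqf

# illustrative sketch of the closed loop; helpers are stand-ins for the tutorial's own
train_X, train_Y = generate_initial_data(n=10)

N_BATCH = 20  # number of BO cycles
for iteration in range(N_BATCH):
    # 1. fit the model on all data collected so far
    model = SingleTaskGP(train_X, train_Y)
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    fit_gpytorch_model(mll)

    # 2. optimize the acquisition function to get new candidate(s)
    qEI = qExpectedImprovement(model=model, best_f=train_Y.max())
    candidates, _ = optimize_acqf(
        acq_function=qEI, bounds=bounds, q=1, num_restarts=10, raw_samples=256
    )

    # 3. evaluate the black-box objective (assumed to return an n x 1 tensor)
    #    and append the new observation to the training data
    new_Y = objective(candidates)
    train_X = torch.cat([train_X, candidates])
    train_Y = torch.cat([train_Y, new_Y])
```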
Hello @qingfeng10, thanks for your suggestion! The problem with that script is that memory usage tends to grow over time when training for a large number of iterations. Is there any way to make it use a constant amount of memory?
How many iterations are we talking here? The model will use all the collected observations, so naturally the model size will grow. Space complexity is primarily driven by the size of the kernel matrix, which is O(N^2), with N the number of observations.
It's possible to bring that down by using scalable GP techniques that use interpolation or variational inference, but we don't have simple out-of-the-box support for these (we should write a couple of tutorials / models though).
@Balandat Thanks for the reply! I am aiming for more than several hundred iterations of high-dimensional optimization (hundreds of dimensions). I think there might be a way to split the data into mini-batches when training the model (e.g. when fitting a Gaussian likelihood model), so that you don't need to keep all the data in memory (the data could be saved locally and loaded only when needed). That way memory usage could be reduced, but I'm not sure if botorch allows this kind of customization.
Also, is it possible to run this on multiple GPUs? Does data parallel apply to this package?
I think there might be a way to separate the data into different mini batches when training
To do this you'll need to use stochastic variational inference, e.g. a GP model as in https://github.com/cornellius-gp/gpytorch/blob/master/examples/04_Variational_and_Approximate_GPs/SVGP_Regression_CUDA.ipynb. We don't have one packaged with botorch since the uncertainty quantification in these models is not always great and can cause issues in a BayesOpt setting. But you can hook this into the botorch Model API quite easily by subclassing the model from GPyTorchModel and running your own fitting loop.
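A rough sketch of what that could look like (not an official botorch model; the kernel choice, inducing-point count, and training settings are arbitrary placeholders):

```python
import torch
import gpytorch
from botorch.models.gpytorch import GPyTorchModel
from gpytorch.models import ApproximateGP
from gpytorch.variational import CholeskyVariationalDistribution, VariationalStrategy
from torch.utils.data import DataLoader, TensorDataset


class SVGPModel(ApproximateGP, GPyTorchModel):
    _num_outputs = 1  # tells botorch this is a single-output model

    def __init__(self, inducing_points):
        variational_distribution = CholeskyVariationalDistribution(inducing_points.size(0))
        variational_strategy = VariationalStrategy(
            self, inducing_points, variational_distribution, learn_inducing_locations=True
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        self.likelihood = gpytorch.likelihoods.GaussianLikelihood()

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


def fit_svgp(model, train_X, train_Y, epochs=50, batch_size=256, lr=0.01):
    # mini-batch fitting loop: only one batch needs to be in (GPU) memory at a time
    mll = gpytorch.mlls.VariationalELBO(model.likelihood, model, num_data=train_Y.size(0))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loader = DataLoader(TensorDataset(train_X, train_Y), batch_size=batch_size, shuffle=True)
    model.train()
    for _ in range(epochs):
        for x_batch, y_batch in loader:
            optimizer.zero_grad()
            loss = -mll(model(x_batch), y_batch.squeeze(-1))
            loss.backward()
            optimizer.step()
    model.eval()
    return model
```

The fitted model can then be passed to a botorch acquisition function as usual.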
I am aiming for more than several hundreds of iterations for high dimensional optimization (hundreds of dimensions)
This dimensionality is a very challenging setting for standard GP models (including the SVGP above, which works well when the number of training points is large, but doesn't scale equally well with the dimension). You may want to look either at semi-local approaches a la https://arxiv.org/abs/1910.01739 (we have an implementation that we need to clean up and make a PR for at some point), or at dimensionality reduction techniques such as random embeddings (http://proceedings.mlr.press/v97/nayebi19a/nayebi19a.pdf and references therein).
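To illustrate the random-embedding idea (a generic sketch, not the method from either paper; the dimensions and box bounds below are made up):

```python
import torch

D, d = 200, 10            # ambient and embedding dimensions (illustrative)
A = torch.randn(D, d)     # fixed random projection matrix

def project_up(Z):
    # Z: (n, d) candidates proposed by BO in the low-dimensional embedding
    X = Z @ A.t()                 # map into the D-dimensional ambient space
    return X.clamp(-1.0, 1.0)     # clip back into the original [-1, 1]^D box

# run the usual BO loop over the d-dimensional Z, but evaluate the expensive
# black-box function at project_up(Z)
```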
Also, is it possible to run this on multiple GPUs? Does data parallel apply to this package?
You can take a look at https://github.com/cornellius-gp/gpytorch/blob/master/examples/02_Scalable_Exact_GPs/Simple_MultiGPU_GP_Regression.ipynb
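The gist of that notebook is to partition the kernel matrix across GPUs by wrapping the kernel, roughly like this (a sketch based on the linked example; the base kernel choice is arbitrary):

```python
import torch
import gpytorch

n_devices = torch.cuda.device_count()
base_covar = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

# distributes kernel computations across the available GPUs
covar_module = gpytorch.kernels.MultiDeviceKernel(
    base_covar, device_ids=range(n_devices), output_device=torch.device("cuda:0")
)
```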
Thanks @Balandat. I think it might also make sense to add some exploration when sampling new data. From the tutorial at https://github.com/pytorch/botorch/blob/master/tutorials/closed_loop_botorch_only.ipynb, it seems like there is no exploration for BO with qEI? There is some exploration happening with qNoisyEI, but I think that was intended for coping with observation noise?
Not sure what you mean; qEI itself encourages exploration in areas with high posterior uncertainty. There is no explicit mechanistic exploration, e.g. by throwing in random evaluations, so this will depend on having a reasonable model.
I see. I'm experimenting with qUCB though; is there a parameter I can change to trade off exploration vs. exploitation?
For qEI, you mean? If you artificially lower your best_f, this will result in more exploration.
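For example (a sketch; `model` and `train_Y` come from the surrounding BO loop, and the offset is arbitrary):

```python
from botorch.acquisition import qExpectedImprovement

# hand qEI an artificially lowered incumbent value to encourage more exploration
best_f = train_Y.max() - 0.1 * train_Y.std()
qEI = qExpectedImprovement(model=model, best_f=best_f)
```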
Thanks! That's good to know. What about qUCB? I thought changing beta would change the behavior, but it didn't seem to. Since the problem is high dimensional, I also increased the number of restart points and raw samples for the optimize_acqf function, but that did not seem to help much.
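For concreteness, what I'm doing looks roughly like this (illustrative values; `model` and `bounds` come from my loop):

```python
from botorch.acquisition import qUpperConfidenceBound
from botorch.optim import optimize_acqf

# larger beta should weight the posterior uncertainty more heavily (more exploration)
qUCB = qUpperConfidenceBound(model=model, beta=4.0)
candidates, acq_value = optimize_acqf(
    acq_function=qUCB,
    bounds=bounds,
    q=4,
    num_restarts=20,   # increased for the high-dimensional problem
    raw_samples=512,
)
```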
In the tutorials in this repo, many only involve one cycle of BO: sample some data, optimize the acquisition function, and take the best point. But in reality BO should involve multiple cycles: sample data, optimize the acquisition function, get new suggested candidates, and obtain values for those candidates from the ground-truth black-box function. Could you provide some examples of this? Thanks.