pytorch / botorch

Bayesian optimization in PyTorch
https://botorch.org/
MIT License

Non GP model types in Botorch #1064

Closed jduerholt closed 4 months ago

jduerholt commented 2 years ago

Hi,

I was thinking about the possibility of using non-GP models within botorch. For example, one could use a GP for one objective and a neural network (ensemble) for another. Using just a neural network should already be possible via the GenericDeterministicModel https://github.com/pytorch/botorch/blob/f8da711049161f3dc238ada2890963b6c6dbb8ee/botorch/models/deterministic.py#L83 by simply hooking in a neural network written in torch as the callable f.
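For illustration, a minimal sketch of what this could look like (the network architecture and input dimension below are made up):

```python
import torch
from torch import nn
from botorch.models.deterministic import GenericDeterministicModel

# Hypothetical single-output regression network with input dimension d=3.
net = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 1))

# GenericDeterministicModel wraps any callable mapping `... x d` inputs to
# `... x m` outputs, so a torch network can be hooked in directly as f.
model = GenericDeterministicModel(f=net, num_outputs=1)

X = torch.rand(5, 3)
posterior = model.posterior(X)
print(posterior.mean.shape)  # torch.Size([5, 1]) -- mean only, no uncertainty
```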

In this case, uncertainty estimates from an NN ensemble could not be used. My idea was to implement a new type of Posterior that also takes the variance from an NN ensemble and returns it as the variance of the posterior. https://github.com/pytorch/botorch/blob/f8da711049161f3dc238ada2890963b6c6dbb8ee/botorch/posteriors/posterior.py#L56

This should already allow one to use the whole botorch machinery of analytical acquisition functions. Of course, this assumes that the posterior is normally distributed. If one then also implements the rsample method of the posterior, one should be able to use the MC acquisition functions as well.
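As a rough, framework-agnostic sketch of that idea (a real implementation would subclass botorch.posteriors.Posterior and fill in its abstract methods, which differ somewhat between versions):

```python
import torch
from torch.distributions import Normal


class NormalEnsemblePosterior:
    """Posterior-like object built from stacked NN ensemble predictions.

    `predictions` has shape `n_models x n x m`; mean and variance are taken
    across the ensemble dimension, and rsample draws from the resulting
    Normal approximation.
    """

    def __init__(self, predictions: torch.Tensor) -> None:
        self.predictions = predictions

    @property
    def mean(self) -> torch.Tensor:
        return self.predictions.mean(dim=0)

    @property
    def variance(self) -> torch.Tensor:
        return self.predictions.var(dim=0)

    def rsample(self, sample_shape: torch.Size = torch.Size()) -> torch.Tensor:
        # Reparameterized sampling, so gradients can flow back through the samples.
        return Normal(self.mean, self.variance.sqrt()).rsample(sample_shape)
```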

Do you see any obstacles in this?

I see the benefit in the possibility of using other model types in situations in which they perform better than GPs, without having to reimplement the great machinery of acquisition functions and so forth that is already available in botorch.

Best,

Johannes

wjmaddox commented 2 years ago

Yes, this is pretty possible (I have some research code doing exactly this with deep ensemble posteriors). In general, the mean and variance can be calculated as the sample mean and variance of the networks' outputs, producing a normal approximation to the "posterior", while sampling can be done by selecting a random item or items from the list of networks.
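A small sketch of both pieces, assuming `models` is a list of trained torch networks and `X` an `n x d` input batch (both hypothetical here):

```python
import torch

def ensemble_stats(models, X):
    # Stack per-model predictions into `n_models x n x m` and take sample statistics.
    preds = torch.stack([m(X) for m in models], dim=0)
    return preds.mean(dim=0), preds.var(dim=0)

def ensemble_samples(models, X, num_samples):
    # "Sampling" by picking random members of the ensemble (with replacement).
    idx = torch.randint(len(models), (num_samples,))
    return torch.stack([models[i](X) for i in idx], dim=0)
```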

The only gotcha is in dealing with multi-batched data, as some classes of NNs don't like that in torch (I'm thinking of things like RNNs and LSTMs on string inputs, for example).

Balandat commented 2 years ago

Yeah, I did think about this when designing the APIs, so this should be possible without too much trouble (if not, we should fix that). Basically, as long as you have a posterior object that implements rsample and allows back-propagating gradients through the samples, that should work. Would love to see more use cases of this, actually.

jduerholt commented 2 years ago

Thanks for the info. I will try to set up a simple MWE for this in the next month. Maybe I will come back with some questions then ;)

jduerholt commented 2 years ago

Hi,

I just tested using an NN ensemble within botorch and it works, both for analytical and for MC acqfs. I simply represented the posterior as a multivariate normal and used the GPyTorchPosterior. Of course, this then also assumes a normal distribution in the sampling process, but doing it like this treats analytical and MC acqfs on the same footing.

Does this approach make sense to you?

Best,

Johannes

Balandat commented 2 years ago

Hmm, could you elaborate a bit more on what exactly you mean by

I just tested using an NN ensemble within botorch and it works, both for analytical and for MC acqfs. I simply represented the posterior as a multivariate normal and used the GPyTorchPosterior.

Is this using Wesley's approach of taking the sample mean and variance of the NN outputs? If you do this for sampling, it seems a bit odd, since you use samples from the "true posterior" of the network to fit an MVN and then sample from that. Why not just use the NN outputs directly as "samples"? You could have a lightweight wrapper posterior object that just references the network internally, and where rsample just means computing the NN outputs. Or is the issue here that the NN ensemble is deterministic (conditional on the initially drawn ensemble), so that this sampling distribution would be discretely supported?

jduerholt commented 2 years ago

Yes, I calculate the mean and variance over the predictions of each NN in the ensemble. With the mean and the variance alone, I can already use analytic acqfs like EI; of course, this assumes that the posterior is normally distributed. To also be able to use MC acqfs, I used the GPyTorchPosterior and parameterized the underlying MVN with the mean and variance of the ensemble prediction. With this I could use the already existing Posterior implementations. Of course this also assumes a normally distributed posterior, but at least I get the same acqf values as with the analytic counterpart. Is it clear what I mean?
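Roughly along these lines (toy numbers, single output, diagonal covariance assumed):

```python
import torch
from gpytorch.distributions import MultivariateNormal
from botorch.posteriors.gpytorch import GPyTorchPosterior

# Stand-in for stacked ensemble predictions of a single output: n_models x n.
preds = torch.randn(5, 8)
mean = preds.mean(dim=0)
var = preds.var(dim=0)

# Parameterize an MVN with the ensemble mean and a diagonal covariance and
# wrap it, so analytic and MC acqfs see the same (normal) posterior.
mvn = MultivariateNormal(mean, torch.diag_embed(var))
posterior = GPyTorchPosterior(mvn)
samples = posterior.rsample(torch.Size([16]))  # MC samples under the normal assumption
```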

But I think I will also implement your suggestion of sampling the outputs directly. For BNNs one could then do the same.

wjmaddox commented 2 years ago

In case you haven't already implemented it: I've managed to open-source a deep ensemble posterior class here that should be pretty generic and works with batching (the other code in the file is pretty tightly tied to our research codebase for that paper).

@Balandat I'm happy to try to write some variant of this up as a PR as well over the coming weeks if that'd be useful.

jduerholt commented 2 years ago

Thanks for sharing. Looks promising. I will also try to add my implementation based on the multivariate normal at some point in the next weeks, so that both options are available.

Balandat commented 2 years ago

I'm happy to try to write some variant of this up as a PR as well over the coming weeks if that'd be useful.

@wjmaddox that would be awesome! Did you have an end to end example using this that you can point to?

wjmaddox commented 2 years ago

Yeah, here's roughly the link to the overall model class (https://github.com/samuelstanton/lambo/blob/7b67684b884f75f7007501978c5299514d0efb75/lambo/optimizers/pymoo.py#L343). As I think I mentioned previously, we were using genetic algorithms to optimize everything b/c the tasks we considered were discrete. @samuelstanton can walk you through more of the code if necessary.

It's probably best for us to just pull out a simple notebook outside of our research code.

Balandat commented 2 years ago

we were using genetic algorithms to optimize everything b/c the tasks we considered were discrete

cc @sdaulton this could be a good real-world test case for some of your discrete optimization work.

jduerholt commented 1 year ago

@wjmaddox @samuelstanton: I had a closer look at your implementation of the ensemble posterior. I like it! I would be willing to create a PR based on it to bring it directly into botorch.

I have one question. Maybe @Balandat could also help there:

From what I saw, the MC acqfs in recent botorch implementations always use rsample_from_base_samples. How would one implement this method for an ensemble?

In @wjmaddox's and @samuelstanton's implementation, the rsample method just returns the outputs of the requested ensemble models, which have been randomly permuted beforehand. If the number of requested samples is larger than the number of models in the ensemble, an error is raised.
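One conceivable way to reconcile that with base samples (not necessarily what ended up in botorch) would be to interpret uniform base samples as ensemble-member indices, e.g.:

```python
import torch

def rsample_from_base_samples(predictions: torch.Tensor, base_samples: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: `predictions` is the `n_models x n x m` stacked ensemble
    output, `base_samples` are uniform draws in [0, 1) of shape `sample_shape`.
    Mapping them to member indices keeps the samples consistent across evaluations."""
    n_models = predictions.shape[0]
    idx = (base_samples * n_models).long().clamp(max=n_models - 1)
    return predictions[idx]  # shape `sample_shape x n x m`
```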

saitcakmak commented 4 months ago

Resolved by https://github.com/pytorch/botorch/pull/1636