ziatdinovmax / gpax

Gaussian Processes for Experimental Sciences
http://gpax.rtfd.io
MIT License

Feature: implement simulated campaigning for "hyper parameter tuning" #33

Open · matthewcarbone opened this issue 1 year ago

matthewcarbone commented 1 year ago

@ziatdinovmax as we discussed, I plan on implementing a simulated campaigning loop for tuning the "hyper parameters" of an optimization loop. I first want to learn this library inside and out, so it might take some time. But anyway, the executive summary of the tasks at hand looks something like this:

ziatdinovmax commented 1 year ago

@matthewcarbone - sounds great, and I will be happy to help.

matthewcarbone commented 1 year ago

@ziatdinovmax Quick update: I have not forgotten about this. I have funding starting in October and I'll be building on this.

Btw, an unrelated question (we can open a new issue if you want): can gpax do batch sampling? I.e., instead of sequential experiments ("given the data at hand, find me the next experiment that maximizes the acquisition function"), can gpax do "given the data at hand, find me the next q experiments that jointly maximize the acquisition function"?

ziatdinovmax commented 1 year ago

@matthewcarbone - Thanks for the update. Yes, there is a batch-mode acquisition: https://github.com/ziatdinovmax/gpax/blob/main/gpax/acquisition/batch_acquisition.py
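
For reference, a minimal sketch of what a batch step could look like. It assumes the batch functions (e.g., qEI) mirror the sequential (rng_key, model, X) call pattern; the subsample_size argument and the return shape are also assumptions, so check the linked module for the actual signatures:

```python
import numpy as np
import gpax

rng_key_fit, rng_key_acq = gpax.utils.get_keys()

# Toy data: 20 noisy observations of a 1D objective
X_train = np.random.uniform(-2.0, 2.0, size=(20, 1))
y_train = np.sin(3.0 * X_train[:, 0]) + 0.1 * np.random.randn(20)

# Fully Bayesian GP fit with HMC/NUTS
gp_model = gpax.ExactGP(input_dim=1, kernel="RBF")
gp_model.fit(rng_key_fit, X_train, y_train)

# Candidate points over which the acquisition is evaluated
X_cand = np.linspace(-2.0, 2.0, 200)[:, None]

# Batch ("q") acquisition: score the candidates against several
# posterior subsamples so that q experiments can be proposed jointly
scores = np.asarray(
    gpax.acquisition.qEI(rng_key_acq, gp_model, X_cand, subsample_size=4)
)

# One candidate per posterior subsample forms the batch of q = 4 points
next_batch = X_cand[scores.reshape(4, -1).argmax(axis=1)]
```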

ziatdinovmax commented 11 months ago

On the "parallelize the campaigning": assuming this is a single program that runs with different input parameters, can this be done with JAX built-in tools for parallel evaluation?

matthewcarbone commented 11 months ago

It's funny, I was thinking something similar, but I don't quite know how to do this. The tough part is that it's a combination of continuous and bandit optimization; there's almost a tree of decisions. For example, do you choose EI or UCB? If you choose UCB, you also need to choose beta. How does one go about optimizing over that space? I know it's possible, but I'm not sure how to implement it.
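
To make the tree concrete, here is roughly the kind of conditional space I mean; a naive random search over it would look something like this (a sketch only, nothing gpax-specific):

```python
import random

def sample_config(rng):
    # The acquisition choice is categorical; beta only exists on the
    # UCB branch of the decision tree
    cfg = {"acquisition": rng.choice(["EI", "UCB"])}
    if cfg["acquisition"] == "UCB":
        cfg["beta"] = rng.uniform(0.1, 5.0)
    return cfg

rng = random.Random(0)
configs = [sample_config(rng) for _ in range(10)]
# Each config would then be scored by a full simulated campaign;
# tree-structured tuners (e.g., TPE in hyperopt/Optuna) are built for
# exactly this kind of conditional space.
```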

Btw, I also have concerns about speed. Fitting gpax to ~400 5-dimensional data points took quite a few minutes. I realize that's a lot of data, but I noticed that other packages are much faster. Is there any way we can speed up mcmc.run?

ziatdinovmax commented 11 months ago

One can use the stochastic variational inference GP (viGP) or variational deep kernel learning (viDKL) for large datasets and high dimensions. The MCMC (or, more precisely, HMC with the NUTS sampler) implementation is already dramatically faster than what the pymc or pyro packages offer; I generally recommend it in situations where specific physics-based priors are available or one wants a detailed analysis of the posterior distributions.
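
As a rough sketch at your scale (the exact constructor/fit/predict arguments here are from memory, so treat them as assumptions and check the docs):

```python
import numpy as np
import gpax

rng_key_fit, rng_key_pred = gpax.utils.get_keys()

# Scale comparable to your case: ~400 points in 5 dimensions
X = np.random.uniform(0.0, 1.0, size=(400, 5))
y = np.sin(X.sum(axis=-1)) + 0.1 * np.random.randn(400)

# Variational GP: a gradient-based SVI fit instead of HMC/NUTS
# sampling, typically much faster at this dataset size
model = gpax.viGP(input_dim=5, kernel="Matern")
model.fit(rng_key_fit, X, y, num_steps=1000)

# Posterior mean and uncertainty on new points (return structure assumed)
y_pred, y_var = model.predict(rng_key_pred, X)
```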

matthewcarbone commented 11 months ago

Is there a way to do this already in GPax?

Edit: whoops, please disregard. 😁