probabilistic-numerics / probnum

Probabilistic Numerics in Python.
http://probnum.org
MIT License
438 stars 57 forks source link

BQ for estimating the expectation of a function estimated by a GP #854

Closed neildhir closed 8 months ago

neildhir commented 8 months ago

Is your feature request related to a problem? Please describe.

This is not quite a feature request (nor a bug report), but more of a request for information of how to perform BQ, using probnum to estimate an expectation of a function estimate i.e. $\mathbb{E}[\hat{f}(x)]$.

Give an example use case.

Using a Gaussian process we can estimate the relationship between variables $x$ and $y$ and then use BQ to get the expectation of the estimate at some test value.

We are interested in:

$\mathbb{E} [ \hat{f}(x) ] = \int \hat{f}(x) p(x) \text{d}x$

The relationship between $x$ and $y$ is estimated using a GP yielding estimator $\hat{f}$. Where the likelihood of a new test point is then modelled by a 1D gaussian density with parameters $\hat{\mu}, \hat{\Sigma}$ - fitted to samples from $f$.

MWE

from botorch.models import SingleTaskGP # wraps a GPyTorch ExactGP
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from gpytorch.mlls import ExactMarginalLogLikelihood
from torch import ones, linspace, no_grad
from botorch.fit import fit_gpytorch_mll
import numpy as np

# Function generates noisy samples from the linear relationship x = y, it then standardised and normalises these samples for downstream usage in a GPyTorch GP
def sample_f(n, noise_std):
    x = np.linspace(0, 10, n)
    noise = np.random.normal(0, noise_std, n)
    y = x + noise

    # Scale the data to the unit cube
    scaler1 = MinMaxScaler()
    scaler2 = StandardScaler()
    x_scaled = scaler1.fit_transform(x.reshape(-1, 1))
    y_scaled = scaler2.fit_transform(y.reshape(-1, 1))

    return tensor(x_scaled), tensor(y_scaled)

def get_fitted_model(train_X:tensor, train_Y:tensor) -> SingleTaskGP:
    model = SingleTaskGP(train_X=train_X, train_Y=train_Y)
    mll = ExactMarginalLogLikelihood(likelihood=model.likelihood, model=model)
    fit_gpytorch_mll(mll)
    return model

x, y = sample_f(10, 1)
model = get_fitted_model(x, y)

def my_fun(x):
    with no_grad():
        f_preds = model(x)
        return f_preds.mean * model.likelihood(f_preds.mean) # integrand

# BQ
measure = LebesgueMeasure(input_dim=1, domain=(0, 1))
qp = QuadratureProblem(fun=my_fun, measure=measure, solution=None)

Describe the solution you'd like. Explanation or correction whether or not this is the correct way to approach BQ for this problem setting.

Presently qp.solution is empty. Consequently I believe I have done something wrong or misunderstood BQ for expectation estimation.

Additional context A plot of some sample data, true function and GP approximation of the true function.

import matplotlib.pyplot as plt
plt.scatter(x, y, color='black', label='noisy samples from true model')
plt.plot(x, x, color='red', label='y=x (true unscaled function)')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Plot of (x, y) pairs')

# Plot the GP model
x_pred = linspace(0, 1, 100)
with no_grad():
    y_pred = model.posterior(x_pred.unsqueeze(1)).mean
plt.plot(x_pred, y_pred, color='blue', label='GP model')

plt.legend()
plt.show()
Screenshot 2024-02-21 at 12 46 48
mmahsereci commented 8 months ago

Hi! To integrate a function when you have a dataset $x$, $y$ (as in your case) you can use the bayesquad_from_data solver. I modified your code slightly. The Gaussian process is built implicitly by the solver (you kind of don't get to see it other than defining the kernel for it). I hope this makes things clearer.

Please beware though that bayesquad_from_data does NOT support hyperparmater tuning of the kernel yet. Hence, you need to make sure to set an appropriate lengthscale for the kernel. Otherwise the result may be off.

The class QuadratureProblem defines the integration problem only (but does not solve it). It's a convenient way to standardize requirements. But when you use the bayesquad_from_data solver, you do not need it :)

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

from probnum.quad import bayesquad_from_data
from probnum.randprocs.kernels import ExpQuad

def sample_f(n, noise_std):
    x = np.linspace(0, 10, n)
    noise = np.random.normal(0, noise_std, n)
    y = x + noise

    # Scale the data to the unit cube
    scaler1 = MinMaxScaler()
    scaler2 = StandardScaler()
    x_scaled = scaler1.fit_transform(x.reshape(-1, 1))
    y_scaled = scaler2.fit_transform(y.reshape(-1, 1))[:, 0]

    return x_scaled, y_scaled

# get the data
noise_std = 1
x, y = sample_f(10, noise_std)

# define kernel: make sure that the lengthscale is correct as ProbNum
# currently does NOT support kernel lengthscale tuning!
kernel = ExpQuad(input_shape=(1,), lengthscales=0.1)

# set noise level
options = {'jitter': noise_std}

# integrate
F, info = bayesquad_from_data(nodes=x, fun_evals=y, domain=(0, 1), kernel=kernel, options=options)

# F ist the return object (Normal distribution over the integral value)
print(F)
print(F.mean, F.std)
print(info)
neildhir commented 8 months ago

Thanks for your quick reply!

That is an excellent convenience function, had not seen that.

That being said I am a bit confused by your re-write. Having looked at your other excellent example from emukit here, I was under the impression that the fun_evals had to be samples of the integrand, i.e. if the expectation is given as:

$\mathbb{E} [ \hat{f}(x) ] = \int \hat{f}(x) p(x) \text{d}x$

then fun_evals should be samples from $\hat{f}(x) p(x)$ (was my understanding).

This is indeed what is happening in the SEIR model analysis with Emukit notebook (linked above). There the integrand (in cell) is given as:

# define the integrand 
f_height_of_peak_weighted = seir.height_of_peak_weighted(meanmax)

But in your re-write of my example:

# get the data
noise_std = 1
x, y = sample_f(10, noise_std)

y is not a sample of the integrand (recall we are looking to use BQ to estimate the expectation of the estimated function $\hat{f}$ which is modelled by a GP) as y is not weighted by any likelihood; y is a noisy and scaled version of the true function value.

In my original version I had the weighting like so:

def my_fun(x):
    with no_grad():
        f_preds = model(x)
        return f_preds.mean * model.likelihood(f_preds.mean) # integrand

but it sounds like with bayesquad_from_data I don't need it? It is very possible that I have misunderstood your re-write too.

mmahsereci commented 8 months ago

Ah, I see what you mean. There may be 2 confusions (I am guessing).

1) Regarding definition of integrand: Let $\int_D f(x)p(x)dx$ be the integration problem. The function $f$ is the integrand and $p(x)$ is the density of the integration measure on the domain $D$ (it's not a likelihood and there is no GP yet). For example for the standard Lebesgue measure on the domain $D=[0, 1]$ it is $p(x)=1$. In the code example sample_f produces noisy evaluations of the integrand $f$, i.e. $y=f(x) + \epsilon$ with $\epsilon\sim\mathcal{N}(0, \sigma^2)$ and $\sigma$=noise_std.

In the Emukit SEIR model example, the density $p(x)$ belongs to a Gamma distribution which is not currently supported by BQ methods (only Lebesgue and Gaussian are). Hence, for the SEIR model the notebook recasts the problem and solves the integration problem $\int_D f(x)p(x)dx = \int \tilde{f}(x)dx$ with $\tilde{f}(x):=f(x)p(x)$ instead. That means, it integrates $\tilde{f}(x)$ against the standard Lebesgue measure. This is a simple analytic trick one can do in case the integration measure is not supported by BQ. However, you need to be a bit cautious for noisy integrands when you apply this `re-weighting trick'.

2) More general point: BQ models the integrand $f$ with a Gaussian process. Then it integrates the GP itself (not just its mean predictor). Hence BQ returns the integrated posterior GP (and not just the integrated posterior mean of the GP). That is why BQ returns a univariate distribution $F$ over the integral value (an not just a point estimate). Now, it is indeed the case that the mean of the distribution $F$ is equal to the integral over the posterior mean of the GP, but this computation happens implicitly inside the BQ method (in this function).

Perhaps it is helpful to construct the BQ method from its constituents (this is actually what happens inside the convenience function bayesquad_from_data here and here). However that class is a bit more general and lacks the convenience wrapper, so it may also be more confusing. The code below does that.

from probnum.quad.solvers import BayesianQuadrature
from probnum.quad.solvers.belief_updates import BQStandardBeliefUpdate
from probnum.quad.solvers.stopping_criteria import ImmediateStop
from probnum.quad.integration_measures import LebesgueMeasure
from probnum.randprocs.kernels import ExpQuad

# get the data
noise_std = 1
x, y = sample_f(10, noise_std)

# define the BQ method
kernel = ExpQuad(input_shape=(1,), lengthscales=0.1)
measure = LebesgueMeasure(domain=(0, 1), input_dim=1)  # this defines p(x) and D
belief_update =  BQStandardBeliefUpdate(jitter=noise_std**2, scale_estimation="mle")

bq = BayesianQuadrature(kernel=kernel, 
                        measure=measure,
                        policy=None,
                        belief_update=belief_update,
                        stopping_criterion=ImmediateStop(),
                        initial_design=None)

# integrate
F, _, info = bq.integrate(fun=None, nodes=x, fun_evals=y, rng=None)
neildhir commented 8 months ago

Ah okay, I see - that does clarify a few things, thanks a lot.

Okay so if I have understood this correctly then, I can get the expectation of my estimated integrand $\hat{f}$ using just, it seems, the setup that you have described (using the constituents was very helpful indeed).

Using corrupted samples from the Hennig1D function, the expected value over the domain $[-3,3]$ is then sought by numerically approximating the definite integral.

from botorch.models import SingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood
from botorch.fit import fit_gpytorch_mll
import numpy as np
from torch import tensor, linspace, no_grad
import matplotlib.pyplot as plt
from probnum.quad.solvers import BayesianQuadrature
from probnum.quad.solvers.belief_updates import BQStandardBeliefUpdate
from probnum.quad.solvers.stopping_criteria import ImmediateStop
from probnum.quad.integration_measures import LebesgueMeasure
from probnum.randprocs.kernels import ExpQuad

f = lambda x: np.exp(-x[:, 0] ** 2 - np.sin(3.0 * x[:, 0]) ** 2) # domain=(-3, 3)

def sample_noisy_f(n, noise_std):
    x = np.linspace(-3, 3, n).reshape(-1, 1)
    noise = np.random.normal(0, noise_std, n)
    return (x, f(x) + noise)

noise_std = 0.01
x,y = sample_noisy_f(20, noise_std)
xx = np.linspace(-3, 3, 100).reshape(-1, 1)

# Fit a GP model just for plotting
def get_fitted_model(train_X:tensor, train_Y:tensor) -> SingleTaskGP:
    model = SingleTaskGP(train_X=train_X, train_Y=train_Y)
    mll = ExactMarginalLogLikelihood(likelihood=model.likelihood, model=model)
    fit_gpytorch_mll(mll) # If training succeeded then model is returned in eval() mode
    return model

model = get_fitted_model(tensor(x.reshape(-1,1)), tensor(y.reshape(-1,1)))

plt.scatter(x, y, color='black', label='Noisy samples')
plt.plot(xx, f(xx), color='red', label='True Hennig1D')
plt.xlabel('x')
plt.ylabel('y')

# Plot the GP model
x_pred = linspace(-3, 3, 100)
with no_grad():
    y_pred = model.posterior(x_pred.unsqueeze(1)).mean
plt.plot(x_pred, y_pred, color='blue', label='GP model')

plt.legend(loc='upper left')
plt.show()
Screenshot 2024-02-23 at 13 28 45
# Define the BQ method
kernel = ExpQuad(input_shape=(1,), lengthscales=0.5) #RBF
# measure = LebesgueMeasure(domain=(0, 1), input_dim=1)  # this defines p(x) and D
measure = LebesgueMeasure(domain=(-3, 3), input_dim=1)  # this defines p(x) and D
belief_update =  BQStandardBeliefUpdate(jitter=noise_std**2, scale_estimation="mle")

bq = BayesianQuadrature(kernel=kernel, 
                        measure=measure,
                        policy=None,
                        belief_update=belief_update,
                        stopping_criterion=ImmediateStop(),
                        initial_design=None)

# integrate
F, _, info = bq.integrate(fun=None, nodes=x, fun_evals=y, rng=None)

print(F)
print(F.mean, F.std)
print(info)
<Normal with shape=(), dtype=float64>
Mean: 1.1229285663389401 and std: 0.009890668221226088
BQIterInfo(iteration=0, nevals=20, has_converged=True)

Which, if my understanding is correct; 1.1229285663389401 represents the numerical approximation to the expectation $\mathbb{E}[\hat{f}]$ and 1.1433287777179366 is the ground-truth (from here).

Finally, do you know if hyper parameter tuning is coming? I think emukit has it? Is it a matter of someone taking the time to port it via a PR? I imagine it's more complicated than that.

mmahsereci commented 8 months ago

Okay so if I have understood this correctly then, I can get the expectation of my estimated integrand using just, it seems, the setup that you have described (using the constituents was very helpful indeed).

Yes. Just to emphasize one more time. If your're simply interested in integrating a function $f(x)$ against a measure $p(x)$. You do not need to construct your own GP with predictive mean $\hat{f}(x)$ as the BQ method does it for you.

Finally, do you know if hyper parameter tuning is coming? I think emukit has it? Is it a matter of someone taking the time to port it via a PR? I imagine it's more complicated than that.

Yes, Emukit works a bit differently. It provides a wrapper interface where you can wrap your GP backend in as Emukit does not contruct the GP model itself (it just integrates the GP). In that sense it has hyperparameter tuning if the GP backend has it. The default wrapper uses GPy backend which provides hyperparam tuning.

For ProbNum the availability of hyper-tuning seems to depend on this PR #581 which seems quite stuck to me. May be helpful to ping the people who work on it to see how things are going.

neildhir commented 8 months ago

Yes. Just to emphasize one more time. If your're simply interested in integrating a function against a measure . You do not need to construct your own GP with predictive mean as the BQ method does it for you.

Ah yes, no fully understood. I just wanted to show where my data comes from. Using a GPR was perhaps a poor choice given the current discussion. But alas, all I have is a dataset $\mathcal{D} = {(x_i,yi)}{i=0}^N$ where the evaluations are noisy and that's all I wanted to demonstrate with the blue curve above.

For ProbNum the availability of hyper-tuning seems to depend on this PR https://github.com/probabilistic-numerics/probnum/pull/581 which seems quite stuck to me. May be helpful to ping the people who work on it to see how things are going.

I'll perhaps start pinging some people then.

Thanks again for your help, very valuable.