pytorch / botorch

Bayesian optimization in PyTorch
https://botorch.org/
MIT License

[Bug] Fantasize a SingleTaskGP model with a base_sample of zero results in different posterior mean values wrt the base SingleTaskGP #2611

Closed JuanUngredda closed 3 weeks ago

JuanUngredda commented 3 weeks ago

I'm running the following sanity check. When I fantasize a GP model using base samples equal to zero, I would expect the mean predictions to match those of the base model used to fantasize, as implied by the reparameterization trick for the predictive posterior mean.

$$ \mu^{n+1}(x) = \mu^n(x) + \tilde\sigma(x, x^{n+1})Z $$

where $\mu^n(x)$ is the posterior mean given $n$ observations, $\tilde\sigma(x, x^{n+1})$ is the "predictive standard deviation" given the conditioning location $x^{n+1}$, and $Z$ is a standard normal random variable. From the formula, if $Z$ takes the value zero, the expression reduces to the current posterior mean. However, when I run the reproduction code, I get small errors relative to the current posterior mean.

I'm curious whether this is expected behavior or whether there is a problem in the snippet under "To reproduce".
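
As a side illustration of the identity above, separate from the BoTorch reproduction snippet, here is a minimal pure-Python sketch of a one-dimensional GP. The kernel, lengthscale, noise level, data values, and helper names (`rbf`, `solve`, `posterior_mean`) are all assumptions chosen for illustration and do not come from BoTorch. It adds a fantasy observation at `x_new` whose value equals the current posterior mean there (the $Z = 0$ case) and compares the refitted mean with the original one.

```python
import math

def rbf(a, b, lengthscale=0.3):
    """Squared-exponential kernel between two scalars (illustrative choice)."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def posterior_mean(train_x, train_y, test_x, noise=1e-4):
    """Posterior mean of a zero-prior-mean GP at each test point."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(train_x)]
         for i, a in enumerate(train_x)]
    alpha = solve(K, train_y)
    return [sum(rbf(x, xi) * ai for xi, ai in zip(train_x, alpha))
            for x in test_x]

# Made-up training data and test grid.
train_x = [0.1, 0.5, 0.9]
train_y = [0.3, -0.2, 0.7]
test_x = [0.05 * i for i in range(21)]

mean_n = posterior_mean(train_x, train_y, test_x)

# Fantasy observation with Z = 0: y_new is the current mean at x_new.
x_new = 0.0
y_new = posterior_mean(train_x, train_y, [x_new])[0]
mean_n1 = posterior_mean(train_x + [x_new], train_y + [y_new], test_x)

max_diff = max(abs(a - b) for a, b in zip(mean_n, mean_n1))
print(f"max |mu^(n+1) - mu^n| = {max_diff:.3e}")
```

In exact arithmetic the two means coincide, because the update term is proportional to `y_new` minus the current mean at `x_new`. In floating point, the two linear solves round differently, so the printed difference is typically tiny rather than guaranteed to be exactly zero.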

To reproduce

Code snippet to reproduce

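# Imports needed by the snippet
from typing import Any

import torch
from botorch.exceptions import UnsupportedError
from botorch.fit import fit_gpytorch_mll
from botorch.models import ModelListGP, SingleTaskGP
from botorch.posteriors import Posterior
from botorch.sampling.list_sampler import ListSampler
from botorch.sampling.normal import NormalMCSampler
from gpytorch.mlls import SumMarginalLogLikelihood
from torch.quasirandom import SobolEngine

# This is the code for my own sampler. The only thing it does is return zero samples
class zeroSampler(NormalMCSampler):
    def __init__(self, sample_shape: torch.Size, seed: int | None = None, **kwargs: Any) -> None:
        super().__init__(sample_shape, seed, **kwargs)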

    def _construct_base_samples(self, posterior: Posterior) -> None:
        target_shape = self._get_collapsed_shape(posterior=posterior)
        if self.base_samples is None or self.base_samples.shape != target_shape:
            base_collapsed_shape = target_shape[len(self.sample_shape):]
            output_dim = base_collapsed_shape.numel()
            if output_dim > SobolEngine.MAXDIM:
                raise UnsupportedError(
                    "SobolQMCSampler only supports dimensions "
                    f"`q * o <= {SobolEngine.MAXDIM}`. Requested: {output_dim}"
                )
            # Allocate zeros directly in the target shape, so this also works when q * o > 1
            base_samples = torch.zeros(target_shape)
            self.register_buffer("base_samples", base_samples)
        self.to(device=posterior.device, dtype=posterior.dtype)

# This is the actual code that is launched
# (`self.device` from the enclosing test method is replaced by an explicit CPU device)
device = torch.device("cpu")
torch.manual_seed(0)
dtype = torch.double
torch.set_default_dtype(dtype)
d = 1
num_of_points = 10
train_X = torch.rand(num_of_points, d, device=device, dtype=dtype)
train_Y_objective = torch.rand(num_of_points, 1, device=device, dtype=dtype)
NOISE = torch.tensor(1e-6, device=device, dtype=dtype)
model_objective = SingleTaskGP(train_X, train_Y_objective,
                               train_Yvar=NOISE.expand_as(train_Y_objective.reshape(-1, 1)))
model = ModelListGP(*[model_objective])
mll = SumMarginalLogLikelihood(model.likelihood, model)
fit_gpytorch_mll(mll)

# Posterior mean of the base model at random test locations
n_test_samples = 1000
test_X = torch.rand(n_test_samples, d, device=device, dtype=dtype)
posterior1 = model.posterior(test_X)
posterior_mean1 = posterior1.mean

# Fantasize at a single conditioning point using all-zero base samples
conditioning_x = torch.tensor([[0.0]])
zero_sampler = zeroSampler(sample_shape=torch.Size([1]))
sampler_list = ListSampler(*[zero_sampler])
ones = torch.ones((1, 1), dtype=torch.bool)
fantasized_model = model.fantasize(conditioning_x, sampler=sampler_list, evaluation_mask=ones)
fantasised_model_posterior = fantasized_model.posterior(test_X)
fantasised_posterior_mean = fantasised_model_posterior.mean

# Total absolute difference between the fantasized and base posterior means
diff_model0 = torch.sum(torch.abs(fantasised_posterior_mean[0, :, 0] - posterior_mean1[:, 0]))
print("diff_model0: ", diff_model0)
print("OK")

Stack trace/error message

diff_model0:  tensor(2.7756e-16, grad_fn=<SumBackward0>)
OK

Expected Behavior

I would expect diff_model0 to be exactly zero.

saitcakmak commented 3 weeks ago

Hi @JuanUngredda. A difference of roughly 3e-16 is completely negligible. The updates are computed with floating-point operations at limited machine precision, so we can't expect the results to match the theory exactly.
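
To put the reported number in context, here is a small standard-library sketch (not BoTorch code; the variable names are illustrative). It shows that reordering a floating-point sum can already change the last bit, compares the reported difference with double-precision machine epsilon, and uses a tolerance-based check in place of exact equality. For tensors, `torch.allclose` plays the same role as `math.isclose` does here.

```python
import math
import sys

# Machine epsilon for double precision: gap between 1.0 and the next float.
eps = sys.float_info.epsilon

# Value printed by the reproduction snippet in this issue.
reported_diff = 2.7756e-16

# The same three numbers summed in two different orders.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

exactly_equal = left == right
close_enough = math.isclose(left, right, rel_tol=1e-9, abs_tol=1e-12)
ratio_to_eps = reported_diff / eps

print("exactly equal:", exactly_equal)
print("equal within tolerance:", close_enough)
print("reported difference in units of machine epsilon:", ratio_to_eps)
```

A sanity check like the one in this issue is therefore more robust when written as a tolerance comparison, with the tolerance chosen relative to the dtype in use, than as a test for exact equality with zero.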