pytorch / botorch

Bayesian optimization in PyTorch
https://botorch.org/
MIT License

[Bug] Inconsistent Gradient Values for N-Dimensional Integral of Multidimensional Normal Distribution #2411

Open simplifier89 opened 3 months ago

simplifier89 commented 3 months ago

🐛 Bug

Description: I am computing gradients of the N-dimensional integral (CDF) of a multivariate normal distribution with respect to the upper integration limits. I observed that the gradient values differ across dimensions, which seems strange: the covariance matrix and the upper limits are fully symmetric across dimensions, so all partial derivatives should be identical. Below are the steps and code snippets demonstrating the issue.

Steps to Reproduce:

1. Compute the integral and its gradients with respect to the upper limits using the code below.
2. Compare the results with finite-difference approximations.

Code snippet to reproduce

from botorch.utils.probability import MVNXPB
import torch

torch.set_printoptions(precision=7)
dtype = torch.float64

N_dim = 6
# Upper integration limits: all equal to 1, differentiable.
up_limit = torch.ones(N_dim, dtype=dtype)
up_limit.requires_grad_()
# Bounds of shape (N_dim, 2): lower limits -inf, upper limits taken from up_limit.
integration_limits = torch.full((N_dim, 2), -torch.inf, dtype=dtype)
integration_limits[..., -1] = up_limit
# Exchangeable covariance: 1.0 on the diagonal, 0.5 everywhere else.
cov_matrix = (0.5 * torch.ones((N_dim, N_dim), dtype=dtype)).fill_diagonal_(1.)
mvn_pb = MVNXPB(cov_matrix, integration_limits)
mvn_pb.solve()
pb_result = mvn_pb.log_prob.exp()
print(f'pb_result: {pb_result}')
delta_limit = torch.autograd.grad(pb_result, up_limit, create_graph=False, retain_graph=False)
print(f'delta_integral {delta_limit}')
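
For reference, a rough independent cross-check of these derivatives using SciPy's multivariate normal CDF and central differences (a sketch, assuming SciPy is available; scipy.stats.multivariate_normal.cdf is itself a numerical approximation, and the variable names here are only illustrative):

import numpy as np
from scipy.stats import multivariate_normal

mean = np.zeros(N_dim)
cov = 0.5 * np.ones((N_dim, N_dim)) + 0.5 * np.eye(N_dim)  # same covariance as above
upper = np.ones(N_dim)
eps = 1e-4

mvn = multivariate_normal(mean=mean, cov=cov)
for i in range(N_dim):
    up_plus, up_minus = upper.copy(), upper.copy()
    up_plus[i] += eps
    up_minus[i] -= eps
    # Central difference of the CDF with respect to the i-th upper limit.
    grad_i = (mvn.cdf(up_plus) - mvn.cdf(up_minus)) / (2 * eps)
    print(f'd/db_{i}: {grad_i:.7f}')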

When I compute the same derivatives with finite differences, all values are identical, as expected. Note that the finite-difference values are close only to the last two entries of delta_limit.

Additionally, if cov_matrix is the identity matrix, this behavior does not occur: the gradient values are identical across dimensions and close to the finite-difference values (see the closed-form check after the finite-difference code below).

eps = 1e-4
finite_differences = []
for i in range(N_dim):
    # Perturb the i-th upper limit, recompute the probability, take a forward
    # difference, then restore the original limit.
    integration_limits[i, -1] = 1.0 + eps
    mvn_pb = MVNXPB(cov_matrix, integration_limits)
    mvn_pb.solve()
    pb_result_2 = mvn_pb.log_prob.exp()
    finite_differences.append((pb_result_2 - pb_result) / eps)
    integration_limits[i, -1] = 1.0
print(f'finite_differences {finite_differences}')
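
Since the identity-covariance case factorizes, it can be checked against a closed form: with cov_matrix = I the probability is the product of univariate normal CDFs, so dP/db_i = pdf(b_i) * prod_{j != i} cdf(b_j). A minimal sketch comparing MVNXPB's autograd gradient against that closed form (names below are illustrative, not from the snippet above):

from torch.distributions import Normal

eye_cov = torch.eye(N_dim, dtype=dtype)
b = torch.ones(N_dim, dtype=dtype, requires_grad=True)
eye_limits = torch.full((N_dim, 2), -torch.inf, dtype=dtype)
eye_limits[..., -1] = b

mvn_pb_eye = MVNXPB(eye_cov, eye_limits)
mvn_pb_eye.solve()
p_eye = mvn_pb_eye.log_prob.exp()
(grad_eye,) = torch.autograd.grad(p_eye, b)

std_normal = Normal(torch.zeros((), dtype=dtype), torch.ones((), dtype=dtype))
cdf_b = std_normal.cdf(b.detach())
pdf_b = std_normal.log_prob(b.detach()).exp()
closed_form = pdf_b * cdf_b.prod() / cdf_b  # pdf(b_i) * product of the other CDFs
print(f'MVNXPB gradient:      {grad_eye}')
print(f'closed-form gradient: {closed_form}')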

Stack trace/error message

delta_integral (tensor([0.0930755, 0.0930755, 0.0848957, 0.0848957, 0.0782828, 0.0782828],
       dtype=torch.float64),)

Expected Behavior

finite_differences [tensor(0.0782873, dtype=torch.float64, grad_fn=<DivBackward0>), tensor(0.0782873, dtype=torch.float64, grad_fn=<DivBackward0>), tensor(0.0782873, dtype=torch.float64, grad_fn=<DivBackward0>), tensor(0.0782873, dtype=torch.float64, grad_fn=<DivBackward0>), tensor(0.0782873, dtype=torch.float64, grad_fn=<DivBackward0>), tensor(0.0782873, dtype=torch.float64, grad_fn=<DivBackward0>)] 


Question: Is this a bug or is there an explanation for this behavior?

Balandat commented 3 months ago

Hmm, interesting, thanks for flagging this. I'm not entirely sure what causes this, but it's got to be something in the implementation weeds. One thing to note is that MVNXPB approximates the integral, so the gradients are also approximate; depending on the order in which the bivariate conditioning steps through the dimensions, I can see those errors coming out different for different dimensions.

I would assume that the errors here would grow larger as the covariance gets closer and closer to singular; I'm curious whether you've observed that in practice. E.g., what happens to the size of the errors as you let diag_val in cov_matrix = (0.5 * torch.ones((N_dim, N_dim), dtype=dtype)).fill_diagonal_(diag_val) approach 0.5?
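
A sketch of that experiment (the diag_val grid is arbitrary, just for illustration; by symmetry the per-dimension gradients should all be equal, so their spread is a direct measure of the approximation error):

import torch
from botorch.utils.probability import MVNXPB

dtype = torch.float64
N_dim = 6
for diag_val in (2.0, 1.5, 1.0, 0.75, 0.6, 0.55):
    cov_matrix = (0.5 * torch.ones((N_dim, N_dim), dtype=dtype)).fill_diagonal_(diag_val)
    up_limit = torch.ones(N_dim, dtype=dtype, requires_grad=True)
    integration_limits = torch.full((N_dim, 2), -torch.inf, dtype=dtype)
    integration_limits[..., -1] = up_limit
    mvn_pb = MVNXPB(cov_matrix, integration_limits)
    mvn_pb.solve()
    prob = mvn_pb.log_prob.exp()
    (grad,) = torch.autograd.grad(prob, up_limit)
    # All entries of grad should match by symmetry; their spread measures the error.
    print(f'diag_val={diag_val:4.2f}  grad spread={(grad.max() - grad.min()).item():.3e}')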