pytorch / botorch

Bayesian optimization in PyTorch
https://botorch.org/
MIT License

[Bug] Gradient of GPDraw does not behave as expected #2140

Closed · sangttruong closed this issue 3 months ago

sangttruong commented 11 months ago

🐛 Bug

I'm trying to maximize f(x) with respect to x in 2D, where f ~ GP is a draw obtained via the GPDraw class (i.e., one step of the Thompson sampling procedure). The gradient of the loss -f(x) with respect to x does not point in the steepest-descent direction of f, and it can sometimes change direction quite unexpectedly. This behavior is not observed when f is the Ackley function. I wonder whether this is a problem in my code or a (numerical) issue with the GP. Any guidance would be appreciated.

To reproduce

Code snippet to reproduce

import torch
import numpy as np
from botorch import fit_gpytorch_model
from botorch.models import SingleTaskGP
from botorch.models.transforms.outcome import Standardize
from gpytorch.mlls import ExactMarginalLogLikelihood
from botorch.utils.gp_sampling import GPDraw
from torch.optim import Adam
import matplotlib.pyplot as plt
from botorch.test_functions.synthetic import Ackley
import random

x_dim = 2
seed = 1442223
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
np.random.seed(seed)
random.seed(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
device = "cuda"
torch_dtype = torch.float32

initial_points = [
    [0.2, 0.7],
    [0.0, -0.4],
    [-0.2, 0.8],
    [-0.5, 0.5],
    [-0.3, 0.0],
    [0.8, -0.1],
    [-0.4, -0.5],
    [0.5, -0.5],
    [-0.7, -0.6],
    [0.45, 0.5]
]
env = Ackley()
data_x = torch.tensor(
    initial_points,
    device=device,
    dtype=torch_dtype,
)
# >>> n_initial_points x dim

data_y = env(data_x).reshape(-1, 1)
# >>> n_initial_points x 1

GP = SingleTaskGP(
    data_x,
    data_y,
    outcome_transform=Standardize(1),
).to(device)

mll = ExactMarginalLogLikelihood(GP.likelihood, GP)
fit_gpytorch_model(mll)
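# Note: later BoTorch releases deprecate fit_gpytorch_model in favor of
# fit_gpytorch_mll.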

# Initialize x with shape (100, 2)
x = torch.rand(100, 2, device=device) * 2 - 1
x.requires_grad_(True)
optimizer = Adam([x], lr=0.01)

for i in range(1000):
    optimizer.zero_grad()

    f = GPDraw(GP, seed=seed)
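    # Note: a new GPDraw is constructed on every iteration. GPDraw samples
    # the function lazily at the points where it is evaluated, so the
    # realized draw depends on the query history rather than being a single
    # fixed function of x across iterations.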
    # f = Ackley()
    loss = -f(x).mean()
    loss.backward()
    optimizer.step()
    grad = x.grad.clone()
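    # x.grad holds the gradient of loss = -f(x).mean(), i.e. it is
    # proportional to -df/dx, so the plotted arrow is expected to point
    # downhill in f.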

    # Plotting ###############################################################
    n_space = 100
    fig, ax = plt.subplots(1, 1)
    bounds_plot_x = bounds_plot_y = -1.1, 1.1
    ax.set(xlabel="$x_1$", ylabel="$x_2$", xlim=bounds_plot_x, ylim=bounds_plot_y)
    title = "GPDraw Gradient Test"
    ax.set_title(label=title)

    # Plot function in 2D ####################################################
    X_domain, Y_domain = (-1.1, 1.1), (-1.1, 1.1)
    X, Y = np.linspace(*X_domain, n_space), np.linspace(*Y_domain, n_space)
    X, Y = np.meshgrid(X, Y)
    XY = torch.tensor(np.array([X, Y])).float().to(device)
    # >>> 2 x 100 x 100

    f = GPDraw(GP, seed=seed)
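    # Note: this is a second, independent GPDraw instance. Even with the
    # same seed, its lazily-sampled path is evaluated on a different set of
    # points (the plotting grid), so it need not realize the same function
    # as the instance used for the gradient step above.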
    # f = Ackley()
    Z = f(XY.reshape(2, -1).T).reshape(X.shape).cpu().detach().numpy()
    cs = ax.contourf(X, Y, Z, levels=30, cmap="bwr", alpha=0.7)
    ax.set_aspect(aspect="equal")
    cbar = fig.colorbar(cs)
    cbar.ax.set_ylabel("$f(x)$", rotation=270, labelpad=20)
    ax.scatter(
        x[1, 0].cpu().detach().numpy(),
        x[1, 1].cpu().detach().numpy(),
        label="Data",
        color='red'
    )
    ax.arrow(
        x[1, 0].cpu().detach().numpy(),
        x[1, 1].cpu().detach().numpy(),
        grad[1, 0].cpu().detach().numpy(),
        grad[1, 1].cpu().detach().numpy(),
        head_width=0.05,
        head_length=0.1,
        fc='blue', 
        ec='blue'
    )

    ax.legend()
    plt.show()

Stack trace/error message

The gradient does not point downhill; it changes direction and magnitude quite rapidly, and the particle does not appear to move along the gradient.

[Screenshots, 2023-12-06: three contour plots of the GP draw with the plotted gradient arrow]

Expected Behavior

The gradient should point downhill. Below is the expected behavior when using the Ackley function. The particle moves uphill nicely with Adam, and the gradient behaves as expected.

[Screenshot, 2023-12-06: Ackley contour plot with the gradient arrow]

System information


Additional context

None

Balandat commented 11 months ago

Thanks for flagging this. We'll have to look into this in a bit more detail. cc @SebastianAment for potential numerical issues with gradient computations.

That said, is there a specific reason you are using the GPDraw class? It is a bit of a poor man's approach to drawing sample paths from GPs; we have a much better setup based on path-wise sampling: https://github.com/pytorch/botorch/blob/main/botorch/sampling/pathwise/posterior_samplers.py#L86-L107

It should generally be preferable to use that; please let us know if you run into any similar issues there.
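For reference, here is a minimal sketch of swapping GPDraw for a pathwise sample, under the assumption that draw_matheron_paths from the module linked above is the intended entry point (GP, x, optimizer, and seed reuse the names from the reproduction snippet):

import torch
from botorch.sampling.pathwise import draw_matheron_paths

torch.manual_seed(seed)
# Draw one fixed sample path once, outside the optimization loop.
path = draw_matheron_paths(GP, sample_shape=torch.Size([1]))

for i in range(1000):
    optimizer.zero_grad()
    # The path is a fixed, differentiable function of x, so gradients are
    # taken through a single realized function rather than a fresh draw
    # per iteration.
    loss = -path(x).mean()
    loss.backward()
    optimizer.step()

Because the path is materialized once, the objective stays the same function across iterations. (Note that with an outcome transform such as Standardize, the path is a draw in the model's transformed outcome space.)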

saitcakmak commented 3 months ago

Hi @sangttruong. We're deprecating the GPDraw class, and it will be removed in a future release. I'd second @Balandat's recommendation to use pathwise sampling instead. Closing this, since we do not intend to investigate the issue further given the deprecation.