Yeah so the issue is here: https://github.com/pytorch/botorch/blob/0357aa57822ebb3b5c37b6eaceda21b5dd548bb9/botorch/optim/optimize.py#L145-L149
Essentially, by wrapping the MFKG inside the FixedFeatureAcquisitionFunction, that instance check fails and we don't create initial conditions of the proper size.
I'd like to better understand what you're trying to achieve here. It looks like you're fixing the fidelity feature in your input. Are you essentially trying to find the best configuration while constraining your fidelity to a particular value?
As a workaround, you can circumvent this by passing in appropriate batch_initial_conditions as follows:

from botorch.optim.initializers import gen_one_shot_kg_initial_conditions

batch_initial_conditions = gen_one_shot_kg_initial_conditions(
    acq_function=qKG,
    bounds=bounds,
    q=1,
    num_restarts=5,
    raw_samples=25,
)
# throw away the fidelity dimension
ff_batch_initial_conditions = batch_initial_conditions[..., :-1]

with manual_seed(1234):
    candidates, acq_value = optimize_acqf(
        acq_function=qKG_FF,
        bounds=ff_bounds,
        q=1,
        num_restarts=5,
        raw_samples=25,
        batch_initial_conditions=ff_batch_initial_conditions,  # make sure to pass this
    )
cc @saitcakmak
I believe this is the same issue as here: https://github.com/pytorch/botorch/issues/578#issuecomment-718344297 (and the following comment for the workaround: passing q=num_fantasies + actual_q to optimize_acqf, where actual_q is the one you intended to use).
Like @Balandat points out, this is due to wrapping a OneShotAcquisitionFunction within FixedFeatureAcquisitionFunction. I think passing the initial conditions explicitly is the best option here.
@Balandat What would you think of a fix like this: (1) define (with a better name) ffos = isinstance(acq_function, FixedFeatureAcquisitionFunction) and isinstance(acq_function.acq_function, qKnowledgeGradient); (2) replace if isinstance(acq_function, qKnowledgeGradient) with if isinstance(acq_function, qKnowledgeGradient) or ffos; and (3) in batch_initial_conditions = ic_gen(acq_function=acq_function, ...), pass acq_function=acq_function.acq_function if ffos else acq_function instead (a rough sketch is below).
We can also add a post processing step to remove the feature dimensions from the initial conditions.
I guess a cleaner way to implement this is to define a gen_fixed_feature_initial_conditions that performs these steps.
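For concreteness, a rough sketch of such a gen_fixed_feature_initial_conditions along these lines (illustrative only, not actual BoTorch code; the explicit fixed_columns argument is a placeholder for whatever the wrapper exposes):

from botorch.acquisition import FixedFeatureAcquisitionFunction, qKnowledgeGradient
from botorch.optim.initializers import (
    gen_batch_initial_conditions,
    gen_one_shot_kg_initial_conditions,
)

def gen_fixed_feature_initial_conditions(
    acq_function, bounds, q, num_restarts, raw_samples, fixed_columns=()
):
    # "ffos": a FixedFeatureAcquisitionFunction wrapping a one-shot qKnowledgeGradient;
    # in that case `bounds` should be the full-dimensional bounds
    ffos = isinstance(acq_function, FixedFeatureAcquisitionFunction) and isinstance(
        acq_function.acq_function, qKnowledgeGradient
    )
    ic_gen = (
        gen_one_shot_kg_initial_conditions
        if isinstance(acq_function, qKnowledgeGradient) or ffos
        else gen_batch_initial_conditions
    )
    ics = ic_gen(
        acq_function=acq_function.acq_function if ffos else acq_function,
        bounds=bounds,
        q=q,
        num_restarts=num_restarts,
        raw_samples=raw_samples,
    )
    if ffos:
        # post-processing: drop the fixed-feature columns from the generated ICs
        keep = [i for i in range(ics.shape[-1]) if i not in set(fixed_columns)]
        ics = ics[..., keep]
    return ics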
We can also add a post processing step to remove the feature dimensions from the initial conditions.
One issue with this is that if we generate a low discrepancy Sobol sequence in higher dimensions and then chop off a dimension then that destroys the discrepancy properties.
I guess a cleaner way to implement this is to define a gen_fixed_feature_initial_conditions that performs these steps
Yeah that makes sense. We should probably have a helper that just takes the original bounds and the FixedFeatureAcquisitionFunction and then generates ICs under the hood using the information in the _selector of the FixedFeatureAcquisitionFunction.
if we generate a low discrepancy Sobol sequence in higher dimensions and then chop off a dimension then that destroys the discrepancy properties.
This makes sense when we increase the dimension (e.g., #504), but it is counterintuitive when we reduce the dimension. I remember from the discussion in https://github.com/scipy/scipy/pull/10844 that there are some quirks to using QMC. I took a second look now but can't find anything about chopping off dimensions there. I dug a bit deeper and it seems that chopping off dimensions is safe, see below for discussion and reference.
TLDR: Chopping off dimensions preserves the digital net property.
Going with the notation in https://arxiv.org/pdf/2008.08051.pdf, Sobol is a (t, d)-sequence, i.e., the first 2^m points of Sobol generate a (t, m, d)-net in base 2. This is saying that every elementary interval of volume 2^{t-m} contains 2^t points.
An elementary interval in base 2 is given by E[k, c] = \prod_{j=1}^{d} [ c_j / 2^{k_j}, (c_j + 1) / 2^{k_j} ), for non-negative integers c_j, k_j. If I chop off a dimension (wlog, say the last dimension), I know that k_d <= m - t. If k_d = 1, the (d-1)-dim elementary interval E[k, c] = \prod_{j=1}^{d-1} [ c_j / 2^{k_j}, (c_j + 1) / 2^{k_j} ) has volume 2^{t-m} and contains 2^t points. If k_d = 2, then the (d-1)-dim interval contains 2^t points but has volume 2^{t-m+1}. I could look at a 2 x 2 grid of the original d-dim intervals to show that the sub-intervals of these (d-1)-dim intervals contain 2^t points and have volume 2^{t-m}. If this carries on for all k_d <= m - t, this would then tell me that chopping off a dimension of a (t, m, d)-net in base 2 results in a (t, m, d-1)-net in base 2. Does this then say that chopping off dimensions is safe?
I also found that this is given by lemma 4.16 of https://web.maths.unsw.edu.au/~josefdick/preprints/DP_book_preprint.pdf.
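As a tiny concrete check of this (my own illustration, not taken from the lemma): the first 2^2 points of the unscrambled 2-dim Sobol sequence form a (0, 2, 2)-net, and chopping off the second coordinate still leaves a (0, 2, 1)-net:

# first 2^2 points of the unscrambled 2-dim Sobol sequence: a (0, 2, 2)-net in base 2
pts = [(0.0, 0.0), (0.5, 0.5), (0.25, 0.75), (0.75, 0.25)]
# chop off the second coordinate and check the (0, 2, 1)-net property: every elementary
# interval [c/4, (c+1)/4) of volume 2^{0-2} must contain exactly 2^0 = 1 point
proj = [x for x, _ in pts]
counts = [sum(c / 4 <= x < (c + 1) / 4 for x in proj) for c in range(4)]
assert counts == [1, 1, 1, 1]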
Yeah, that makes sense. I would think that this should easily extend to scrambled sequences as well.
Regardless, I think I would still prefer to generate the initial conditions for the non-fixed dimensions and then add on the fixed dimensions afterwards (as described above), rather than working in the higher-dim space and then fixing ICs.
I'd like to better understand what you're trying to achieve here. It looks like you're fixing the fidelity feature in your input. Are you essentially trying to find the best configuration while constraining your fidelity to a particular value?
@Balandat Yes, that's exactly what I intend to do. Thanks for the workaround, it does solve the error. After doing some tests I observed that using the FixedFeatureAcquisitionFunction to constrain one or more features, using fixed_features within optimize_acqf to do the same thing, and using the vanilla optimize_acqf with the bounds appropriately modified (identical lower and upper bounds for the fixed features) all give the same result (unsurprisingly to me), although the last approach seems to be the most efficient.
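For reference, the latter two variants look like this (condensed from the timing script further down in this thread; qMFKG, t_bounds, p, and train_x_n are defined there):

# (a) fix the fidelity column via the fixed_features argument of optimize_acqf
candidates, acq_value = optimize_acqf(
    acq_function=qMFKG,
    bounds=t_bounds,
    q=1,
    num_restarts=20,
    raw_samples=250,
    fixed_features={p: train_x_n[-1, -1].item()},
)

# (b) or pin the feature through the bounds themselves: identical lower and upper bounds
t_bounds[:, -1] = train_x_n[-1, -1]
candidates, acq_value = optimize_acqf(
    acq_function=qMFKG,
    bounds=t_bounds,
    q=1,
    num_restarts=20,
    raw_samples=250,
)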
I believe this issue is the same issue here: #578 (comment)
@saitcakmak Yes it is the same issue, somehow that got lost in the weeds.
Overall, I do have a question about the "one shot" approach: it seems as though the KG acquisition function is continuous but not necessarily smooth (see, e.g., https://people.orie.cornell.edu/pfrazier/pub/CorrelatedKG.pdf). Therefore, when you solve the one-shot acquisition function via gradient-based local optimization methods such as L-BFGS, you can only guarantee a critical point but not necessarily an optimum. Is this a correct assumption, or does the one-shot approach make an assumption such as L-smoothness? On a related note, since the number of variables in the one-shot optimization is q + num_fantasies, do we need to pick raw_samples to be > q + num_fantasies during optimization? Intuitively, I don't want to pick 100 raw samples when the dimensionality of my optimization problem is ~1000.
although the last approach seems to be the most efficient.
This is interesting. Using the full function and passing degenerate bounds is more efficient in what sense? Faster to compute? Or less complicated to set up?
Overall, I do have a question about the "one shot" approach: it seems as though the KG acquisition function is continuous but not necessarily smooth (see, e.g., https://people.orie.cornell.edu/pfrazier/pub/CorrelatedKG.pdf)
What part of the paper are you referring to specifically? I guess there are two things to note here:
since the number of variables in the one-shot optimization is q + num_fantasies do we need to pick raw_samples to be > q + num_fantasies during optimization? Intuitively, I don't want to pick 100 raw samples when the dimensionality of my optimization problem is ~1000.
Yep that's true. The problem is higher-dimensional so ideally you want to crank up raw_samples as much as you can. Eventually you'll run into memory / compute limitations, so we don't scale this automatically with the number of fantasies. Note that we're being somewhat smarter when generating ICs for qKG by using a heuristic that utilizes the maximizer of the posterior mean, which in practice is often close to the optimal value of the fantasy variables: https://github.com/pytorch/botorch/blob/439c9ef13c0462468239768a08055d1178a9b8ab/botorch/optim/initializers.py#L145-L156
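Roughly, the idea behind that heuristic is the following (a paraphrased sketch, not the actual implementation; see the linked initializers.py for the real logic):

from botorch.acquisition import PosteriorMean
from botorch.optim import optimize_acqf
from botorch.optim.initializers import gen_batch_initial_conditions

def sketch_one_shot_kg_ics(qkg, bounds, q, num_restarts, raw_samples, frac_random=0.1):
    # 1) standard ICs for the augmented (q + num_fantasies)-point problem
    q_aug = qkg.get_augmented_q_batch_size(q=q)
    ics = gen_batch_initial_conditions(
        acq_function=qkg, bounds=bounds, q=q_aug,
        num_restarts=num_restarts, raw_samples=raw_samples,
    )
    # 2) maximizer of the current posterior mean -- in practice often close to the
    #    optimal location of the fantasy variables
    pm_max, _ = optimize_acqf(
        acq_function=PosteriorMean(qkg.model), bounds=bounds, q=1,
        num_restarts=num_restarts, raw_samples=raw_samples,
    )
    # 3) overwrite most of the fantasy part of the ICs with that maximizer
    n_value = int((1 - frac_random) * (q_aug - q))
    ics[..., -n_value:, :] = pm_max
    return ics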
This is interesting. Using the full function and passing degenerate bounds is more efficient in what sense? Faster to compute? Or less complicated to set up?
I had meant wall-clock times. The speedup is ~5x
What part of the paper are you referring to specifically?
See Fig.1 -- in this case, the maximum of the KG is in a locally smooth region, so there's no issue, but overall you can see that it is not continuously differentiable. I will try to add an example where the max has to be determined by solving a non-smooth optimization problem.
I will read your paper for the regularity assumptions.
Yep that's true. The problem is higher-dimensional so ideally you want to crank up raw_samples as much as you can. Eventually you'll run into memory / compute limitations, so we don't scale this automatically with the number of fantasies.
Yes, right now I am using ~2000 fantasies and I am not able to increase the raw_samples beyond say 500 without facing a memory failure. Are there general rules of thumb to pick num_fantasies for a given problem of dimensions d? I guess it is highly problem dependent.
I had meant wall-clock times. The speedup is ~5x
Interesting. Do you have some simple code sitting around that compares this? I'd like to profile this to figure out what's going on. It might be that we are generating infeasible ICs for the optimization problem with fixed features right now, which means the optimizer has to work unnecessarily hard (cc @sdaulton)
See Fig.1
From the figure it appears to me that they're considering the noiseless case. In that situation the posterior mean is not differentiable at the observed data points, which looks like what's happening in the figure.
Right now I am using ~2000 fantasies and I am not able to increase the raw_samples beyond say 500 without facing a memory failure. Are there general rules of thumb to pick num_fantasies for a given problem of dimensions d? I guess it is highly problem dependent.
2000 fantasies is a lot. We've gotten decent results with far fewer, say 16-64. As you crank up the number of fantasies you're going to increase the size of your optimization problem a lot, which makes it way harder to solve. I would try to start from a small number of fantasies (in which case you can crank up raw_samples), and then see what the marginal benefit of increasing that number is.
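(As a minimal sketch of that suggestion, reusing names from the repro code posted below -- fewer fantasies leave budget for a larger raw_samples:)

qMFKG_16 = qMultiFidelityKnowledgeGradient(gp, num_fantasies=16, project=project)
candidates, acq_value = optimize_acqf(
    acq_function=qMFKG_16,
    bounds=t_bounds,
    q=1,
    num_restarts=20,
    raw_samples=1024,  # affordable with 16 fantasies; then compare against 32/64
)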
Do you have some simple code sitting around that compares this?
Sure - see below. Actually, the speed-up changes when I set num_fantasies to be small, like 64. Also, the reported times are highly-variable - suggesting that I might have a lot going on on my workstation while I timed this code. However, I see that the results change drastically with the num_fantasies, which I varied from 16-64 and then tried with 1000. Can you point me to an example where you got good results with 16-64? Also, unless I set a common seed, the answer between successive repetitions of the optimize_acqf is different. I would expect this with a highly nonconvex objective function, however I am wondering if there are flat regions where the optimizer is getting stuck. I did print the gradf at the end of optimization in gen_candidates_scipy to check that the gradient is close to zero at the conclusion of the optimization, though.
I have tried plotting the acquisition function via evaluate() and compared its maximizer with the one-shot maximizer, and have found different results. I will post an example of that sometime next week.
import math
import torch
from matplotlib import cm
from matplotlib import pyplot as plt
import numpy as np
import botorch
from botorch.fit import fit_gpytorch_model
from botorch.models import SingleTaskGP
from botorch.utils import standardize
from gpytorch.mlls import ExactMarginalLogLikelihood
import gpytorch # main GP library
from botorch.models.gpytorch import GPyTorchModel
from time import time
def griewank_t(x):
    t1 = time()
    n, p_ = x.shape
    p = p_ - 1
    if p_ != 3:
        raise ValueError('error! inputs must be [...x 3]. ')
    y = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        theta = torch.tensor(x[i][-1] * math.pi / 4)
        xr = rotate_x(theta, x[i][:-1].reshape(-1, 2)).numpy()
        y[i] = np.sum(np.power(xr, 2) / 4000.0, axis=1) - np.prod(np.divide(np.cos(xr), np.sqrt(np.arange(p) + 1)), axis=1) + 1
        y[i] = y[i] * np.exp(-(xr[0, 0] - 3)**2 / 8 / (ub - lb) - xr[0, 1]**2 / 8 / (ub - lb))
    print(f"elapsed time for oracle eval {time() - t1}")
    return 1 * torch.tensor(y)

def rotate_x(theta, test_x):
    R = torch.tensor([[torch.cos(theta), -torch.sin(theta)],
                      [torch.sin(theta), torch.cos(theta)]])
    return torch.matmul(test_x, R)
import numpy as np
from botorch.utils.transforms import normalize, standardize, unnormalize
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
training_size = 60
p = 2
lb = -5.0
ub = 5.0
tn = 1.
T = 4.0
# training data
t = torch.linspace(0, tn, training_size).reshape(-1, 1)
train_x = lb + (ub - lb) * torch.rand([training_size, p])
train_x = torch.cat((train_x, t), dim=-1)
train_y = griewank_t(train_x)
# train_y = torch.tensor(train_y, dtype=torch.float32)
# normalize and standardize
actual_bounds = torch.stack([torch.cat([torch.ones(p) * lb, torch.tensor([0.])]),
                             torch.cat([torch.ones(p) * ub, torch.tensor([T])])])
train_x_n = normalize(train_x, bounds=actual_bounds)
train_y_n = train_y  # standardize(train_y)
gp = SingleTaskGP(train_x_n, train_y_n.reshape(-1,1))
mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_model(mll);
from botorch.acquisition import qKnowledgeGradient, qMultiFidelityKnowledgeGradient
from botorch.acquisition import UpperConfidenceBound, PosteriorMean, ExpectedImprovement
# project fidelity to T
T = 1.0
def project(x):
    return torch.cat([x[..., :-1], torch.full_like(x[..., -1:], T)], dim=-1)
# MFKG
qMFKG = qMultiFidelityKnowledgeGradient(gp, num_fantasies=32, project=project)
# UCB
ucb = UpperConfidenceBound(gp, beta=2.)
# Posterior mean
pm = PosteriorMean(gp)
from botorch.acquisition import FixedFeatureAcquisitionFunction
T = 1.0
columns = [p]
values = [train_x_n[-1, -1]]
qKG_FF = FixedFeatureAcquisitionFunction(qMFKG, p+1, columns, values)
from botorch.optim.initializers import gen_one_shot_kg_initial_conditions
from botorch.optim import optimize_acqf
from botorch.utils.sampling import manual_seed
from time import time
bounds = torch.cat( [torch.zeros(p+1).reshape(1,-1), torch.ones(p+1).reshape(1,-1)], dim=0)
ff_bounds = bounds[...,:-1].reshape(2,-1)
t_bounds = bounds.clone()  # clone so the original bounds tensor is not modified in place
t_bounds[..., -1] = train_x_n[-1, -1]
with manual_seed(1235):
    batch_initial_conditions = gen_one_shot_kg_initial_conditions(
        acq_function=qMFKG,
        bounds=t_bounds,
        q=1,
        num_restarts=20,
        raw_samples=250,
    )
ff_batch_initial_conditions = batch_initial_conditions[..., :-1]
t1 = time()
with manual_seed(1235):
    candidates, acq_value = optimize_acqf(
        acq_function=qKG_FF,
        bounds=ff_bounds,
        q=1,
        num_restarts=20,
        raw_samples=250,
        batch_initial_conditions=ff_batch_initial_conditions,
    )
print(f'elapsed time {time()-t1}')
print(f'candidate {candidates}')
t1 = time()
with manual_seed(1235):
    candidates, acq_value = optimize_acqf(
        acq_function=qMFKG,
        bounds=t_bounds,
        q=1,
        num_restarts=20,
        raw_samples=250,
        fixed_features={p: train_x_n[-1, -1].item()},
    )
print(f'elapsed time {time()-t1}')
print(f'candidate {candidates}')
t1 = time()
with manual_seed(1235):
    candidates, acq_value = optimize_acqf(
        acq_function=qMFKG,
        bounds=t_bounds,
        q=1,
        num_restarts=20,
        raw_samples=250,
        # fixed_features={p: train_x_n[-1, -1].item()}
    )
print(f'elapsed time {time()-t1}')
print(f'candidate {candidates}')
Can you point me to an example where you got good results with 16-64?
If I recall correctly, most of the lower-dim synthetic test functions did work quite well with 32-64 fantasy samples.
the reported times are highly-variable - suggesting that I might have a lot going on on my workstation while I timed this code
Not necessarily. Since we're randomizing the initial conditions, we may get lucky or not with our starting points for LBFGS-B, which can have a significant effect on the optimization result and necessary number of iterations. High variance in the optimization time is not surprising.
unless I set a common seed, the answer between successive repetitions of the optimize_acqf is different.
I ran your example; the results appear pretty consistent (small numerical differences): repro_botorch_issue_626.ipynb.txt
I am wondering if there are flat regions where the optimizer is getting stuck.
This can indeed happen. Depending on the model, acquisition functions such as KG or EI are notorious for being flat (zero, actually) across much of the search space in some situations. We do our best to exploit fast batch evaluations to splatter a large number of initial conditions on the search space and run a Boltzmann-sampling heuristic on those in order to minimize the risk of getting stuck in the flat parts, but at the end of the day there is a lot of room in R^n for large n...
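(For reference, that splatter-and-select mechanism looks roughly like the following -- a simplified sketch of what gen_batch_initial_conditions does internally via initialize_q_batch, not the exact BoTorch code; it reuses the gp, t_bounds, and train_y_n objects from the repro code above:)

import torch
from botorch.acquisition import ExpectedImprovement
from botorch.optim.initializers import initialize_q_batch
from botorch.utils.sampling import draw_sobol_samples

ei = ExpectedImprovement(gp, best_f=train_y_n.max())
# splatter raw_samples candidate starts over the box and evaluate the acquisition in batch
X_raw = draw_sobol_samples(bounds=t_bounds, n=250, q=1)
with torch.no_grad():
    Y_raw = ei(X_raw)
# Boltzmann-sample num_restarts promising starting points from the raw candidates
X_init = initialize_q_batch(X=X_raw, Y=Y_raw, n=20, eta=2.0)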
Can you point me to an example where you got good results with 16-64?
You could also check out this google sheet (https://docs.google.com/spreadsheets/d/1DWQxD-qr-7SQVLT64da3qs0FdsiF1UblFALAMIqIRmI/edit?usp=sharing) which reports experiment results for the one-shot and some other approaches for optimizing KG. There are a number of test functions studied there, where each tab is for a given test function and a given sampling budget. If a function name has a number attached to it, that is the function dimension; otherwise it is the default dimension given in BoTorch. For your question, the lines with OSKG and OSKG num_fant-16 would be of interest, which report one-shot KG performance using 64 and 16 fantasies, respectively. Some tabs also have nested-KG reported, which is just KG optimized in a nested manner (64 fantasies). The takeaway is that for functions with <10 dims, 16 vs 64 fantasies does not make too much difference, and one may beat the other based on the particular problem structure or the computational budget assigned to it.
FWIW, in (https://arxiv.org/abs/2007.05554) we used 10 fantasies across the board (mainly for computational reasons) on 4-7 dim functions with good success.
@Balandat @saitcakmak
I tested KG in v 0.3.3 and I can confirm that the one-shot maximum and the KG.evaluate() maximum are consistent, provided there are enough restarts. See below for an example of how the restarts matter, with num_fantasies = 32; the notebook is also attached. I guess the most important realization for me was that num_fantasies does not need to be as high as I thought it should be -- I was thinking along the lines of the number of samples required for an MC approximation.
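(The attached notebook is not reproduced here. As a rough illustration of the kind of consistency check described -- assuming the qMFKG, t_bounds, p, and train_x_n objects from the repro code above, and that qKnowledgeGradient.evaluate accepts the inner-optimization options as keyword arguments -- something like:)

# one-shot maximizer
candidates, acq_value = optimize_acqf(
    acq_function=qMFKG,
    bounds=t_bounds,
    q=1,
    num_restarts=50,  # more restarts -> more consistent maximizer
    raw_samples=512,
)
# evaluate() solves the inner problem explicitly, so its argmax over a (coarse) grid
# gives an independent estimate of where the KG maximum lies
grid = torch.rand(128, 1, p + 1)
grid[..., -1] = train_x_n[-1, -1]  # pin the fidelity feature
kg_vals = qMFKG.evaluate(grid, bounds=t_bounds, num_restarts=5, raw_samples=64)
print(grid[kg_vals.argmax()], candidates)  # should roughly agree given enough restarts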
Overall, I do see that for observation noise below ~1e-3 the acquisition function does look non-smooth. I should use caution here because I am approximating the acquisition function on a rather coarse grid. However, I checked that the stopping criterion for L-BFGS in SciPy is not necessarily first-order optimality but can also be the tolerance on function values; I separately verified this by optimizing other non-smooth functions.
Thanks for all your response - much appreciated!
Great. I assume this can be considered resolved then, feel free to reopen if not.
Related to #625 but opening a new issue since that one was closed.
Versions
torch/gpytorch/botorch: 1.7.0-cpu/1.3.0/0.3.3
Reproducible example