pytorch / botorch

Bayesian optimization in PyTorch
https://botorch.org/
MIT License

[Bug] optimize_acqf erroring out with SingleTaskMultiFidelityGP #2402

Open esantorella opened 2 months ago

esantorella commented 2 months ago

Thanks to ToennisStef for raising this in #2393.

🐛 Bug

I'm looking at an example with a SingleTaskMultiFidelityGP, evaluating acquisition values where both the x and the objective are at fidelities other than the highest fidelity. This produces NaN acquisition values and causes optimize_acqf to error out. While optimizing for a fidelity other than the highest may not make sense, this also happens when optimizing qMultiFidelityKnowledgeGradient for the highest fidelity. I'm seeing the following behavior:

What the posterior looks like:

image

acqf values if we were to just work with fidelity=0:

image

To reproduce

See gist for full code. It ends with:

candidates, _ = optimize_acqf_mixed(
    acq_function=mfkg_acqf,
    bounds=bounds_x,
    fixed_features_list=[{1: 0}],
    q=1,
    num_restarts=5,
    raw_samples=128,
    # batch_initial_conditions=X_init,
    options={"batch_limit": 5, "maxiter": 200},
)

Alternatively, skipping the cost function setup, the same error can be produced more simply with:

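import torch
from botorch.acquisition.fixed_feature import FixedFeatureAcquisitionFunction
from botorch.acquisition.logei import qLogExpectedImprovement
from botorch.optim import optimize_acqf

# `model`, `train_x`, and `train_y` are defined in the gist.
# Incumbent: best observed value at the highest fidelity (fidelity column == 3).
best_f = train_y[train_x[:, 1] == 3].max()

# Fix the fidelity column (index 1) to 0 and optimize over x only.
acq_func = FixedFeatureAcquisitionFunction(
    acq_function=qLogExpectedImprovement(model=model, best_f=best_f),
    d=1 + 1,
    columns=[1],
    values=[0],
)

candidates, _ = optimize_acqf(
    acq_function=acq_func,
    bounds=torch.tensor([[0.0], [1.0]], dtype=torch.float64),
    q=1,
    num_restarts=20,
    raw_samples=512,
)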

Stack trace/error message

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[12], line 1
----> 1 candidates, _ = optimize_acqf_mixed(
      2     acq_function=mfkg_acqf,
      3     bounds=bounds_x,
      4     # fixed_features_list=[{1: i} for i in range(3)],
      5     fixed_features_list=[{1: 0}],
      6     q=1,
      7     num_restarts=5,
      8     raw_samples=128,
      9     # batch_initial_conditions=X_init,
     10     options={"batch_limit": 5, "maxiter": 200},
     11 )

File ~/botorch/botorch/optim/optimize.py:926, in optimize_acqf_mixed(acq_function, bounds, q, num_restarts, fixed_features_list, raw_samples, options, inequality_constraints, equality_constraints, nonlinear_inequality_constraints, post_processing_func, batch_initial_conditions, ic_generator, ic_gen_kwargs)
    924 ff_candidate_list, ff_acq_value_list = [], []
    925 for fixed_features in fixed_features_list:
--> 926     candidate, acq_value = optimize_acqf(
    927         acq_function=acq_function,
    928         bounds=bounds,
    929         q=q,
    930         num_restarts=num_restarts,
    931         raw_samples=raw_samples,
    932         options=options or {},
    933         inequality_constraints=inequality_constraints,
    934         equality_constraints=equality_constraints,
    935         nonlinear_inequality_constraints=nonlinear_inequality_constraints,
    936         fixed_features=fixed_features,
    937         post_processing_func=post_processing_func,
    938         batch_initial_conditions=batch_initial_conditions,
    939         ic_generator=ic_generator,
    940         return_best_only=True,
    941         **ic_gen_kwargs,
    942     )
    943     ff_candidate_list.append(candidate)
    944     ff_acq_value_list.append(acq_value)

File ~/botorch/botorch/optim/optimize.py:543, in optimize_acqf(acq_function, bounds, q, num_restarts, raw_samples, options, inequality_constraints, equality_constraints, nonlinear_inequality_constraints, fixed_features, post_processing_func, batch_initial_conditions, return_best_only, gen_candidates, sequential, ic_generator, timeout_sec, return_full_tree, retry_on_optimization_warning, **ic_gen_kwargs)
    520     gen_candidates = gen_candidates_scipy
    521 opt_acqf_inputs = OptimizeAcqfInputs(
    522     acq_function=acq_function,
    523     bounds=bounds,
   (...)
    541     ic_gen_kwargs=ic_gen_kwargs,
    542 )
--> 543 return _optimize_acqf(opt_acqf_inputs)

File ~/botorch/botorch/optim/optimize.py:564, in _optimize_acqf(opt_inputs)
    561     return _optimize_acqf_sequential_q(opt_inputs=opt_inputs)
    563 # Batch optimization (including the case q=1)
--> 564 return _optimize_acqf_batch(opt_inputs=opt_inputs)

File ~/botorch/botorch/optim/optimize.py:255, in _optimize_acqf_batch(opt_inputs)
    252     batch_initial_conditions = opt_inputs.batch_initial_conditions
    253 else:
    254     # pyre-ignore[28]: Unexpected keyword argument `acq_function` to anonymous call.
--> 255     batch_initial_conditions = opt_inputs.get_ic_generator()(
    256         acq_function=opt_inputs.acq_function,
    257         bounds=opt_inputs.bounds,
    258         q=opt_inputs.q,
    259         num_restarts=opt_inputs.num_restarts,
    260         raw_samples=opt_inputs.raw_samples,
    261         fixed_features=opt_inputs.fixed_features,
    262         options=options,
    263         inequality_constraints=opt_inputs.inequality_constraints,
    264         equality_constraints=opt_inputs.equality_constraints,
    265         **opt_inputs.ic_gen_kwargs,
    266     )
    268 batch_limit: int = options.get(
    269     "batch_limit",
    270     (
   (...)
    274     ),
    275 )
    277 def _optimize_batch_candidates() -> Tuple[Tensor, Tensor, List[Warning]]:

File ~/botorch/botorch/optim/initializers.py:515, in gen_one_shot_kg_initial_conditions(acq_function, bounds, q, num_restarts, raw_samples, fixed_features, options, inequality_constraints, equality_constraints)
    512 q_aug = acq_function.get_augmented_q_batch_size(q=q)
    514 # TODO: Avoid unnecessary computation by not generating all candidates
--> 515 ics = gen_batch_initial_conditions(
    516     acq_function=acq_function,
    517     bounds=bounds,
    518     q=q_aug,
    519     num_restarts=num_restarts,
    520     raw_samples=raw_samples,
    521     fixed_features=fixed_features,
    522     options=options,
    523     inequality_constraints=inequality_constraints,
    524     equality_constraints=equality_constraints,
    525 )
    527 # compute maximizer of the value function
    528 value_function = _get_value_function(
    529     model=acq_function.model,
    530     objective=acq_function.objective,
   (...)
    533     project=getattr(acq_function, "project", None),
    534 )

File ~/botorch/botorch/optim/initializers.py:424, in gen_batch_initial_conditions(acq_function, bounds, q, num_restarts, raw_samples, fixed_features, options, inequality_constraints, equality_constraints, generator, fixed_X_fantasies)
    422         start_idx += batch_limit
    423     Y_rnd = torch.cat(Y_rnd_list)
--> 424 batch_initial_conditions = init_func(
    425     X=X_rnd, Y=Y_rnd, n=num_restarts, **init_kwargs
    426 ).to(device=device)
    427 if not any(issubclass(w.category, BadInitialCandidatesWarning) for w in ws):
    428     return batch_initial_conditions

File ~/botorch/botorch/optim/initializers.py:952, in initialize_q_batch(X, Y, n, eta)
    950     weights = torch.exp(etaZ)
    951 if batch_shape == torch.Size():
--> 952     idcs = torch.multinomial(weights, n)
    953 else:
    954     idcs = batched_multinomial(
    955         weights=weights.permute(*range(1, len(batch_shape) + 1), 0), num_samples=n
    956     ).permute(-1, *range(len(batch_shape)))

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
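
To illustrate why a single NaN acquisition value is enough to trigger this error, here is a minimal standard-library sketch of a standardize-then-exponentiate weighting of the kind suggested by the `weights = torch.exp(etaZ)` line in the traceback. The function names and the exact formula are assumptions made for illustration; this is not BoTorch's implementation.

```python
import math


def restart_weights(acq_values, eta=1.0):
    """Hypothetical sketch: standardize the raw acquisition values, then exponentiate."""
    n = len(acq_values)
    mean = sum(acq_values) / n
    # Unbiased sample standard deviation (an assumption for this sketch).
    std = math.sqrt(sum((y - mean) ** 2 for y in acq_values) / (n - 1))
    return [math.exp(eta * (y - mean) / std) for y in acq_values]


def has_invalid_weight(weights):
    """Mirror the condition named in the error message: inf, nan, or element < 0."""
    return any(math.isnan(w) or math.isinf(w) or w < 0 for w in weights)


clean = restart_weights([0.1, 0.5, 0.3])
poisoned = restart_weights([0.1, float("nan"), 0.3])

print(has_invalid_weight(clean))
print(has_invalid_weight(poisoned))
```

Because the mean and standard deviation are computed over all raw samples, one NaN contaminates every weight, not only the weight of the offending sample.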

Expected Behavior

Numerical inaccuracy is not uncommon in optimization; however, this typically should not lead to exceptions, since multi-restart optimization may allow for finding an optimum nonetheless. In this case, it is clear there is an optimum, so optimize_acqf should find it.
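
As a rough sketch of what "not erroring out" could look like, raw samples with non-finite acquisition values can be discarded before restart points are selected. The helper below is hypothetical and uses only the standard library; it is neither proposed in this issue nor taken from BoTorch.

```python
import math


def drop_non_finite(candidates, acq_values):
    """Hypothetical helper: keep only candidates whose acquisition value is finite."""
    kept = [(x, y) for x, y in zip(candidates, acq_values) if math.isfinite(y)]
    if not kept:
        # Nothing usable is left, so surface a clear error instead of sampling.
        raise ValueError("all acquisition values are non-finite")
    xs, ys = zip(*kept)
    return list(xs), list(ys)


xs, ys = drop_non_finite([0.0, 0.5, 1.0], [0.2, float("nan"), 0.4])
print(xs, ys)
```

The remaining finite values can then be weighted and sampled as usual, so one bad region of the search space does not abort the whole optimization.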


Balandat commented 2 months ago

cc @SebastianAment re qLogEI having a "hole". The model actually seems fine here (thanks @esantorella for the great diagnostics), so this is probably just because the incumbent is so high (8.8638 in this case, if I got that right from the other issue, by far the largest observed value).

As a first step I would recommend using qLogNoisyExpectedImprovement here, which usually has better numerical behavior.
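
For intuition on why a very high incumbent is numerically awkward, the sketch below evaluates the textbook analytic expected improvement, EI = σ(φ(z) + zΦ(z)) with z = (μ − best_f)/σ, in plain float64. This is not qLogEI itself, which is a Monte Carlo acquisition function built to be more robust than this formula. The posterior mean 0.0 and standard deviation 0.2 are made-up values for illustration, and only the incumbent 8.8638 comes from the comment above.

```python
import math


def standard_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def standard_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def naive_expected_improvement(mean, std, best_f):
    """Textbook analytic EI for maximization, evaluated naively in float64."""
    z = (mean - best_f) / std
    return std * (standard_normal_pdf(z) + z * standard_normal_cdf(z))


# Incumbent equal to the posterior mean: EI is comfortably positive.
ei_near = naive_expected_improvement(mean=0.0, std=1.0, best_f=0.0)

# Incumbent far above the posterior mean (hypothetical mean and std):
# both the pdf and the cdf underflow, so EI evaluates to exactly zero.
ei_far = naive_expected_improvement(mean=0.0, std=0.2, best_f=8.8638)

print(ei_near, ei_far)
```

Once the value underflows to zero its logarithm is undefined, which is one way an acquisition surface can end up with a region of unusable values when the incumbent dwarfs the model's predictions.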