hi @dfalster thanks for your interest and reaching out!
You can access the LearnerRegr within a Surrogate via the $learner field, for example:
library(bbotk)
library(mlr3mbo)
library(mlr3learners)
set.seed(1)

# objective: minimize y1 = x^2 on [-10, 10]
obfun = ObjectiveRFun$new(
  fun = function(xs) list(y1 = xs$x ^ 2),
  domain = ps(x = p_dbl(lower = -10, upper = 10)),
  codomain = ps(y1 = p_dbl(tags = "minimize")))

instance = OptimInstanceSingleCrit$new(
  objective = obfun,
  terminator = trm("evals", n_evals = 10))

# GP surrogate, expected improvement, random search as acquisition function optimizer
surrogate = srlrn(lrn("regr.km", control = list(trace = FALSE)))
acqfun = acqf("ei")
acqopt = acqo(opt("random_search", batch_size = 100),
  terminator = trm("evals", n_evals = 100))

optimizer = opt("mbo",
  loop_function = bayesopt_ego,
  surrogate = surrogate,
  acq_function = acqfun,
  acq_optimizer = acqopt)

optimizer$optimize(instance)
surrogate
surrogate$learner
<SurrogateLearner>: LearnerRegrKM
* Parameters: assert_insample_perf=FALSE, catch_errors=TRUE
<LearnerRegrKM:regr.km>
* Model: km
* Parameters: control=<list>
* Packages: mlr3, mlr3learners, DiceKriging
* Predict Types: response, [se]
* Feature Types: logical, integer, numeric
* Properties: -
If you have already used the surrogate to do BO (as in the example above), the learner will be in the state of the latest surrogate fitting step:
surrogate$learner$model
Call:
DiceKriging::km(design = data, response = task$truth(), control = pv$control)
Trend coeff.:
Estimate
(Intercept) 256.1655
Covar. type : matern5_2
Covar. coeff.:
Estimate
theta(x) 30.0000
Variance estimate: 35104.96
Note that you can also access the surrogate from the optimizer via its $surrogate field, e.g., we can also do:
optimizer$surrogate$learner$model
Now you can in principle either use the $learner within the surrogate or the surrogate class directly to make new predictions. For example:
library(data.table)
library(ggplot2)
xdt = data.table(x = seq(-10, 10, length.out = 101))
y1 = obfun$eval_dt(xdt)
surrogate_pred = surrogate$predict(xdt)
xdt[, "y1" := y1]
xdt[, c("mean_prediction", "se_prediction") := surrogate_pred]
# plot true function in black
# surrogate prediction (mean +- se in grey)
# known optimum in darkred
# found optimum in darkgreen
ggplot(aes(x = x, y = y1), data = xdt) +
geom_line() +
geom_line(aes(x = x, y = mean_prediction), colour = "grey", data = xdt) +
geom_ribbon(aes(x = x, ymin = mean_prediction - se_prediction, ymax = mean_prediction + se_prediction), fill = "grey", alpha = 0.2) +
geom_point(aes(x = 0, y = 0), color = "darkred") +
geom_point(aes(x = x, y = y1), data = instance$result, color = "darkgreen") +
theme_minimal()
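For completeness, a small sketch of the other route, going through the wrapped mlr3 learner directly; this assumes the learner is still in its fitted state from the optimization above, in which case predict_newdata() reuses the task stored at the last surrogate update:
# predict with the underlying mlr3 learner instead of the Surrogate wrapper
pred = surrogate$learner$predict_newdata(data.table(x = seq(-10, 10, length.out = 101)))
head(as.data.table(pred))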
I hope this answered your questions!
(also: our book now contains a larger section about BO https://mlr3book.mlr-org.com/chapters/chapter5/advanced_tuning_methods_and_black_box_optimization.html#sec-bayesian-optimization which may be more detailed than the current vignettes - but I will update them soon)
hi @sumny . Thanks so much for this thorough and helpful explanation. I had looked at the surrogate but hadn't realised it contained the updated GP. Makes sense.
I'd suggest updating your example to illustrate use of the surrogate after the optimisation is complete.
As a follow-up, the example focusses on optimisation. What if you just wanted to build a surrogate, e.g. with uncertainty below some threshold? This could then be used for optimisation, or for other purposes (e.g. visualisation of the output response of a black-box model). Would I do something different to optimisation to build this surrogate?
I'd suggest updating your example to illustrate use of the surrogate after the optimisation is complete.
Will do!
As a follow-up, the example focusses on optimisation. What if you just wanted to build a surrogate, e.g. with uncertainty below some threshold? This could then be used for optimisation, or for other purposes (e.g. visualisation of the output response of a black-box model). Would I do something different to optimisation to build this surrogate?
Assuming you just want a "good" surrogate model and want to build it iteratively, one way to do this would be to change the acquisition function in your BO loop to one that focuses on building a "good" model instead of finding promising candidate points for evaluation. If we agree that a "good" surrogate model is a model that has low uncertainty everywhere (or in certain specified regions), then our acquisition function (proposing the next candidate point for evaluation) could simply be the posterior predictive variance of the surrogate model, which we then want to maximize.
For example, let's assume we have the following true black-box function:
library(bbotk)
library(mlr3mbo)
library(mlr3learners)
library(data.table)
library(ggplot2)
set.seed(1)
sinus_1D = function(xs) 2 * xs$x * sin(14 * xs$x)
domain = ps(x = p_dbl(lower = 0, upper = 1))
codomain = ps(y = p_dbl(tags = "minimize"))
objective = ObjectiveRFun$new(sinus_1D,
domain = domain, codomain = codomain)
xydt = generate_design_grid(domain, resolution = 1001)$data
xydt[, "y" := objective$eval_dt(xydt)$y]
ggplot(aes(x = x, y = y), data = xydt) +
geom_line() +
theme_minimal()
Now assume we have evaluated some initial design on the optim instance:
instance = OptimInstanceSingleCrit$new(objective, search_space = domain, terminator = trm("evals", n_evals = 20))
design = data.table(x = c(0.10, 0.31, 0.5, 0.59))
instance$eval_batch(design)
Then a GP fitted on this initial design will look like the following:
lrn_gp = lrn("regr.km", covtype = "matern5_2", optim.method = "BFGS", control = list(trace = FALSE))
surrogate = srlrn(lrn_gp, archive = instance$archive)
surrogate$update()
xydt[, c("mean", "se") := surrogate$predict(xydt[, "x"])]
ggplot(xydt, mapping = aes(x = x, y = y)) +
geom_point(size = 2, data = instance$archive$data) +
geom_line() +
geom_line(aes(y = mean), colour = "gray", linetype = 2) +
geom_ribbon(aes(ymin = mean - se, ymax = mean + se), fill = "gray", alpha = 0.3) +
geom_line(aes(y = se), linewidth = 1, colour = "darkgray") +
scale_y_continuous("y", sec.axis = sec_axis(~ ., name = "Posterior Standard Deviation")) +
theme_minimal()
(Black line: true function; dashed lightgrey line: surrogate mean prediction; ribbon: uncertainty around the mean prediction; darkgrey line: the posterior predictive standard deviation.)
A sensible strategy for choosing the next candidate point for evaluation could then simply be to find the point with the highest posterior predictive variance (i.e., highest surrogate model uncertainty), because the model will have less uncertainty in that region after we have evaluated that point.
In the figure this would be around x = 1 (where the darkgrey line has a value of around 0.631).
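Since the predictions are already stored in xydt, this point can also be read off directly (a quick sketch):
# grid point where the surrogate is most uncertain
xydt[which.max(se), .(x, se)]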
We would then evaluate this point and update the surrogate model.
instance$eval_batch(data.table(x = 1.0))
surrogate$update()
xydt[, c("mean", "se") := surrogate$predict(xydt[, "x"])]
ggplot(xydt, mapping = aes(x = x, y = y)) +
geom_point(size = 2, data = instance$archive$data) +
geom_line() +
geom_line(aes(y = mean), colour = "gray", linetype = 2) +
geom_ribbon(aes(ymin = mean - se, ymax = mean + se), fill = "gray", alpha = 0.3) +
geom_line(aes(y = se), linewidth = 1, colour = "darkgray") +
scale_y_continuous("y", sec.axis = sec_axis(~ ., name = "Posterior Standard Deviation")) +
theme_minimal()
The next candidate based on the highest uncertainty would now probably be around x = 0.81.
So in essence we could just do "normal" BO but with a different acquisition function, because we mostly care about reducing the uncertainty of the model. To use the uncertainty of the surrogate model as the acquisition function, we can use:
acq_function = acqf("sd")
Now we could run the "optimization" as we normally would (on a side note: probably we should use a more sophisticated acquisition function optimizer but for illustrative purposes we just skip it and use the cheap default):
optimizer = opt("mbo", loop_function = bayesopt_ego, surrogate = surrogate, acq_function = acq_function)
optimizer$optimize(instance)
The final archive then looks like:
x y x_domain timestamp batch_nr acq_sd .already_evaluated
1: 0.1000000000 1.970899e-01 <list[1]> 2023-11-01 12:02:27 1 NA NA
2: 0.3100000000 -5.775057e-01 <list[1]> 2023-11-01 12:02:27 1 NA NA
3: 0.5000000000 6.569866e-01 <list[1]> 2023-11-01 12:02:27 1 NA NA
4: 0.5900000000 1.084067e+00 <list[1]> 2023-11-01 12:02:27 1 NA NA
5: 1.0000000000 1.981215e+00 <list[1]> 2023-11-01 12:12:40 2 NA NA
6: 0.8105923343 -1.521380e+00 <list[1]> 2023-11-01 12:23:43 3 0.52512337 FALSE
7: 0.0002707751 2.052931e-06 <list[1]> 2023-11-01 12:23:45 4 1.11439774 FALSE
8: 0.7006544482 -5.254747e-01 <list[1]> 2023-11-01 12:23:46 5 1.02513441 FALSE
9: 0.2058567775 1.056835e-01 <list[1]> 2023-11-01 12:23:48 6 0.89366871 FALSE
10: 0.6837159549 -2.006213e-01 <list[1]> 2023-11-01 12:23:50 7 0.95964693 FALSE
11: 0.9104040253 3.246989e-01 <list[1]> 2023-11-01 12:23:52 8 0.49034916 FALSE
12: 0.4037153262 -4.764570e-01 <list[1]> 2023-11-01 12:23:54 9 0.38552672 FALSE
13: 0.0470159336 5.752043e-02 <list[1]> 2023-11-01 12:23:55 10 0.13293200 FALSE
14: 0.2578149722 -2.325176e-01 <list[1]> 2023-11-01 12:23:57 11 0.10138293 FALSE
15: 0.9586878328 1.447122e+00 <list[1]> 2023-11-01 12:23:59 12 0.08685817 FALSE
16: 0.4519821387 4.027150e-02 <list[1]> 2023-11-01 12:24:01 13 0.07853812 FALSE
17: 0.7623390332 -1.445916e+00 <list[1]> 2023-11-01 12:24:03 14 0.06997645 FALSE
18: 0.1527324831 2.575894e-01 <list[1]> 2023-11-01 12:24:04 15 0.06204140 FALSE
19: 0.8606257185 -8.516840e-01 <list[1]> 2023-11-01 12:24:06 16 0.04969792 FALSE
20: 0.5472359485 1.074219e+00 <list[1]> 2023-11-01 12:24:08 17 0.04314195 FALSE
The final surrogate then looks like:
surrogate$update()
xydt[, c("mean", "se") := surrogate$predict(xydt[, "x"])]
ggplot(xydt, mapping = aes(x = x, y = y)) +
geom_point(size = 2, data = instance$archive$data) +
geom_line() +
geom_line(aes(y = mean), colour = "gray", linetype = 2) +
geom_ribbon(aes(ymin = mean - se, ymax = mean + se), fill = "gray", alpha = 0.3) +
theme_minimal()
This is probably quite decent with respect to overall uncertainty.
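One rough way to quantify this is to look at the largest remaining posterior standard deviation on the prediction grid (a quick sketch; the exact value depends on the run):
# worst-case surrogate uncertainty over the grid
xydt[, max(se)]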
However, note that we just ran the "optimization" for 20 iterations based on the terminator we specified when constructing the instance (trm("evals", n_evals = 20)).
You mentioned:
with uncertainty below some threshold
It probably would make sense to use a custom Terminator here, but I am afraid I cannot directly provide one in mlr3mbo at the moment.
As a draft, you could look at the following:
library(R6)
library(bbotk)
library(checkmate)
library(mlr3misc)
TerminatorUncertaintyLevelReached = R6Class("TerminatorUncertaintyLevelReached",
  inherit = Terminator,
  public = list(
    #' @description
    #' Creates a new instance of this [R6][R6::R6Class] class.
    initialize = function() {
      param_set = ps(
        level = p_dbl(tags = "required", default = 0.1)
      )
      param_set$values = list(level = 0.1)
      super$initialize(
        id = "uncertainty_level_reached",
        param_set = param_set,
        properties = "single-crit",
        label = "Uncertainty Level",
        man = "mlr_terminators_uncertainty_level_reached"
      )
    },

    #' @description
    #' Is `TRUE` iff the termination criterion is positive, and `FALSE`
    #' otherwise.
    #'
    #' @return `logical(1)`.
    is_terminated = function(archive) {
      assert_r6(archive, "Archive")
      level = self$param_set$values$level
      # quite hacky
      if (archive$n_evals == 0L || "acq_sd" %nin% colnames(archive$data)) {
        return(FALSE)
      }
      sddata = archive$data[["acq_sd"]]
      any(sddata <= level, na.rm = TRUE)
    }
  )
)
This terminator could then be used within the instance:
terminator = TerminatorUncertaintyLevelReached$new()
terminator$param_set$values$level = 0.01 # specify threshold
instance = OptimInstanceSingleCrit$new(objective, search_space = domain, terminator = terminator)
optimizer$optimize(instance)
and now, optimization continues as long as acq_sd is above the given 0.01 level.
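Since the archive logs the acquisition function value of each proposed candidate in the acq_sd column, one can afterwards check which proposal finally dropped below the threshold (a quick sketch):
# archive rows where the proposed candidate's uncertainty fell below the threshold
instance$archive$data[acq_sd <= 0.01]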
hi @sumny , Thank you so much for this. I really appreciate the time you spent on this post, thanks for your generosity.
BTW - Your example could also easily be added as a chapter in the book :).
Your suggestion to change the acquisition function and continue using bayesopt is perfect. I tried this, and as expected, the algorithm spends more time resolving across the domain, instead of the maximum, as shown in the plots below.
In both cases I seeded with an initial design of 11 points, then let the bayesopt algorithm choose the next 9 (so terminating with 20 evaluations). I love that a single line changes the behaviour.
optimizer <- opt("mbo",
  loop_function = bayesopt_ego,
  surrogate = srlrn(lrn("regr.km", covtype = "matern3_2", optim.method = "BFGS", control = list(trace = FALSE))),
  acq_function = acqf("sd"),
  acq_optimizer = acqo(opt("nloptr", algorithm = "NLOPT_GN_ORIG_DIRECT"),
    terminator = trm("stagnation", iters = 100, threshold = 1e-5))
)
# f(), p and seq_range() come from my own analysis (not shown here)
obfun <- ObjectiveRFun$new(
  fun = function(xs) list(y = f(xs$x)),
  domain = ps(x = p_dbl(lower = log(0.01), upper = log(2))),
  codomain = ps(y = p_dbl(tags = "maximize"))
)
initial_design <- data.table(x = sort(c(log(p$strategies[[1]]$lma), seq_range(log(c(0.01, 2)), 10))))
instance <- OptimInstanceSingleCrit$new(
objective = obfun,
terminator = trm("evals", n_evals = 20)
)
instance$eval_batch(initial_design)
optimizer$optimize(instance)
# make predictions
xdt <- data.table(x = seq_range(log(c(0.01, 2)), length.out = 101))
surrogate_pred <- optimizer$surrogate$predict(xdt)
xdt[, c("y", "se") := surrogate_pred]
# plot true function in red, surrogate prediction (mean +- se in black/grey)
library(dplyr) # for %>%, as_tibble(), tibble(), filter()
instance$archive$data %>%
as_tibble() %>%
ggplot(aes(x = x, y = y)) +
geom_point() +
geom_line(data = tibble(x = log(x), y), col = "red") +
geom_line(data = xdt, col = "black") +
geom_ribbon(data = xdt, aes(x = x, ymin = y - se, ymax = y + se), fill = "grey", alpha = 0.2) +
theme_minimal()
acqf("ei") - focusses on the maximum (red is true function, black is surrogate).
acqf("sd") - focusses on reducing sd everywhere. Success!!
However, you'll notice an oddity in the second plot. I'll explain it here as it might be a small bug.
Strangely, there's a point near the top left of the broad ridge of the second plot, where the GP has diverged a long way from the point. I've seen this a few times now. I thought that was odd behaviour, given it's meant to be an interpolator. I've been trying to figure out what's going on and think I've reached a conclusion.
To verify, I tried refitting the learner to the exact same data and interestingly it no longer diverges.
The code and plot below follow on from the code above, so we use the same data collected in the optimisation and refit a new learner of the exact same type, then plot. Black is the original learner showing the divergence. Green is the new learner, now going through the point (in blue), as expected.
new_surrogate <- srlrn(lrn("regr.km", covtype = "matern3_2", optim.method = "BFGS", control = list(trace = FALSE)), archive = instance$archive)
new_surrogate$update()
new_surrogate$learner$model
new_surrogate_pred <- new_surrogate$predict(xdt)
xdt[, c("y2", "se") := new_surrogate_pred]
instance$archive$data %>%
as_tibble() %>%
ggplot(aes(x = x, y = y)) +
geom_point() +
geom_line(data = tibble(x = log(x), y), col = "red") +
geom_line(data = xdt, col = "black") +
geom_line(data = xdt, aes(y = y2), col = "green") +
geom_point(data = instance$archive$data %>% as_tibble() %>% filter(batch_nr == 10), col = "blue") +
theme_minimal()
It turns out, the blue point was the last one added (i.e. highest batch_nr). So after the optimisation, is it possible the learner hasn't been updated to reflect the last point added?
To check this I added an extra call to optimizer$surrogate$update()
after the optimisation step, and voila, problem gone.
I do think it's dangerous to rely on the user to run the last update step, so I suggest this call should be within the optimisation loop, prior to completion. Otherwise, the learner is out of sync with the data.
@dfalster thanks for the follow up!
You are correct - the surrogate used to be updated only prior to a new candidate point being searched for; therefore, after the optimization process has terminated, the surrogate is no longer "up to date" with all the data logged in the archive of the instance.
I have now changed this behavior (see #132) to try a final update step of the surrogate after the optimization process has terminated (although this is not always meaningful, e.g., when using ParEGO for multi-objective optimization). In general, the surrogate state should now be more consistent with the expectation of the user - thanks for bringing this up.
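As a rough way to verify on the user side that the surrogate is in sync with the archive, a sketch assuming a DiceKriging km surrogate as in the examples above (the fitted model stores its design in the @X slot):
# number of design points seen by the fitted km model vs. evaluations in the archive
nrow(optimizer$surrogate$learner$model@X) == instance$archive$n_evals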
Super!!!
Hi team, thanks for your great work on the collection of packages. I am a researcher in Australia using GPs to assist with deterministic models of forest dynamics. I am interested in fitting GPs to find maxima, using Bayesian optimisation.
I have a question about Bayesian Optimisation. Is it possible to access the fitted surrogate after the optimisation step?
For example, in your simple-optimization-example on the readme, once we've finished the optimisation procedure, can we use the surrogate developed in the background? I have only been able to find the predicted optimum, and the archive of points where the objective function was evaluated.
For example, how would I compare a plot of the fitted GP to the original function, with the optimum shown?
Similarly, in the example of Single-Objective: 2D Schwefel Function, there are some plots in the example, but as far as I can tell, these use the original objective function, not the fitted surrogate.
Many thanks for considering this!
Daniel Falster