Closed hududed closed 9 months ago
Hi @hududed and thanks for your issue!
It seems like you want to do human-in-the-loop multi-objective Bayesian optimization (MOBO).
If you have two objectives (y1 and y2) and want to optimize them jointly via BO, you first must decide which multi-objective strategy you want to employ.
For example, ParEGO (as in `bayesopt_parego`) scalarizes the objectives at each iteration, uses a surrogate to model the scalarized objective, and then uses a standard acquisition function such as Expected Improvement (`AcqFunctionEI`) to find the next candidate.
Other approaches, e.g., those based on Expected Hypervolume Improvement, use a surrogate that models both objectives and then use an "aggregating" acquisition function (e.g., Expected Hypervolume Improvement, `AcqFunctionEHVI`) to find the next candidate.
If you want to use ParEGO, your human-in-the-loop code should essentially mimic what is done inside `bayesopt_parego`.
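To make the scalarization idea concrete, here is one ParEGO-style iteration sketched by hand in base R. This is hypothetical illustrative code, not the actual internals of `bayesopt_parego` (which may differ in details such as the exact weight sampling and normalization):

```r
set.seed(1)
ys = cbind(y1 = c(2.0, 3.0, 4.0, 3.5), y2 = c(1.0, 2.0, 3.0, 2.5))

# 1. normalize each objective to [0, 1]
ynorm = apply(ys, 2, function(y) (y - min(y)) / (max(y) - min(y)))

# 2. sample a random weight vector on the simplex
w = runif(ncol(ys))
w = w / sum(w)

# 3. scalarize via the augmented Tchebycheff function
rho = 0.05
scalarized = apply(ynorm, 1, function(y) max(w * y) + rho * sum(w * y))

# 'scalarized' would then be the single target modeled by the surrogate,
# optimized with a standard acquisition function such as EI
scalarized
```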
If you want to use, for example, Expected Hypervolume Improvement, something like the following should work:
```r
library(mlr3mbo)
library(bbotk)
library(paradox)
library(data.table)

data = data.table(
  x1 = c(3000, 4000, 5000, 4500),
  x2 = c(2000, 3000, 4000, 3500),
  x3 = c(1, 2, 3, 2),
  y1 = c(2.0, 3.0, 4.0, 3.5),
  y2 = c(1.0, 2.0, 3.0, 2.5))

domain = ps(
  x1 = p_int(lower = 2000, upper = 5500),
  x2 = p_int(lower = 1000, upper = 20000),
  x3 = p_int(lower = 1, upper = 10))

# both objectives take non-integer values, so both must be p_dbl
codomain = ps(y1 = p_dbl(tags = "minimize"), y2 = p_dbl(tags = "minimize"))

archive = Archive$new(search_space = domain, codomain = codomain)
archive$add_evals(xdt = data[, c("x1", "x2", "x3")], ydt = data[, c("y1", "y2")])

# one surrogate per objective, combined into a surrogate learner collection
surrogate = srlrn(list(default_gp(), default_gp()), archive = archive)
acq_function = acqf("ehvi", surrogate = surrogate)
acq_optimizer = acqo(
  opt("focus_search", n_points = 1000, maxit = 10),
  terminator = trm("evals", n_evals = 11000),
  acq_function = acq_function)

set.seed(42)
acq_function$surrogate$update()
acq_function$update()
candidate = acq_optimizer$optimize()
candidate
```
Please let me know if you found this helpful! Also, depending on your use case it might be beneficial to put much more compute resources into the acquisition function optimization.
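For instance, one could scale up the inner optimization budget by extending the `acqo()` call from the example above (the numbers here are purely illustrative):

```r
# more random points per iteration and more restarts for focus search;
# with n_points points per iteration and maxit iterations, the evaluation
# budget should be at least n_points * (maxit + 1), mirroring the
# 1000 * (10 + 1) = 11000 relationship in the example above
acq_optimizer = acqo(
  opt("focus_search", n_points = 10000, maxit = 20),
  terminator = trm("evals", n_evals = 210000),
  acq_function = acq_function)
```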
@sumny I've reinstalled the current dev version mlr3mbo_0.2.1.9000, and `??default_gp` was found in the docs, but the function is not found when running:
```r
surrogate = srlrn(list(default_gp(), default_gp()), archive = archive)
```

```
Error in default_gp(): could not find function "default_gp"
Traceback:
1. srlrn(list(default_gp(), default_gp()), archive = archive)
2. SurrogateLearner$new(learner = learner, archive = archive, x_cols = x_cols,
 .     y_col = y_col)
3. initialize(...)
4. .__SurrogateLearner__initialize(self = self, private = private,
 .     super = super, learner = learner, archive = archive, x_cols = x_cols,
 .     y_col = y_col)
5. assert_learner(learner)
6. assert_class(learner, "Learner", .var.name = .var.name)
7. checkClass(x, classes, ordered, null.ok)
```
@hududed Did you restart your R session after installing mlr3mbo 0.2.1.9000 (github main branch)?
```r
library(mlr3mbo)
default_gp()
```

```
<LearnerRegrKM:regr.km>
* Model: -
* Parameters: covtype=matern5_2, optim.method=gen, control=<list>,
  nugget.stability=1e-08
* Packages: mlr3, mlr3learners, DiceKriging
* Predict Types: [response], se
* Feature Types: logical, integer, numeric
* Properties: -
```
```r
sessionInfo()
```

```
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblas.so.0.3; LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

time zone: Europe/Berlin
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] mlr3mbo_0.2.1.9000

loaded via a namespace (and not attached):
 [1] digest_0.6.33        backports_1.4.1      R6_2.5.1
 [4] codetools_0.2-19     lgr_0.4.4            parallel_4.3.2
 [7] spacefillr_0.3.2     rgenoud_5.9-0.3      mlr3tuning_0.19.2
[10] palmerpenguins_0.1.1 bbotk_0.7.3          mlr3misc_0.13.0
[13] parallelly_1.36.0    mlr3learners_0.5.6   future_1.33.0
[16] mlr3_0.17.0          data.table_1.14.10   compiler_4.3.2
[19] paradox_0.11.1       tools_4.3.2          globals_0.16.2
[22] checkmate_2.3.1      listenv_0.9.0        Rcpp_1.0.11
[25] crayon_1.5.2         DiceKriging_1.6.0    uuid_1.1-1
```
@sumny this worked! By the way, this is on Google Colab's R runtime, which is great. So the candidate comes with one aggregated value, acq_ehvi (ignore the actual values; I used a different parameter set):
```
     x1    x2    x3       x_domain acq_ehvi .already_evaluated
  <int> <int> <int>         <list>    <dbl>              <lgl>
   4592 18437     3 4592, 18437, 3 216.9517              FALSE
```
Is there a way to retrieve y1 and y2 from this? Or, for a multi-point proposal, I guess I can just take the min/max of each y column in the archive.
@hududed In my example, we try to find the candidate point that maximizes the Expected Hypervolume Improvement, which measures how much that point, given the posterior predictions of the surrogate model(s), can be expected to improve the hypervolume of the current Pareto front.
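To make the "hypervolume" part concrete, here is a small base-R helper (illustrative only, not mlr3mbo's implementation) that computes the dominated hypervolume of a 2D minimization front with respect to a reference point; EHVI quantifies the expected increase of this quantity if the candidate were evaluated:

```r
# hypervolume of a 2D minimization front w.r.t. a reference point,
# computed as a sum of axis-aligned slabs after sorting by the first objective
hypervolume_2d = function(front, ref) {
  front = front[order(front[, 1]), , drop = FALSE]
  prev_y2 = ref[2]
  hv = 0
  for (i in seq_len(nrow(front))) {
    hv = hv + (ref[1] - front[i, 1]) * (prev_y2 - front[i, 2])
    prev_y2 = front[i, 2]
  }
  hv
}

front = rbind(c(1, 3), c(2, 1))   # a toy non-dominated set
hypervolume_2d(front, ref = c(5, 4))  # 4 + 6 = 10
```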
You can get the "raw" predictions of the surrogate model (i.e., the mean and standard deviation of the prediction for y1 and y2 at that candidate point) by querying the surrogate model, after you have performed an update on the current archive:
```r
surrogate$predict(candidate)
```

```
$y1
       mean        se
1: 1.933445 0.4387137

$y2
        mean        se
1: 0.9370323 0.4398714
```
@sumny Ah ok, that's exactly what I was looking for; I can close with that.
Just as a side-note, does mlr3 have some syntactic sugar to plot the Pareto front?
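I am not aware of dedicated syntactic sugar for this in mlr3mbo/bbotk, but the non-dominated set can be extracted and plotted from the archive's objective columns in a few lines of base R (a sketch; column names follow the example above):

```r
ydt = data.frame(y1 = c(2.0, 3.0, 4.0, 3.5), y2 = c(1.0, 2.0, 3.0, 2.5))

# non-dominated points for two minimized objectives:
# sort by y1, then keep points whose y2 equals the running minimum
ydt = ydt[order(ydt$y1), ]
front = ydt[cummin(ydt$y2) == ydt$y2, ]

plot(ydt$y1, ydt$y2, xlab = "y1", ylab = "y2")
points(front$y1, front$y2, col = "red", pch = 19)
lines(front$y1, front$y2, col = "red", type = "s")
```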
I have a data table something like:
And wanted to update the surrogate to optimize y1, y2:
but am getting the error:
How do I initiate the task for multiple objectives?