berndbischl commented 1 year ago

bmr = benchmark(....)

bma = as_benchmark_aggr(bmr)
bms = as_benchmark_score(bmr)
bml = as_benchmark_loss(bmr)

friedman / blme

autoplot(bma) ---> plots without tests: mean, box

we remove type = "mean", it is a useless, worse version of a boxplot
boxplots "across tasks" are MAYBE ok SOMETIMES, but we also want task-faceting

infer_friedman_global(bma, measure) --> s3: (friedman_global, infer)

infer_friedman_posthoc(bma, measure, global = T/F) --> s3: (friedman_posthoc, infer)
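For reference, a minimal base-R sketch of what the global test plus the S3 tagging could look like. It assumes the aggregated scores already sit in a tasks x learners matrix; `infer_friedman_global_sketch`, the learner names and the toy scores are made up for illustration, and only `stats::friedman.test` is an existing function.

```r
# Hypothetical sketch, not the mlr3benchmark implementation.
# `scores`: numeric matrix, rows = tasks (blocks), columns = learners (groups).
infer_friedman_global_sketch = function(scores, measure = "ce") {
  stopifnot(is.matrix(scores), nrow(scores) >= 2L, ncol(scores) >= 2L)
  test = stats::friedman.test(scores)
  # Tag the result with the two S3 classes proposed above.
  structure(
    list(
      measure = measure,
      statistic = unname(test$statistic),
      df = unname(test$parameter),
      p.value = test$p.value
    ),
    class = c("friedman_global", "infer")
  )
}

# Toy data (made up): 5 tasks, 3 learners, classification error.
scores = matrix(
  c(0.20, 0.25, 0.40,
    0.10, 0.15, 0.30,
    0.30, 0.32, 0.45,
    0.22, 0.21, 0.50,
    0.18, 0.24, 0.35),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("task", 1:5), c("rpart", "ranger", "featureless"))
)
ifg = infer_friedman_global_sketch(scores)
class(ifg)
```

Keeping "infer" as the shared second class would let generic print / autoplot code dispatch on it, while the first class picks the test-specific method.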

autoplot.friedman_posthoc(ifp, type = c("cd", "fn")) ----------> does CD plot
comment: we keep the posthoc-matrix plot. only keep one CD plot, MAYBE have some MILD, SIMPLE args to configure its style

infer_blme: similar as above....!

we probably want some form of standardization if we have multiple tasks

average distance to the minimum
0 should represent the perf of a "baseline". eg the featureless learner
(y - ymin) / (ymax - ymin). also think about r_squared like versions
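A rough base-R sketch of the two standardization ideas, just to make the discussion concrete. Assumptions: scores are a tasks x learners matrix of an error measure (lower = better), and the baseline is a column called "featureless". The function names and the `1 - y / y_baseline` form of the "r_squared like" variant are my guesses, not an agreed design.

```r
# Hypothetical helpers; `scores` is tasks x learners, lower = better.

# Min-max per task: 0 = best learner on that task, 1 = worst.
# Note: a task where all learners tie gives 0 / 0 = NaN.
normalize_minmax = function(scores) {
  ymin = apply(scores, 1, min)
  ymax = apply(scores, 1, max)
  # vectors of length nrow(scores) recycle down the columns, i.e. per task
  (scores - ymin) / (ymax - ymin)
}

# R^2-like variant (assumption): 0 = baseline performance, 1 = perfect score
# of 0. Direction flips: higher = better.
normalize_baseline = function(scores, baseline = "featureless") {
  1 - scores / scores[, baseline]
}

# Toy data (made up): 2 tasks, 3 learners.
scores = matrix(
  c(0.1, 0.2, 0.4,
    0.3, 0.2, 0.5),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("task1", "task2"), c("a", "b", "featureless"))
)

mm = normalize_minmax(scores)
# "average distance to the minimum" per learner
avg_dist = colMeans(mm)
rel = normalize_baseline(scores)
```

With the min-max version the featureless learner only ends up at 1 if it is the worst learner on every task; the baseline version pins it to 0 by construction, which is closer to the "0 should represent the baseline" requirement.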

berndbischl commented 1 year ago

containers support multiple measures; plots and tests support 1 measure. if none is passed, we use the first one in the container
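The default rule could be a tiny shared helper that every plot / test calls first. Sketch only: `resolve_measure` and its arguments are hypothetical names, and the measure ids are just example strings.

```r
# Hypothetical helper, names are illustrative only.
# `measures`: ids of the measures stored in the container.
# `measure`: the single measure requested by a plot or test, or NULL.
resolve_measure = function(measures, measure = NULL) {
  stopifnot(is.character(measures), length(measures) >= 1L)
  if (is.null(measure)) {
    # nothing passed: fall back to the first measure in the container
    return(measures[[1L]])
  }
  stopifnot(is.character(measure), length(measure) == 1L)
  if (!measure %in% measures) {
    stop(sprintf("Measure '%s' is not in the container", measure))
  }
  measure
}

default_measure = resolve_measure(c("classif.ce", "classif.acc"))
chosen_measure = resolve_measure(c("classif.ce", "classif.acc"), "classif.acc")
```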

berndbischl commented 1 year ago

infer_blme(bms) --> s3: (blme, infer)
autoplot(iblme, type = c("ridgeline", "interval", "halfeye"))

berndbischl commented 1 year ago

maybe add (exposed) helper that draws from the posterior of the BLME

mlr-org / mlr3benchmark

refactoring -- ws 2023 discussion with john, seb, bb #37
