mlr-org / paradox

ParamHelpers Next Generation
https://paradox.mlr-org.com
GNU Lesser General Public License v3.0

moving irace parameters implementation to Paradox #377

Open MLopez-Ibanez opened 1 year ago

MLopez-Ibanez commented 1 year ago

Description

I'm considering replacing irace's custom parameter representation with paradox. This will benefit irace, since I expect your implementation to be of higher quality, and it will benefit mlr3 and any other package that uses both paradox and irace by avoiding awkward conversions between types.

However, there are a few things that irace would need before being able to make the move (incomplete list!):

Reproducible example

library(irace)
parameters.table <- '
 # name       switch           type  values               [conditions (using R syntax)]
 algorithm    "--"             c     (as,mmas,eas,ras,acs)
 localsearch  "--localsearch " c     (0, 1, 2, 3)
 alpha        "--alpha "       r     (0.00, 5.00)
 beta         "--beta "        r     (0.00, 10.00)
 rho          "--rho  "        r     (0.01, 1.00)
 ants         "--ants "        i,log (5, 100)
 q0           "--q0 "          r     (0.0, 1.0)           | algorithm == "acs"
 rasrank      "--rasranks "    i     (1, "min(ants, 10)") | algorithm == "ras"
 elitistants  "--elitistants " i     (1, ants)            | algorithm == "eas"
 nnls         "--nnls "        i     (5, 50)              | localsearch %in% c(1,2,3)
 dlb          "--dlb "         c     (0, 1)               | localsearch %in% c(1,2,3)
 '
parameters <- readParameters(text=parameters.table)
str(parameters)

The above shows an advanced use of the switch for algorithm, various conditions and dependent bounds.

mb706 commented 1 year ago
  1. Sampling with given probabilities: paradox does sampling using the Sampler-class (documented to some degree here); sampling with different probabilities should be easy to implement there by subclassing.
  2. Ordinal parameters: Maybe this is already working the way it should, e.g. ps(x = p_fct(1:3))? The factor levels are strings ("1", "2", "3"), but since R auto-casts values to strings whenever they are involved, they behave similarly to integers. E.g.
    p <- ps(x = p_fct(1:3))
    p$levels$x  # character-type
    #> [1] "1" "2" "3"
    2 %in% p$levels$x  # behaves as if it were integers here
    #> [1] TRUE

    But maybe I am not understanding your use case correctly? E.g. do you need to be able to refer to categories both by name and by index/number?

  3. Dependent bounds: Our $deps only encode whether a component is "valid" (i.e. whether its value should influence the outcome at all) whenever another component has a certain value. What you are describing would currently be solved using the $trafo mechanism, which transforms parameters after sampling:
    p <- ps(param1 = p_dbl(0, 1), param2 = p_dbl(0, 1))
    p$trafo <- function(x, param_set) { x$param2 <- x$param2 * x$param1 * 2 ; x }
    sampled <- generate_design_random(p, 100)$transpose()
    plot(data.table::rbindlist(sampled))

    [image: plot of the sampled param1 and param2 values]

    In this case the ParamSet describes the way things are sampled and then transforms them; notice that both param1 and param2 range from 0 to 1 here. Whatever is being optimized here would, however, accept values of param2 that go up to 2, so it would have a slightly different ParamSet: p_domain <- ps(param1 = p_dbl(0, 1), param2 = p_dbl(0, 2)). In bbotk we therefore make the distinction between the "search space" (p in this example) and the "domain" of a function (p_domain here). The optimizer would only see the p ParamSet and would not need to worry about transformations or weird domain bounds; it can optimize in a Cartesian product space. This also covers cases such as sampling from a 2-dimensional manifold in 3D, since the $trafo can also create new parameter components.

    We could probably add hierarchical dependencies of parameter bounds in paradox, but maybe the trafo-mechanism would also work for you? Are the variable parameter-bounds actually used in practice?

  4. Meta-data about parameters: Would not be a problem to add these.

mb706 commented 1 year ago

P.S. I would be very happy if we can integrate paradox with irace :-)

MLopez-Ibanez commented 1 year ago

@mb706 Thanks, this is very useful!

> 1. Sampling with given probabilities: paradox does sampling using the Sampler-class (documented to some degree here); sampling with different probabilities should be easy to implement there by subclassing.

Maybe it helps to explain a bit how irace generates new solutions. At each generation, the best (elite) configurations found so far are used to either define mean and std dev values for sampling numerical parameters from a truncated normal distribution or to define probabilities to sample values for categorical parameters.

I see that Sampler1DCateg already supports probabilities, so that problem is solved.

Maybe I can use SamplerHierarchical for implementing the irace sampling by pre-calculating all probabilities and mean/sd values and asking it to sample just 1 point, then repeating those steps for each elite configuration. Does that sound like a good idea?
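The elite-centred sampling described above can be illustrated with a minimal base-R sketch. This is not irace's actual code and does not use paradox; the helper names (rtnorm_sketch, sample_around_elite), the standard deviation and the probability vector are assumptions chosen only for illustration.

```r
# Hypothetical helper: draw n values from a normal distribution truncated to
# [lower, upper], using inverse-CDF sampling.
rtnorm_sketch <- function(n, mean, sd, lower, upper) {
  p_lo <- pnorm(lower, mean, sd)
  p_hi <- pnorm(upper, mean, sd)
  x <- qnorm(runif(n, p_lo, p_hi), mean, sd)
  pmin(pmax(x, lower), upper)  # guard against floating-point overshoot
}

# Hypothetical helper: sample n new configurations around one elite.
# Numerical parameter: truncated normal centred on the elite's value.
# Categorical parameter: discrete distribution with the given probabilities.
sample_around_elite <- function(n, elite, sd_alpha, algo_levels, algo_prob) {
  stopifnot(length(algo_levels) == length(algo_prob))
  data.frame(
    alpha = rtnorm_sketch(n, mean = elite$alpha, sd = sd_alpha,
                          lower = 0, upper = 5),
    algorithm = sample(algo_levels, n, replace = TRUE, prob = algo_prob),
    stringsAsFactors = FALSE
  )
}

set.seed(42)
algo_levels <- c("as", "mmas", "eas", "ras", "acs")
new_configs <- sample_around_elite(
  n = 200,
  elite = list(alpha = 4.5, algorithm = "acs"),
  sd_alpha = 1,
  algo_levels = algo_levels,
  algo_prob = c(0.1, 0.1, 0.1, 0.1, 0.6)  # most mass on the elite's value
)
head(new_configs)
```

In a paradox-based version, the pre-computed mean/sd values and probabilities would presumably be handed to the corresponding Sampler objects instead of to these helpers.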

> 2. Ordinal parameters: Maybe this is already working the way it should, e.g. ps(x = p_fct(1:3))? The factor levels are strings ("1", "2", "3"), but since R auto-casts values to strings whenever they are involved, they behave similarly to integers.
> But maybe I am not understanding your use case correctly? E.g. do you need to be able to refer to categories both by name and by index/number?

Let me explain a bit more. In irace, if you declare a parameter with values ("very-low", "low", "medium", "high", "very-high") of ordinal type, then the values are treated as 1,2,3,4,5 for sampling purposes, that is, we sample values from a normal distribution, then round to the nearest integer and map back to the corresponding string. The goal is that if you have a good configuration with value "high", then we should sample more of "very-high" than "very-low".
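A minimal base-R sketch of this ordinal scheme, under stated assumptions: the function name sample_ordinal is hypothetical, the truncation interval [0.5, k + 0.5] is one possible choice, and irace's real implementation may differ in its details.

```r
# Hypothetical helper: sample n ordinal values around an elite value.
# Levels are treated as the integers 1..k; we sample from a normal
# distribution centred on the elite's index, truncated to [0.5, k + 0.5],
# round to the nearest index and map back to the level labels.
sample_ordinal <- function(n, levels, elite, sd) {
  stopifnot(elite %in% levels, sd > 0)
  k <- length(levels)
  mu <- match(elite, levels)
  u <- runif(n, pnorm(0.5, mu, sd), pnorm(k + 0.5, mu, sd))
  idx <- round(qnorm(u, mu, sd))
  idx <- pmin(pmax(idx, 1), k)  # clamp ties that round outside 1..k
  levels[idx]
}

set.seed(1)
lv <- c("very-low", "low", "medium", "high", "very-high")
draws <- sample_ordinal(1000, lv, elite = "high", sd = 1)
table(factor(draws, levels = lv))
```

With the elite at "high", neighbouring levels such as "very-high" are drawn far more often than distant ones such as "very-low", which is the behaviour the ordinal type is meant to provide.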

> 3. Dependent bounds: Our $deps only encode whether a component is "valid" (i.e. whether its value should influence the outcome at all) whenever another component has a certain value. What you are describing would currently be solved using the $trafo mechanism, which transforms parameters after sampling:

Ah, OK, we also have something equivalent to $trafo in irace, which is much more general than dependent domains, so this is useful information. However, using $trafo for dependent domains seems a bit error-prone. E.g., to produce values within (1, 2*param1) as in my first example, your code would need to be:

p <- ps(param1 = p_dbl(0, 1), param2 = p_dbl(0, 1))
p$trafo <- function(x, param_set) { x$param2 <- 1 + x$param2 * (x$param1 * 2 - 1) ; x }

Would the example param3 = p_dbl(1, min(param1, param2)) look like this?

p <- ps(param1 = p_dbl(0, 1), param2 = p_dbl(0, 1), param3 = p_dbl(0, 1))
p$trafo <- function(x, param_set) { x$param3 <- 1 + x$param3 * (min(x$param1, x$param2) - 1) ; x }

This seems cumbersome for the user to write but relatively easy for paradox to generate.
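To illustrate why such a transformation is mechanical to generate, here is a minimal base-R sketch that builds a trafo-style function from bound expressions. It assumes each dependent parameter is sampled in [0, 1] and is integer-valued, as rasrank and elitistants are in the ACOTSP example; make_bounds_trafo is a hypothetical helper, not part of paradox or irace.

```r
# Hypothetical helper: build a trafo-style function from bound expressions.
# 'bounds' is a named list; each entry has 'lower' and 'upper', given either
# as numbers or as quoted expressions over other parameters.
make_bounds_trafo <- function(bounds) {
  function(x, param_set = NULL) {
    for (name in names(bounds)) {
      # Evaluate the bounds using the current configuration as environment.
      lo <- eval(bounds[[name]]$lower, x)
      hi <- eval(bounds[[name]]$upper, x)
      # Map u in [0, 1] onto the integers lo..hi using equal-width bins.
      u <- x[[name]]
      x[[name]] <- min(floor(lo + u * (hi - lo + 1)), hi)
    }
    x
  }
}

trafo <- make_bounds_trafo(list(
  elitistants = list(lower = 1, upper = quote(ants)),
  rasrank     = list(lower = 1, upper = quote(min(ants, 10)))
))

# 'ants' is already on its final scale; the other two are unit-interval samples.
config <- trafo(list(ants = 20, elitistants = 1, rasrank = 0))
str(config)
```

The bound expressions could equally come from text, e.g. parse(text = "min(ants, 10)")[[1]], which is how a textual parameter table might feed such a generator.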

> We could probably add hierarchical dependencies of parameter bounds in paradox, but maybe the trafo-mechanism would also work for you? Are the variable parameter-bounds actually used in practice?

Yes, people use it (we have had some questions about this in the Google group, but I think the major real user of this in irace is the DEMIURGE project, who were the ones that implemented it in irace). However, our answer in the past was to ask them to write our equivalent of $trafo, which you can specify in our scenario.txt (they just need to write a single function). However, this is not easy for many people.

Remember that almost every user of irace uses it via the command-line by providing a textual representation of parameters without knowing any R. Asking them to write a $trafo function by hand is cumbersome when they could specify what they want directly in the textual representation.

Another example (ACOTSP) is shown here: https://mlopez-ibanez.github.io/irace/reference/readParameters.html#ref-examples

> 4. Meta-data about parameters: Would not be a problem to add these.

Great! How would this be done?

MLopez-Ibanez commented 1 year ago

Hi, I'm still interested in this but I need some help to figure out how to implement the above using paradox.

MLopez-Ibanez commented 1 year ago

I have implemented a paradox-like interface to create parameters here: https://github.com/MLopez-Ibanez/irace/blob/parametersNew/R/parameters.R

With this interface, one can do the following:

digits <- 4L
x <- parametersNew(
       param_cat(name = "algorithm", values = c("as", "mmas", "eas", "ras", "acs"), switch = "--"),
       param_cat(name = "localsearch", values = c("0", "1", "2", "3"), switch = "--localsearch "),
       param_real(name = "alpha", lower = 0.0, upper=5.0, switch = "--alpha ", digits = digits),
       param_real(name = "beta", lower = 0.0, upper = 10.0, switch = "--beta ", digits = digits),
       param_real(name = "rho", lower = 0.01, upper = 1.00, switch = "--rho ", digits = digits),
       param_int(name = "ants", lower = 5, upper = 100, transf = "log", switch = "--ants "),
       param_real(name = "q0", switch = "--q0 ", lower=0.0, upper=1.0, condition = expression(algorithm == "acs")),
       param_int(name = "rasrank", switch = "--rasranks ", lower=1, upper=quote(min(ants, 10)), condition = 'algorithm == "ras"'),
       param_int(name = "elitistants", switch = "--elitistants ", lower=1, upper=expression(ants), condition = 'algorithm == "eas"'),
       param_int(name = "nnls", switch = "--nnls ", lower = 5, upper = 50, condition = expression(localsearch %in% c(1,2,3))),
       param_cat(name = "dlb",  switch = "--dlb ", values = c(0,1), condition = "localsearch %in% c(1,2,3)"))

Can the implementation of the above interface be completely replaced by paradox?