mlr-org / ParamHelpers

Helpers for parameters in black-box optimization, tuning and machine learning.
https://paramhelpers.mlr-org.com
Other
25 stars 9 forks source link

Rebuild `generateDesign` in R #222

Closed jakob-r closed 4 years ago

jakob-r commented 4 years ago

Why?

So I implemented this in R because we sometimes faced errors like

Error in generateDesign(control$infill.opt.focussearch.points, ps.local,  :
  REAL() can only be applied to a 'numeric', not a 'NULL'
Calls: mbo ... measureTime -> force -> infill.opt.fun -> generateDesign

in mlrMBO [1] [2] plus some mails.

I was not able to reproduce this error or understand the C code. That's why this desicion

performance comparision:

ps7 from test_generateDesign.R

PH CRAN:

                                expr      min       lq     mean   median
 generateDesign(500, par.set = ps7) 25.04107 27.73369 31.40577 30.11185
       uq      max neval
 32.33708 169.3077   100

PH DEV

                               expr      min      lq     mean   median       uq
 generateDesign(500, par.set = ps7) 123.7259 129.421 134.4459 132.0063 135.2913
      max neval
 271.3685   100

The bad thing here are the "unvectorized" requirements like requires = quote(identical(discB, "NR") && identical(disc, "b"))). If I take them away the speed becomes equal again. Anyhow, this should be really rare. @mllg Do you see any easy tweak here?

ps3 from test_generateDesign.R

PH CRAN
> microbenchmark::microbenchmark(generateDesign(5000, par.set = ps3))
Unit: milliseconds
                                expr      min       lq     mean   median
 generateDesign(5000, par.set = ps3) 13.85312 14.81979 18.52445 17.14613
       uq      max neval
 17.66745 150.1679   100

PH DEV
> microbenchmark::microbenchmark(generateDesign(5000, par.set = ps3))
Unit: milliseconds
                                expr      min       lq     mean   median
 generateDesign(5000, par.set = ps3) 13.91937 14.37524 15.21534 14.54776
       uq      max neval
 14.84449 21.19939   100
jakob-r commented 4 years ago

Theoretically we could still use the C function c_trafo_and_set_dep_to_na which is still in use in generateGridDesign and generateDesignOfDefaults to get back to the original performance. I just removed it to get rid of all C code but actually this part does not produce problems.

berndbischl commented 4 years ago

In general thanks. I am a little bit scared to change this when we are in maintenance mode only. Can you at least have @mllg read and check this pls

jakob-r commented 4 years ago

We have a quite extensive test set and I would revdep check this agains mlr and mlrMBO at least.

Also: Any opinion about keeping c_trafo_and_set_dep_to_na?

jakob-r commented 4 years ago

I switched back to use c_trafo_and_set_dep_to_na. Now the performance has increased again:

ps7

> microbenchmark::microbenchmark(generateDesign(500, par.set = ps7))
Unit: milliseconds
                               expr      min      lq     mean   median       uq
 generateDesign(500, par.set = ps7) 23.68374 25.7048 28.17113 27.21066 30.06211
      max neval
 37.95199   100

ps3 (no change expected)

> microbenchmark::microbenchmark(generateDesign(5000, par.set = ps3))
Unit: milliseconds
                                expr     min       lq     mean   median
 generateDesign(5000, par.set = ps3) 13.7141 14.12697 15.12258 14.29656
       uq      max neval
 15.70729 21.49546   100
jakob-r commented 4 years ago

Strange that travis doesn't see that, but now I get a mistake with rcheck:

> test_check("ParamHelpers")
Error in if (is.character(value[[1L]])) { : 
  missing value where TRUE/FALSE needed
Calls: test_check ... tryCatchOne -> doTryCatch -> tryCatchList -> tryCatchOne
Error in if (is.null(value[[1L]])) { : 
  missing value where TRUE/FALSE needed
Calls: test_check ... <Anonymous> -> tryCatch -> tryCatchList -> tryCatchOne
Error in if (!is.null(ns)) ns else tryCatch(loadNamespace(name), error = function(e) { : 
  missing value where TRUE/FALSE needed
Calls: test_check -> test_package_dir -> test_dir -> ..getNamespace
Error in if (is.null(value[[1L]])) { : 
  missing value where TRUE/FALSE needed
Calls: test_check ... <Anonymous> -> tryCatch -> tryCatchList -> tryCatchOne
Execution halted

I have no idea how to debug this, :disappointed:

jakob-r commented 4 years ago

Ok... completely removed the C function again because of a buggy behavior and some problematic / forbidden calls.

New performance:

> microbenchmark::microbenchmark(generateDesign(500, par.set = ps7))
Unit: milliseconds
                               expr      min       lq     mean   median      uq
 generateDesign(500, par.set = ps7) 118.8355 124.7426 143.6804 128.0835 135.184
      max neval
 298.1425   100

> microbenchmark::microbenchmark(generateDesign(5000, par.set = ps3))
Unit: milliseconds
                                expr      min       lq     mean   median
 generateDesign(5000, par.set = ps3) 13.74017 14.16432 16.48244 14.71859
       uq      max neval
 15.55464 130.4775   100