mlr-org / paradox

ParamHelpers Next Generation
https://paradox.mlr-org.com
GNU Lesser General Public License v3.0
28 stars 7 forks source link

ParamSet::ids as C function for improved speed -- DONT MERGE YET #406

Open berndbischl opened 2 months ago

berndbischl commented 2 months ago

see title unit tests already run i am still evaluating a bit

berndbischl commented 2 months ago

@mb706 has a "competing" PR here #405 his PR is only 3 lines of code -- but only fixes the problem for the special case of 1 tag. admittedly it is way shorter -- and probably okishly fast for this case,

this is what i see on my x1 if i compare

ℹ Loading paradox
Unit: microseconds
 expr    min      lq     mean  median      uq    max neval
 ids1  3.240  3.4200  3.85839  3.5335  3.6540 30.262   100
 ids2 15.965 16.4865 17.33981 16.7305 16.8535 43.635   100
> source("test.R")
ℹ Loading paradox
Unit: microseconds
 expr     min       lq       mean   median        uq      max neval
 ids1   3.700    4.801    8.92382   10.936   11.8575   36.694   100
 ids2 955.274 1024.088 1072.78218 1038.760 1057.7805 4030.501   100

ids2 = martin

i run this

lrn1 = lrn("classif.rpart")
ps = lrn1$param_set

mb = microbenchmark(
  ids1 = ps$ids(tags = c("train", "predict)")),
  ids2 = ps$ids2(tags = c("train", "predict"))
)
print(mb)

(first benchmark is without the 'predict')

berndbischl commented 2 months ago

my code should only have this relevant "issue" -- which I suspect is fine...

// FIXME: i am not sure if we want to index cols by nr here...

some comments are about speed improvement -- one can do A LOT MORE things to speed up my "stupid search" . but i suspect this is really not worth it anymore, and like this the code is very robust and readable.