mlr-org / mlr3fselect

Feature selection package of the mlr3 ecosystem.
https://mlr3fselect.mlr-org.com/
GNU Lesser General Public License v3.0
20 stars 4 forks

oob_error measure doesn't work with AutoFSelector? #67

Closed bblodfon closed 1 year ago

bblodfon commented 1 year ago

Hi,

I want to run the RFE algorithm and, in each iteration, estimate performance using the learner's oob_error (see below).

This might be related to a recent fix, but I have the latest mlr3 version installed, which already includes that fix, so it may be something else:

library(mlr3verse)
#> Loading required package: mlr3

task = tsk('pima')
task$missings()
#> diabetes      age  glucose  insulin     mass pedigree pregnant pressure 
#>        0        0        5      374       11        0        0       35 
#>  triceps 
#>      227

imp_regr.tree = po('imputelearner', learner = lrn('regr.rpart'), # regr => integer or numeric features
  context_columns = selector_all(), # use all features for training (default)
  affect_columns = selector_missing())

task = imp_regr.tree$train(list(task))[[1L]]
task$missings()
#> diabetes      age pedigree pregnant  glucose  insulin     mass pressure 
#>        0        0        0        0        0        0        0        0 
#>  triceps 
#>        0

# learner has `importance` and `oob_error` properties
learner = lrn('classif.ranger', num.threads = 10, num.trees = 50, importance = 'permutation')

# so I can do the following
learner$train(task)
learner$importance()
#>      glucose         mass      insulin          age     pregnant      triceps 
#> 5.583057e-02 2.455673e-02 2.345968e-02 1.642438e-02 1.468721e-02 7.899497e-03 
#>     pedigree     pressure 
#> 6.897118e-03 5.960976e-05
learner$oob_error()
#> [1] 0.2526042

at = AutoFSelector$new(
  learner = learner,
  resampling = rsmp('insample'), # TRAIN == TEST
  measure = msr('oob_error'),
  #measure = msr('classif.acc'),
  terminator = trm('none'), # necessary to set (but is disregarded)
  fselector = fs('rfe', feature_fraction = 0.8, n_features = 2),
  store_models = TRUE # this is needed!
)
at$train(task)
#> INFO  [11:43:13.716] [bbotk] Starting to optimize 8 parameter(s) with '<FSelectorRFE>' and '<TerminatorNone>'
#> INFO  [11:43:13.742] [bbotk] Evaluating 1 configuration(s)
#> INFO  [11:43:13.875] [mlr3] Running benchmark with 1 resampling iterations
#> INFO  [11:43:13.914] [mlr3] Applying learner 'select.classif.ranger' on task 'pima' (iter 1/1)
#> INFO  [11:43:14.041] [mlr3] Finished benchmark
#> Error in learner$oob_error(): attempt to apply non-function
at$archive
#> NULL
at$fselect_result
#> NULL

Created on 2023-01-26 with reprex v2.0.2

be-marc commented 1 year ago

Internally, we use PipeOpSelect to train the learner on a feature subset. The resulting graph learner has no $oob_error() method. For now you can use branch #68, which works without pipelines. Thanks for reporting the bug!
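For reference, a minimal sketch (outside AutoFSelector) of why the call fails: wrapping the learner in a pipeline yields a GraphLearner, which does not inherit the ranger learner's $oob_error() method. The task and pipeline here are illustrative, not the exact internals:

```r
library(mlr3verse)

# Wrap classif.ranger in a trivial pipeline, similar to what
# AutoFSelector does internally with PipeOpSelect
glrn = as_learner(po("select") %>>% lrn("classif.ranger", num.trees = 50))
glrn$train(tsk("sonar"))

# The GraphLearner has no oob_error member (NULL), so calling it as a
# function reproduces the error from the traceback above:
glrn$oob_error
#> NULL
# glrn$oob_error()
#> Error: attempt to apply non-function
```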

@mb706 Is there a way to fix this? If you implemented the $base_learner() method, maybe $oob_error() could be delegated through it.
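A hedged sketch of that idea, assuming $base_learner() is available on the graph learner (it unwraps a linear graph down to its innermost learner): the inner classif.ranger still offers $oob_error() after training, so a measure could reach it via the base learner.

```r
library(mlr3verse)

glrn = as_learner(po("select") %>>% lrn("classif.ranger", num.trees = 50))
glrn$train(tsk("sonar"))

# $base_learner() walks down the wrapping layers; the inner
# classif.ranger exposes the OOB error of the trained forest
inner = glrn$base_learner()
inner$oob_error()
```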

be-marc commented 1 year ago

Removed pipelines.