mlr-org / mlr

Machine Learning in R
https://mlr.mlr-org.com

Feature selection - "argument 'method' matched by multiple actual arguments" error when tuning with mlrMBO #2377

Closed: notiv closed this issue 6 years ago

notiv commented 6 years ago

I'm using mlrMBO to tune an xgboost model with a few wrappers. After adding the following wrapper:

lrn <- mlr::makeFilterWrapper(lrn, method = 'ranger.permutation')

I get the following error:

Error in (function (task, method = "randomForestSRC.rfsrc", fval = NULL,  : 
  formal argument "method" matched by multiple actual arguments 

This issue looks similar to the following resolved issue: https://github.com/mlr-org/mlr/issues/1066

Any idea what could be wrong?

Session Info (part):
R 3.5.1
mlrMBO_1.1.1
mlr_2.12.1
ParamHelpers_1.10

P.S. If a reproducible example would be useful, I'll try to provide one. I won't be sad if we can get away without one ;-)

larskotthoff commented 6 years ago

A reproducible example would be great...

notiv commented 6 years ago

Ok, here you go (not all steps and values make sense for the titanic dataset, but you get the point):

library(mlrMBO)
library(dplyr)
library(titanic)

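data <- titanic_train
# Draw a 70% training sample (call set.seed() beforehand for a repeatable split)
train_idx <- sample.int(n = nrow(data), size = floor(.7 * nrow(data)), replace = FALSE)

# Keep a few columns and encode Sex numerically (male = 0, female = 1)
train <- data[train_idx, ] %>% select(Pclass, Sex, Age, SibSp, Fare, Survived) %>% mutate(Sex = ifelse(Sex == 'male', 0, 1))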

mlr::configureMlr(on.par.without.desc = "quiet")
lrn <- mlr::makeLearner(cl = 'classif.xgboost',
                        predict.type = "prob",
                        fix.factors.prediction = TRUE,
                        tree_method = 'exact')

lrn <- mlr::makeFilterWrapper(lrn, method = 'chi-squared')

lrn <- mlr::makeImputeWrapper(lrn,
                              classes = list(integer = mlr::imputeMedian(),
                                             numeric = mlr::imputeHist(),
                                             factor = mlr::imputeMode()),
                              dummy.classes = "factor")

classif.task <- mlr::makeClassifTask(data = train,
                                     target = "Survived",
                                     positive = "1")

eval_metrics <- list(auc, fpr, tpr, ppv, f1, acc, ber, mmce, timetrain)

param_set <- ParamHelpers::makeParamSet(
  ParamHelpers::makeIntegerParam(id = "nrounds", lower = 10, upper = 100, default = 20),
  ParamHelpers::makeNumericParam(id = "eta", lower = -7, upper = -5, default = -6, trafo = function(x) 2^x),
  ParamHelpers::makeIntegerParam(id = "max_depth", lower = 3, upper = 5, default = 4),
  ParamHelpers::makeNumericParam(id = "colsample_bytree", lower = 0.4, upper = 0.9, default = 0.6),
  ParamHelpers::makeNumericParam(id = "subsample", lower = 0.4, upper = 0.9, default = 0.5),
  ParamHelpers::makeIntegerParam(id = "fw.abs", lower = 5, upper = 8, default = 6)
)

ctrl = makeMBOControl()
ctrl = setMBOControlTermination(ctrl, iters = 5)
tune.ctrl = makeTuneControlMBO(mbo.control = ctrl)

resInst <- mlr::makeResampleInstance("CV", iters = 4, task = classif.task)

res_ext <- mlr::tuneParams(lrn,
                           classif.task,
                           resInst,
                           par.set = param_set,
                           control = tune.ctrl,
                           measures = eval_metrics,
                           show.info = TRUE)

And here is my sessionInfo (slightly different from the one I mentioned earlier):


other attached packages:
[1] mlrMBO_1.1.1      smoof_1.5.1       checkmate_1.8.5   BBmisc_1.11       bindrcpp_0.2.2    titanic_0.1.0     dplyr_0.7.5      
[8] mlr_2.12.1        ParamHelpers_1.10

loaded via a namespace (and not attached):
 [1] RWeka_0.4-38          tidyselect_0.2.4      DiceKriging_1.5.5     purrr_0.2.5           splines_3.5.0        
 [6] rJava_0.9-10          lattice_0.20-35       colorspace_1.3-2      htmltools_0.3.6       viridisLite_0.3.0    
[11] yaml_2.1.19           FSelector_0.31        survival_2.41-3       XML_3.98-1.11         plotly_4.7.1         
[16] rlang_0.2.1           pillar_1.2.3          glue_1.2.0            entropy_1.2.1         xgboost_0.71.2       
[21] plot3D_1.1.1          RColorBrewer_1.1-2    lhs_0.16              mco_1.0-15.1          bindr_0.1.1          
[26] plyr_1.8.4            munsell_0.4.3         gtable_0.2.0          htmlwidgets_1.2       misc3d_0.8-4         
[31] RWekajars_3.9.2-1     parallelMap_1.3       parallel_3.5.0        Rcpp_0.12.17          scales_0.5.0         
[36] backports_1.1.2       randomForestSRC_2.6.1 jsonlite_1.5          ggplot2_2.2.1         packrat_0.4.9-3      
[41] digest_0.6.15         stringi_1.2.2         RJSONIO_1.3-0         grid_3.5.0            tools_3.5.0          
[46] magrittr_1.5          lazyeval_0.2.1        tibble_1.4.2          randomForest_4.6-14   tidyr_0.8.1          
[51] pkgconfig_2.0.1       Matrix_1.2-14         data.table_1.11.4     assertthat_0.2.0      httr_1.3.1           
[56] R6_2.2.2              compiler_3.5.0

larskotthoff commented 6 years ago

Thanks. The problem is down to the method argument you're giving when constructing the filter wrapper:

lrn <- mlr::makeFilterWrapper(lrn, method = 'chi-squared')

As far as I can tell, this argument does nothing useful (the filter wrapper doesn't have an argument with that name, and the random forest used underneath doesn't have one either). Removing it fixes the error for me.
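
For anyone wondering where the message itself comes from, here is a minimal base-R sketch of one way it can arise. It assumes the wrapper stores unknown arguments from `...` and later forwards them, together with its own `method` value, through `do.call()`; I have not traced the mlr source line by line, and `filter_features`, `make_wrapper` and `apply_wrapper` are hypothetical stand-ins, not mlr functions.

```r
# Hypothetical stand-in for a filtering function that has a `method` formal.
filter_features <- function(task, method = "default.filter", ...) {
  paste("filtering", task, "with", method)
}

# Hypothetical wrapper constructor: `fw.method` is its own parameter, while
# anything passed via `...` is stored and forwarded later.
make_wrapper <- function(fw.method = "default.filter", ...) {
  list(fw.method = fw.method, more.args = list(...))
}

# At training time the wrapper calls the filter with its own `method` value
# plus whatever extra arguments were stored.
apply_wrapper <- function(wrapper, task) {
  args <- c(list(task = task, method = wrapper$fw.method), wrapper$more.args)
  do.call(filter_features, args)
}

# Using the wrapper's own parameter name works.
ok <- make_wrapper(fw.method = "chi.squared")
res_ok <- apply_wrapper(ok, "titanic")

# Passing `method` ends up in `...`, so the final call has two `method`
# arguments and R refuses to match them.
bad <- make_wrapper(method = "chi-squared")
msg <- tryCatch(apply_wrapper(bad, "titanic"),
                error = function(e) conditionMessage(e))
print(msg)
```

In this sketch the constructor accepts the stray `method` silently, and the failure only shows up once the wrapper is actually applied, which would match the error appearing during tuning rather than at `makeFilterWrapper()` time.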

notiv commented 6 years ago

Ah, stupid me... I took the argument name from the filterFeatures function, where it is called method; in makeFilterWrapper it is fw.method. Just in case someone makes the same mistake, the correct call would be:

lrn <- mlr::makeFilterWrapper(lrn, fw.method = 'chi.squared')
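
Since the filter name also changed from 'chi-squared' to 'chi.squared', it may help to check the name against the filters mlr has registered before building the wrapper. A rough sketch, assuming `mlr::listFilterMethods()` returns a data frame with an `id` column (as it does in the versions I have seen); the available names can differ between mlr versions, and `classif.rpart` is used here only as a lightweight placeholder learner.

```r
wanted <- "chi.squared"

# Look up the registered filter ids, if mlr is installed at all.
if (requireNamespace("mlr", quietly = TRUE)) {
  available <- as.character(mlr::listFilterMethods()$id)
} else {
  available <- character(0)
}

is_valid <- wanted %in% available

if (is_valid) {
  # Placeholder learner; substitute your own (e.g. the xgboost learner above).
  base_lrn <- mlr::makeLearner("classif.rpart")
  lrn <- mlr::makeFilterWrapper(base_lrn, fw.method = wanted, fw.abs = 2)
} else {
  message("Filter '", wanted, "' is not registered; check the available ids")
}
```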

Thanks Lars and sorry for wasting your time :-/