mlr-org / mlr

Machine Learning in R
https://mlr.mlr-org.com

Discrepancy between `listFilterMethods()` and https://mlr.mlr-org.com/articles/tutorial/filter_methods.html#current-methods #2518

Closed missuse closed 5 years ago

missuse commented 5 years ago

Thank you for all your hard work, mlr is a wonderful package.

The page at https://mlr.mlr-org.com/articles/tutorial/filter_methods.html#current-methods, which I trust is current, lists several filter methods from the praznik package as available.

However, when I run:

listFilterMethods(tasks = TRUE)

                           id         package                                     desc
1                  anova.test                 ANOVA Test for binary and multiclass ...
2                         auc                 AUC filter for binary classification ...
3                    carscore            care                               CAR scores
4          cforest.importance           party Permutation importance of random fore...
5                 chi.squared       FSelector Chi-squared statistic of independence...
6                  gain.ratio       FSelector Entropy-based gain ratio between feat...
7            information.gain       FSelector Entropy-based information gain betwee...
8                kruskal.test                 Kruskal Test for binary and multiclas...
9          linear.correlation                 Pearson correlation between feature a...
10                       mrmr           mRMRe Minimum redundancy, maximum relevance...
11                       oneR       FSelector                    oneR association rule
12     permutation.importance                 Aggregated difference between feature...
13    randomForest.importance    randomForest Importance based on OOB-accuracy or n...
14      randomForestSRC.rfsrc randomForestSRC Importance of random forests fitted i...
15 randomForestSRC.var.select randomForestSRC Minimal depth of / variable hunting v...
16            ranger.impurity          ranger Variable importance based on ranger i...
17         ranger.permutation          ranger Variable importance based on ranger p...
18           rank.correlation                 Spearman's correlation between featur...
19                     relief       FSelector                         RELIEF algorithm
20    symmetrical.uncertainty       FSelector Entropy-based symmetrical uncertainty...
21     univariate.model.score                 Resamples an mlr learner for each inp...
22                   variance                                 A simple variance filter

the output does not contain these filter methods. Consequently, when I try to run makeFilterWrapper with praznik filter methods, it fails:

lrn <- makeLearner("classif.xgboost", predict.type = "prob")
lrn <- makeFilterWrapper(lrn, fw.method = "praznik_MRMR")
Error in makeFilterWrapper(lrn, fw.method = "praznik_MRMR") : 
Assertion on 'fw.method' failed: Must be element of set {'anova.test','auc','carscore','cforest.importance','chi.squared','gain.ratio','information.gain','kruskal.test','linear.correlation','mrmr','oneR','permutation.importance','randomForest.importance','randomForestSRC.rfsrc','randomForestSRC.var.select','ranger.impurity','ranger.permutation','rank.correlation','relief','rf.importance','rf.min.depth','symmetrical.uncertainty','univariate','univariate.model.score','variance'}, but is 'praznik_MRMR'.

Are praznik filter methods available in mlr 2.13?

Thank you

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] FSelector_0.31       mlrMBO_1.1.2         smoof_1.5.1          checkmate_1.8.5      BBmisc_1.11          mlr_2.13             ParamHelpers_1.11    praznik_5.0.0       
 [9] RevoUtils_11.0.1     RevoUtilsMath_11.0.0

loaded via a namespace (and not attached):
 [1] nlme_3.1-137        bitops_1.0-6        lubridate_1.7.4     RColorBrewer_1.1-2  httr_1.4.0          tools_3.5.1         backports_1.1.3     R6_2.2.2           
 [9] rpart_4.1-13        lazyeval_0.2.1      colorspace_1.3-2    nnet_7.3-12         withr_2.1.2         tidyselect_0.2.5    mco_1.0-15.1        compiler_3.5.1     
[17] parallelMap_1.4     glmnet_2.0-16       plotly_4.8.0        entropy_1.2.1       caTools_1.17.1.1    scales_1.0.0        randomForest_4.6-14 plot3D_1.1.1       
[25] stringr_1.3.1       digest_0.6.18       wdman_0.2.4         pkgconfig_2.0.2     htmltools_0.3.6     lhs_0.16            RWekajars_3.9.2-1   htmlwidgets_1.3    
[33] rlang_0.3.1         rstudioapi_0.9.0    bindr_0.1.1         generics_0.0.2      jsonlite_1.5        dplyr_0.7.8         ModelMetrics_1.2.2  RCurl_1.95-4.11    
[41] magrittr_1.5        Matrix_1.2-14       Rcpp_1.0.0          munsell_0.5.0       RWeka_0.4-38        stringi_1.2.4       yaml_2.2.0          MASS_7.3-50        
[49] RJSONIO_1.3-1.1     plyr_1.8.4          recipes_0.1.4       grid_3.5.1          misc3d_0.8-4        parallel_3.5.1      crayon_1.3.4        semver_0.2.0       
[57] lattice_0.20-35     splines_3.5.1       pillar_1.3.1        xgboost_0.71.2      reshape2_1.4.3      codetools_0.2-15    stats4_3.5.1        fastmatch_1.1-0    
[65] XML_4.0-0           glue_1.3.0          data.table_1.11.8   foreach_1.5.0       gtable_0.2.0        openssl_1.1         purrr_0.2.5         tidyr_0.8.2        
[73] assertthat_0.2.0    ggplot2_3.1.0       gower_0.1.2         binman_0.1.1        prodlim_2018.04.18  h2o_3.22.1.1        class_7.3-14        survival_2.42-3    
[81] viridisLite_0.3.0   timeDate_3043.102   tibble_1.4.2        rJava_0.9-10        iterators_1.0.10    RSelenium_1.7.5     bindrcpp_0.2.2      lava_1.6.4         
[89] caret_6.0-81        ipred_0.9-8  

When I just copy:

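praznik_filter = function(fun) {
  # evaluate the argument now so the returned closure captures the function name
  force(fun)

  function(task, nselect, ...) {
    requireNamespace("praznik")
    # look up the praznik function by name, e.g. "MRMR"
    fun = getFromNamespace(fun, ns = "praznik")

    data = getTaskData(task)
    X = data[getTaskFeatureNames(task)]
    Y = data[[getTaskTargetNames(task)]]
    # clamp the number of features to select to the range [1, ncol(X)]
    k = max(min(nselect, ncol(X)), 1L)
    selected = names(fun(X, Y, k = k)$selection)
    # features selected earlier get higher scores: 1, (n-1)/n, ..., 1/n
    score = setNames(rev(seq_along(selected)) / length(selected), selected)

    # features that were not selected are appended in random order with NA scores
    if (length(score) < ncol(X)) {
      unscored = sample(setdiff(names(X), names(score)))
      score = c(score, setNames(rep.int(NA_real_, length(unscored)), unscored))
    }

    score
  }
}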

praznik_MRMR <- makeFilter(
  name = "praznik_MRMR",
  desc = "Minimum redundancy maximal relevancy filter",
  pkg = "praznik",
  supported.tasks = "classif",
  supported.features = c("numerics", "factors", "integer", "character", "logical"),
  fun = praznik_filter("MRMR")
)

from the mlr GitHub repository into my session, the code works:

lrn <- makeLearner("classif.xgboost", predict.type = "prob")
lrn <- makeFilterWrapper(lrn, fw.method = "praznik_MRMR")

However, calling resample on such a wrapper generates an error:

Assertion on 'method' failed: Must be element of set {'anova.test','auc','carscore','cforest.importance','chi.squared','gain.ratio','information.gain','kruskal.test','linear.correlation','mrmr','oneR','permutation.importance','randomForest.importance','randomForestSRC.rfsrc','randomForestSRC.var.select','ranger.impurity','ranger.permutation','rank.correlation','relief','rf.importance','rf.min.depth','symmetrical.uncertainty','univariate','univariate.model.score','variance'}, but is 'praznik_MRMR'.
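
Both assertion messages list the filter ids that the installed mlr accepts, so it may help to check an id before building the wrapper. The following is a minimal sketch: `filter_available` is a hypothetical helper, not part of mlr, and it assumes that `listFilterMethods()` returns a data frame with an `id` column, as in the listing above.

```r
# Hypothetical helper (not part of mlr): is a filter id registered?
# `available` defaults to the ids reported by the installed mlr. R evaluates
# default arguments lazily, so mlr is only needed when `available` is omitted.
filter_available <- function(id,
                             available = as.character(mlr::listFilterMethods()$id)) {
  id %in% available
}

# A few ids taken from the listing above, used here as a stand-in registry.
known_ids <- c("anova.test", "auc", "mrmr", "variance")

filter_available("praznik_MRMR", known_ids)  # FALSE
filter_available("mrmr", known_ids)          # TRUE
```

Calling `filter_available("praznik_MRMR")` without the second argument queries the installed mlr instead of the stand-in vector.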

missuse commented 5 years ago

This problem is resolved after installing the mlr master branch from GitHub. I suppose the praznik filter methods were implemented after the latest CRAN version. It's nice to know the tutorials are up to date.
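
For anyone hitting the same assertion, a rough sketch of the check and the fix follows. `needs_dev_mlr` is a hypothetical helper, and the cutoff of 2.13 is an assumption taken from the session info above; the thread does not say which release first ships the praznik filters. The commented-out install call uses `remotes::install_github()`, which is one common way to install a package from GitHub.

```r
# Hypothetical helper: TRUE when the installed mlr is no newer than the
# CRAN release from the report (ASSUMPTION: 2.13, taken from sessionInfo()).
needs_dev_mlr <- function(installed, cran_release = "2.13") {
  package_version(installed) <= package_version(cran_release)
}

needs_dev_mlr("2.13")       # TRUE
needs_dev_mlr("2.13.9000")  # FALSE

# Usage (not run here): install the development version when needed.
# if (needs_dev_mlr(as.character(utils::packageVersion("mlr")))) {
#   remotes::install_github("mlr-org/mlr")
# }
```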