mlr-org / mlr

Machine Learning in R
https://mlr.mlr-org.com

xgboost: multi:prob gives error #907

Closed · ghost closed this issue 8 years ago

ghost commented 8 years ago

I would like to apply xgboost to a classification problem with 3 classes:

task = makeClassifTask(id = "iris", data = iris, target = "Species")
lrn = makeLearner("classif.xgboost", 
    predict.type = "prob", 
    par.vals = list(objective = "multi:softprob", num_class = 3))

This gives an error:

Error in setHyperPars2.Learner(learner, insert(par.vals, args)) : 
  classif.xgboost: Setting parameter num_class without available description object!
You can switch off this check by using configureMlr!

Rather than changing configureMlr, I would like to understand what the problem is and locate the "description object" that the error message refers to, but I can't find it.

sessionInfo:

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                  LC_TIME=C                     LC_COLLATE=en_US.UTF-8        LC_MONETARY=en_US.UTF-8      
 [6] LC_MESSAGES=en_US.UTF-8       LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8           LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
[11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
 [1] stats4    grid      parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] e1071_1.6-7                 caret_6.0-68                party_1.0-25                strucchange_1.5-1           sandwich_2.3-4             
 [6] zoo_1.7-13                  modeltools_0.2-21           mvtnorm_1.0-5               RWeka_0.4-27                performanceEstimation_1.0.2
[11] C50_0.1.0-24                MLmetrics_1.1.1             glmnet_2.0-5                Matrix_1.2-6                parallelMap_1.3            
[16] mlr_2.9                     ParamHelpers_1.7            BBmisc_1.9                  decrapr_0.1.0               xgboost_0.4-3              
[21] ranger_0.4.0                ROCR_1.0-7                  gplots_3.0.1                ineq_0.2-13                 pacman_0.4.1               
[26] knitr_1.13                  rbokeh_0.4.2                highcharter_0.3.1.9999      extrafont_0.17              devtools_1.11.1            
[31] gridExtra_2.2.1             DMwR_0.4.1                  checkmate_1.7.4             testthat_1.0.2              Hmisc_3.17-4               
[36] Formula_1.2-1               survival_2.39-4             lattice_0.20-33             viridis_0.3.4               doMC_1.3.4                 
[41] iterators_1.0.8             foreach_1.4.3               magrittr_1.5                readxl_0.1.1                ggplot2_2.1.0              
[46] lubridate_1.5.6             stringr_1.0.0               data.table_1.9.6           

loaded via a namespace (and not attached):
 [1] minqa_1.2.4         TH.data_1.0-7       colorspace_1.2-6    pryr_0.1.2          class_7.3-14        rsconnect_0.4.2.2   ggdendro_0.1-20     hexbin_1.27.1      
 [9] MatrixModels_0.4-1  coin_1.1-2          codetools_0.2-14    splines_3.2.2       rlist_0.4.6.1       jsonlite_0.9.20     nloptr_1.0.4        pbkrtest_0.4-2     
[17] rJava_0.9-8         Rttf2pt1_1.3.3      cluster_2.0.4       shiny_0.13.2        compiler_3.2.2      httr_1.1.0          backports_1.0.2     assertthat_0.1     
[25] lazyeval_0.1.10     quantreg_5.21       acepack_1.3-3.3     htmltools_0.3.5     tools_3.2.2         ggvis_0.4.2         igraph_1.0.1        partykit_1.0-5     
[33] gtable_0.2.0        reshape2_1.4.1      dplyr_0.4.3         maps_3.1.0          Rcpp_0.12.5         gdata_2.17.0        ape_3.4             nlme_3.1-128       
[41] extrafontdb_1.0     lme4_1.1-12         mime_0.4            gtools_3.5.0        RWekajars_3.9.0-1   MASS_7.3-45         scales_0.4.0        SparseM_1.7        
[49] RColorBrewer_1.1-2  quantmod_0.4-5      memoise_1.0.0       rpart_4.1-10        latticeExtra_0.6-28 stringi_1.0-1       highr_0.6           TTR_0.23-1         
[57] caTools_1.17.1      chron_2.3-47        bitops_1.0-6        purrr_0.2.1         htmlwidgets_0.6     labeling_0.3        plyr_1.8.3          R6_2.1.2           
[65] multcomp_1.4-5      DBI_0.4-1           mgcv_1.8-12         gistr_0.3.6         foreign_0.8-66      withr_1.0.1         xts_0.9-7           abind_1.4-3        
[73] nnet_7.3-12         car_2.1-0           crayon_1.3.1        KernSmooth_2.23-15  rmarkdown_0.9.6     digest_0.6.9        xtable_1.8-2        tidyr_0.4.1        
[81] httpuv_1.3.3        munsell_0.4.3       viridisLite_0.1.3  
PhilippPro commented 8 years ago

num_class is set automatically by mlr, so just omit it:

lrn = makeLearner("classif.xgboost", predict.type = "prob", 
par.vals = list(objective = "multi:softprob"))

See line 69 in https://github.com/mlr-org/mlr/blob/master/R/RLearner_classif_xgboost.R.
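For reference, a minimal end-to-end sketch along those lines (standard mlr calls; nrounds = 10 is just an arbitrary illustrative value, and mlr fills in num_class from the task):

library(mlr)

task = makeClassifTask(id = "iris", data = iris, target = "Species")

# only the objective is set; num_class is derived from the task by mlr
lrn = makeLearner("classif.xgboost", predict.type = "prob",
    par.vals = list(objective = "multi:softprob", nrounds = 10))

mod = train(lrn, task)
pred = predict(mod, task = task)

# one probability column per class (setosa, versicolor, virginica)
head(getPredictionProbabilities(pred))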

schiffner commented 8 years ago

Hi,

> Rather than changing configureMlr, I would like to understand what the problem is and locate the "description object" that the error message refers to, but I can't find it.

Just for clarification about the error message and the description object: As you probably know, every learner has a parameter set (accessible via getParamSet, see below) that contains descriptions of all parameters you might want to set or tune. If you try to set a parameter that is not registered in the parameter set, you get the error message about the missing description object. In your case it is missing for good reason, as Philipp explained.

If it's missing by mistake, you can turn this check off via configureMlr or the config argument of makeLearner (see also http://mlr-org.github.io/mlr-tutorial/release/html/configureMlr/index.html#example-turning-off-parameter-checking); a short sketch of both options follows the parameter set excerpt below.

lrn = makeLearner("classif.xgboost", predict.type = "prob", 
+ par.vals = list(objective = "multi:softprob"))
getParamSet(lrn)
                      Type len         Def          Constr Req Tunable Trafo
booster           discrete   -      gbtree gbtree,gblinear   -    TRUE     -
silent             integer   -           0     -Inf to Inf   -    TRUE     -
eta                numeric   -         0.3        0 to Inf   -    TRUE     -
gamma              numeric   -           0        0 to Inf   -    TRUE     -
max_depth          integer   -           6        0 to Inf   -    TRUE     -
min_child_weight   numeric   -           1        0 to Inf   -    TRUE     -
subsample          numeric   -           1          0 to 1   -    TRUE     -
...
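If a description really were missing by mistake, here is a short sketch of the two options mentioned above (this assumes the on.par.without.desc option documented for configureMlr, which accepts "stop", "warn" and "quiet"):

# global: downgrade the "parameter without description" error to a warning
configureMlr(on.par.without.desc = "warn")

# or per learner, via the config argument of makeLearner
lrn = makeLearner("classif.xgboost", predict.type = "prob",
    par.vals = list(objective = "multi:softprob"),
    config = list(on.par.without.desc = "warn"))

# check whether a parameter is registered in the learner's parameter set
"num_class" %in% getParamIds(getParamSet(lrn))  # FALSE here, since mlr sets num_class internally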
berndbischl commented 8 years ago

I have updated the note for classif.xgboost to make this (even) clearer:

note = "All settings are passed directly, rather than through xgboost's params argument. nrounds has been set to 1 by default. num_class is set internally, so do not set this manually."

closing.