tidymodels / dials

Tools for creating tuning parameter values
https://dials.tidymodels.org/

Tuning engine parameters that are lists, e.g. parms in rpart() #150

Closed. CGlemser closed this issue 3 years ago

CGlemser commented 4 years ago

Hi guys,

Following this blog post (https://www.tidyverse.org/blog/2020/07/tune-0-1-1/#tuning-engine-parameters), I tried to tune the split criterion in rpart, i.e. whether to use "gini" or "information". Either I'm still doing something wrong, or the parameters you can set in parms are not yet dials parameter objects that can be tuned. If it's the latter, I think this would be useful to other users as well :)

Here's my reprex. I've also included cost_complexity as a tuning parameter so I could see the difference:

library(rpart)
library(tidymodels)
#> -- Attaching packages --------------------------------------------------------------------------------------------------------------- tidymodels 0.1.1 --
#> v broom     0.7.0      v recipes   0.1.13
#> v dials     0.0.9      v rsample   0.0.8 
#> v dplyr     1.0.2      v tibble    3.0.3 
#> v ggplot2   3.3.2      v tidyr     1.1.2 
#> v infer     0.5.3      v tune      0.1.1 
#> v modeldata 0.0.2      v workflows 0.2.0 
#> v parsnip   0.1.3      v yardstick 0.0.7 
#> v purrr     0.3.4
#> Warning: package 'rsample' was built under R version 4.0.3
#> -- Conflicts ------------------------------------------------------------------------------------------------------------------ tidymodels_conflicts() --
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter()  masks stats::filter()
#> x dplyr::lag()     masks stats::lag()
#> x dials::prune()   masks rpart::prune()
#> x recipes::step()  masks stats::step()

# define model with two parameters to tune
mod_tree <-
  decision_tree(
    cost_complexity = tune()
  ) %>%
  set_mode("classification") %>%
  set_engine("rpart",
    parms = list(split = tune()))

# define workflow
wf_tree <- workflow() %>%
  add_formula(Kyphosis~.) %>%
  add_model(mod_tree)

# this will give a warning
parameters(mod_tree)
#> Collection of 2 parameters for tuning
#> 
#>       identifier            type    object
#>  cost_complexity cost_complexity nparam[+]
#>            parms           parms       lgl
#> Warning: Unknown or uninitialised column: `identifier`.
#> One needs a `param` object: ''

# create cross-validation data set
dat_cv <- vfold_cv(kyphosis, 5)

# define grid
grid_tree <- expand.grid(
  cost_complexity = c(.01, .001),
  split = c("information", "gini"))

# run tuning: this will throw an error
res_tree <-
  wf_tree %>%
  tune_grid(
    resamples = dat_cv,
    grid = grid_tree,
  )
#> Error: The provided `grid` has the following parameter columns that have not been marked for tuning by `tune()`: 'split'.

juliasilge commented 4 years ago

We've added tunable engine-specific parameters for ranger, randomForest, earth, and C5.0, but have not set up any like this for rpart. You can still set them to fixed values, as I imagine you have already found out:

library(tidymodels)
data(kyphosis, package = "rpart")

mod_tree <-
  decision_tree(
    cost_complexity = tune()
  ) %>%
  set_mode("classification") %>%
  set_engine("rpart", parms = list(split = "gini"))

wf_tree <- workflow() %>%
  add_formula(Kyphosis ~ .) %>%
  add_model(mod_tree)

parameters(mod_tree)
#> Collection of 1 parameters for tuning
#> 
#>       identifier            type    object
#>  cost_complexity cost_complexity nparam[+]

dat_cv <- vfold_cv(kyphosis, 5)
tune_grid(wf_tree, resamples = dat_cv)
#> 
#> Attaching package: 'rpart'
#> The following object is masked from 'package:dials':
#> 
#>     prune
#> # Tuning results
#> # 5-fold cross-validation 
#> # A tibble: 5 x 4
#>   splits          id    .metrics          .notes          
#>   <list>          <chr> <list>            <list>          
#> 1 <split [64/17]> Fold1 <tibble [20 × 5]> <tibble [0 × 1]>
#> 2 <split [65/16]> Fold2 <tibble [20 × 5]> <tibble [0 × 1]>
#> 3 <split [65/16]> Fold3 <tibble [20 × 5]> <tibble [0 × 1]>
#> 4 <split [65/16]> Fold4 <tibble [20 × 5]> <tibble [0 × 1]>
#> 5 <split [65/16]> Fold5 <tibble [20 × 5]> <tibble [0 × 1]>

Created on 2020-10-22 by the reprex package (v0.3.0.9001)

But you cannot yet tune any rpart engine-specific parameters. We'll need to get that set up!
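
For illustration only, here is a rough sketch of what a dials parameter object for the split criterion could look like. The name `split_criterion` is hypothetical and not part of dials, and defining it does not by itself make `parms` tunable through parsnip; it only shows how the two candidate values could be described and turned into a grid.

```r
library(dials)

# Hypothetical qualitative parameter for rpart's split criterion
# (not an existing dials function; the name is made up for this sketch)
split_criterion <- new_qual_param(
  type = "character",
  values = c("gini", "information"),
  label = c(split_criterion = "Split Criterion")
)

# Combine it with the existing cost_complexity() parameter in a regular grid;
# levels = 2 gives two cost_complexity values crossed with both criteria
grid_sketch <- grid_regular(cost_complexity(), split_criterion, levels = 2)
grid_sketch
```

A grid like this could feed a manual loop over the criteria, since `tune_grid()` would still reject the `split_criterion` column for an rpart model.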

CGlemser commented 4 years ago

Yes, thanks for the quick response! We will set and compare the split criteria manually for now, then :)
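
A minimal sketch of that manual comparison, assuming the same kyphosis data and resampling setup as in the reprexes above. The helper `tune_one_split()`, the seed, and the grid values are illustrative choices and do not come from the thread. Each split criterion is fixed in `parms` while `cost_complexity` is tuned, and the metrics are stacked for comparison.

```r
library(tidymodels)
data(kyphosis, package = "rpart")

set.seed(123)  # arbitrary seed so the folds are reproducible
dat_cv <- vfold_cv(kyphosis, v = 5)
grid_cp <- tibble(cost_complexity = c(0.01, 0.001))

# Hypothetical helper: tune cost_complexity with the split criterion held fixed
tune_one_split <- function(split_criterion) {
  mod_tree <-
    decision_tree(cost_complexity = tune()) %>%
    set_mode("classification") %>%
    # !! inlines the current value so the spec does not depend on this function's scope
    set_engine("rpart", parms = list(split = !!split_criterion))

  wf_tree <- workflow() %>%
    add_formula(Kyphosis ~ .) %>%
    add_model(mod_tree)

  tune_grid(wf_tree, resamples = dat_cv, grid = grid_cp) %>%
    collect_metrics() %>%
    mutate(split = split_criterion)
}

# Same folds and same grid for both criteria, so the results are comparable
res_manual <- bind_rows(lapply(c("gini", "information"), tune_one_split))
res_manual
```

Reusing one set of folds for both runs keeps the comparison paired: any difference in the metrics comes from the split criterion and not from different resamples.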

github-actions[bot] commented 3 years ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.