tidymodels / dials

Tools for creating tuning parameter values
https://dials.tidymodels.org/

Recipe pre-processing step marked with tune() not being picked up by parameters() #112

Closed JHucker closed 4 years ago

JHucker commented 4 years ago

The Problem

I've been unable to tune recipe pre-processing steps. To make sure I was using the packages and functions correctly, I followed https://www.tidymodels.org/learn/work/bayes-opt/ (with minor modifications) in the example below.

In that example, num_comp is not picked up by parameters() when the workflow object is passed to it.

I'd appreciate any assistance with this issue. Loving the tidymodels packages so far.

Example

# https://www.tidymodels.org/learn/work/bayes-opt/

library(tidymodels)
#> -- Attaching packages -------------------------------------- tidymodels 0.1.0 --
#> v broom     0.5.6      v recipes   0.1.10
#> v dials     0.0.6      v rsample   0.0.6 
#> v dplyr     0.8.5      v tibble    3.0.0 
#> v ggplot2   3.3.0      v tune      0.1.0 
#> v infer     0.5.1      v workflows 0.1.1 
#> v parsnip   0.1.0      v yardstick 0.0.6 
#> v purrr     0.3.4
#> -- Conflicts ----------------------------------------- tidymodels_conflicts() --
#> x purrr::discard()  masks scales::discard()
#> x dplyr::filter()   masks stats::filter()
#> x dplyr::lag()      masks stats::lag()
#> x ggplot2::margin() masks dials::margin()
#> x recipes::step()   masks stats::step()
library(modeldata)

# Load data
data(cells)

set.seed(2369)
tr_te_split <- initial_split(cells %>% select(-case), prop = 3/4)
cell_train <- training(tr_te_split)
cell_test  <- testing(tr_te_split)

set.seed(1697)
folds <- vfold_cv(cell_train, v = 10)

cell_pre_proc <-
  recipe(class ~ ., data = cell_train) %>%
  step_YeoJohnson(all_predictors()) %>%
  step_normalize(all_predictors()) %>%
  step_pca(all_predictors(), num_comp = tune()) %>%
  step_downsample(class)

svm_mod <-
  svm_rbf(mode = "classification", cost = tune(), rbf_sigma = tune()) %>%
  set_engine("kernlab")

svm_wflow <-
  workflow() %>%
  add_model(svm_mod) %>%
  add_recipe(cell_pre_proc)

### This is where issues seem to arise i.e. num_comp is not present for
### tuning
svm_set <- parameters(svm_wflow)
svm_set
#> Collection of 2 parameters for tuning
#> 
#>         id parameter type object class
#>       cost           cost    nparam[+]
#>  rbf_sigma      rbf_sigma    nparam[+]

svm_set <- 
  svm_set %>% 
  update(num_comp = num_comp(c(0L, 20L)))
#> Error: At least one parameter does not match any id's in the set: 'num_comp'

set.seed(12)
search_res <-
  svm_wflow %>% 
  tune_bayes(
    resamples = folds,
    param_info = svm_set,
    initial = 5,
    iter = 5,
    metrics = metric_set(roc_auc),
    control = control_bayes(no_improve = 2, verbose = TRUE)
  )
#> 
#> >  Generating a set of 5 initial parameter results
#> x Fold01: recipe: Error in min(x$num_comp, length(col_names)): invalid 'type' (lan...
#> x Fold02: recipe: Error in min(x$num_comp, length(col_names)): invalid 'type' (lan...
#> x Fold03: recipe: Error in min(x$num_comp, length(col_names)): invalid 'type' (lan...
#> x Fold04: recipe: Error in min(x$num_comp, length(col_names)): invalid 'type' (lan...
#> x Fold05: recipe: Error in min(x$num_comp, length(col_names)): invalid 'type' (lan...
#> x Fold06: recipe: Error in min(x$num_comp, length(col_names)): invalid 'type' (lan...
#> x Fold07: recipe: Error in min(x$num_comp, length(col_names)): invalid 'type' (lan...
#> x Fold08: recipe: Error in min(x$num_comp, length(col_names)): invalid 'type' (lan...
#> x Fold09: recipe: Error in min(x$num_comp, length(col_names)): invalid 'type' (lan...
#> x Fold10: recipe: Error in min(x$num_comp, length(col_names)): invalid 'type' (lan...
#> Warning: All models failed in tune_grid(). See the `.notes` column.
#> v Initialization complete
#> 
#> Error: All of the models failed.

estimates <- 
  collect_metrics(search_res) %>% 
  arrange(.iter)
#> Error in collect_metrics(search_res): object 'search_res' not found

estimates
#> Error in eval(expr, envir, enclos): object 'estimates' not found

best_param <- show_best(search_res, metric = "roc_auc", n = 1)
#> Error in eval(lhs, parent, parent): object 'search_res' not found
best_param
#> Error in eval(expr, envir, enclos): object 'best_param' not found

svm_wflow <- 
  svm_wflow %>% 
  finalize_workflow(best_param)
#> Error in check_final_param(parameters): object 'best_param' not found

svm_wflow
#> == Workflow ====================================================================
#> Preprocessor: Recipe
#> Model: svm_rbf()
#> 
#> -- Preprocessor ----------------------------------------------------------------
#> 4 Recipe Steps
#> 
#> * step_YeoJohnson()
#> * step_normalize()
#> * step_pca()
#> * step_downsample()
#> 
#> -- Model -----------------------------------------------------------------------
#> Radial Basis Function Support Vector Machine Specification (classification)
#> 
#> Main Arguments:
#>   cost = tune()
#>   rbf_sigma = tune()
#> 
#> Computational engine: kernlab

Created on 2020-04-29 by the reprex package (v0.3.0)

Session info

``` r
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.0.0 (2020-04-24)
#>  os       Windows 10 x64
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_Australia.1252
#>  ctype    English_Australia.1252
#>  tz       Australia/Sydney
#>  date     2020-04-29
#>
#> - Packages -------------------------------------------------------------------
#>  package       * version    date       lib source
#>  assertthat      0.2.1      2019-03-21 [2] CRAN (R 4.0.0)
#>  backports       1.1.6      2020-04-05 [2] CRAN (R 4.0.0)
#>  base64enc       0.1-3      2015-07-28 [2] CRAN (R 4.0.0)
#>  bayesplot       1.7.1      2019-12-01 [2] CRAN (R 4.0.0)
#>  boot            1.3-25     2020-04-26 [2] CRAN (R 4.0.0)
#>  broom         * 0.5.6      2020-04-20 [2] CRAN (R 4.0.0)
#>  callr           3.4.3      2020-03-28 [2] CRAN (R 4.0.0)
#>  class           7.3-16     2020-03-25 [2] CRAN (R 4.0.0)
#>  cli             2.0.2      2020-02-28 [2] CRAN (R 4.0.0)
#>  codetools       0.2-16     2018-12-24 [2] CRAN (R 4.0.0)
#>  colorspace      1.4-1      2019-03-18 [2] CRAN (R 4.0.0)
#>  colourpicker    1.0        2017-09-27 [2] CRAN (R 4.0.0)
#>  crayon          1.3.4      2017-09-16 [2] CRAN (R 4.0.0)
#>  crosstalk       1.1.0.1    2020-03-13 [2] CRAN (R 4.0.0)
#>  desc            1.2.0      2018-05-01 [2] CRAN (R 4.0.0)
#>  devtools        2.3.0      2020-04-10 [1] CRAN (R 4.0.0)
#>  dials         * 0.0.6      2020-04-03 [2] CRAN (R 4.0.0)
#>  DiceDesign      1.8-1      2019-07-31 [2] CRAN (R 4.0.0)
#>  digest          0.6.25     2020-02-23 [2] CRAN (R 4.0.0)
#>  dplyr         * 0.8.5      2020-03-07 [2] CRAN (R 4.0.0)
#>  DT              0.13       2020-03-23 [2] CRAN (R 4.0.0)
#>  dygraphs        1.1.1.6    2018-07-11 [2] CRAN (R 4.0.0)
#>  ellipsis        0.3.0      2019-09-20 [2] CRAN (R 4.0.0)
#>  evaluate        0.14       2019-05-28 [2] CRAN (R 4.0.0)
#>  fansi           0.4.1      2020-01-08 [2] CRAN (R 4.0.0)
#>  fastmap         1.0.1      2019-10-08 [2] CRAN (R 4.0.0)
#>  foreach         1.5.0      2020-03-30 [2] CRAN (R 4.0.0)
#>  fs              1.4.1      2020-04-04 [2] CRAN (R 4.0.0)
#>  furrr           0.1.0      2018-05-16 [2] CRAN (R 4.0.0)
#>  future          1.17.0     2020-04-18 [2] CRAN (R 4.0.0)
#>  generics        0.0.2      2018-11-29 [2] CRAN (R 4.0.0)
#>  ggplot2       * 3.3.0      2020-03-05 [2] CRAN (R 4.0.0)
#>  ggridges        0.5.2      2020-01-12 [2] CRAN (R 4.0.0)
#>  globals         0.12.5     2019-12-07 [2] CRAN (R 4.0.0)
#>  glue            1.4.0      2020-04-03 [2] CRAN (R 4.0.0)
#>  gower           0.2.1      2019-05-14 [2] CRAN (R 4.0.0)
#>  GPfit           1.0-8      2019-02-08 [2] CRAN (R 4.0.0)
#>  gridExtra       2.3        2017-09-09 [2] CRAN (R 4.0.0)
#>  gtable          0.3.0      2019-03-25 [2] CRAN (R 4.0.0)
#>  gtools          3.8.2      2020-03-31 [2] CRAN (R 4.0.0)
#>  hardhat         0.1.2      2020-02-28 [2] CRAN (R 4.0.0)
#>  highr           0.8        2019-03-20 [2] CRAN (R 4.0.0)
#>  htmltools       0.4.0      2019-10-04 [2] CRAN (R 4.0.0)
#>  htmlwidgets     1.5.1      2019-10-08 [2] CRAN (R 4.0.0)
#>  httpuv          1.5.2      2019-09-11 [2] CRAN (R 4.0.0)
#>  igraph          1.2.5      2020-03-19 [2] CRAN (R 4.0.0)
#>  infer         * 0.5.1      2019-11-19 [2] CRAN (R 4.0.0)
#>  inline          0.3.15     2018-05-18 [2] CRAN (R 4.0.0)
#>  ipred           0.9-9      2019-04-28 [2] CRAN (R 4.0.0)
#>  iterators       1.0.12     2019-07-26 [2] CRAN (R 4.0.0)
#>  janeaustenr     0.1.5      2017-06-10 [2] CRAN (R 4.0.0)
#>  kernlab         0.9-29     2019-11-12 [2] CRAN (R 4.0.0)
#>  knitr           1.28       2020-02-06 [2] CRAN (R 4.0.0)
#>  later           1.0.0      2019-10-04 [2] CRAN (R 4.0.0)
#>  lattice         0.20-41    2020-04-02 [2] CRAN (R 4.0.0)
#>  lava            1.6.7      2020-03-05 [2] CRAN (R 4.0.0)
#>  lhs             1.0.2      2020-04-13 [2] CRAN (R 4.0.0)
#>  lifecycle       0.2.0      2020-03-06 [2] CRAN (R 4.0.0)
#>  listenv         0.8.0      2019-12-05 [2] CRAN (R 4.0.0)
#>  lme4            1.1-23     2020-04-07 [2] CRAN (R 4.0.0)
#>  loo             2.2.0      2019-12-19 [2] CRAN (R 4.0.0)
#>  lubridate       1.7.8      2020-04-06 [2] CRAN (R 4.0.0)
#>  magrittr        1.5        2014-11-22 [2] CRAN (R 4.0.0)
#>  markdown        1.1        2019-08-07 [2] CRAN (R 4.0.0)
#>  MASS            7.3-51.5   2019-12-20 [2] CRAN (R 4.0.0)
#>  Matrix          1.2-18     2019-11-27 [2] CRAN (R 4.0.0)
#>  matrixStats     0.56.0     2020-03-13 [2] CRAN (R 4.0.0)
#>  memoise         1.1.0      2017-04-21 [1] CRAN (R 4.0.0)
#>  mime            0.9        2020-02-04 [2] CRAN (R 4.0.0)
#>  miniUI          0.1.1.1    2018-05-18 [2] CRAN (R 4.0.0)
#>  minqa           1.2.4      2014-10-09 [2] CRAN (R 4.0.0)
#>  modeldata     * 0.0.1      2019-12-06 [1] CRAN (R 4.0.0)
#>  munsell         0.5.0      2018-06-12 [2] CRAN (R 4.0.0)
#>  nlme            3.1-147    2020-04-13 [2] CRAN (R 4.0.0)
#>  nloptr          1.2.2.1    2020-03-11 [2] CRAN (R 4.0.0)
#>  nnet            7.3-13     2020-02-25 [2] CRAN (R 4.0.0)
#>  parsnip       * 0.1.0      2020-04-09 [2] CRAN (R 4.0.0)
#>  pillar          1.4.3      2019-12-20 [2] CRAN (R 4.0.0)
#>  pkgbuild        1.0.7      2020-04-25 [2] CRAN (R 4.0.0)
#>  pkgconfig       2.0.3      2019-09-22 [2] CRAN (R 4.0.0)
#>  pkgload         1.0.2      2018-10-29 [2] CRAN (R 4.0.0)
#>  plyr            1.8.6      2020-03-03 [2] CRAN (R 4.0.0)
#>  prettyunits     1.1.1      2020-01-24 [2] CRAN (R 4.0.0)
#>  pROC            1.16.2     2020-03-19 [2] CRAN (R 4.0.0)
#>  processx        3.4.2      2020-02-09 [2] CRAN (R 4.0.0)
#>  prodlim         2019.11.13 2019-11-17 [2] CRAN (R 4.0.0)
#>  promises        1.1.0      2019-10-04 [2] CRAN (R 4.0.0)
#>  ps              1.3.2      2020-02-13 [2] CRAN (R 4.0.0)
#>  purrr         * 0.3.4      2020-04-17 [2] CRAN (R 4.0.0)
#>  R6              2.4.1      2019-11-12 [2] CRAN (R 4.0.0)
#>  Rcpp            1.0.4.6    2020-04-09 [2] CRAN (R 4.0.0)
#>  recipes       * 0.1.10     2020-03-18 [2] CRAN (R 4.0.0)
#>  remotes         2.1.1      2020-02-15 [1] CRAN (R 4.0.0)
#>  reshape2        1.4.4      2020-04-09 [2] CRAN (R 4.0.0)
#>  rlang           0.4.5      2020-03-01 [2] CRAN (R 4.0.0)
#>  rmarkdown       2.1        2020-01-20 [2] CRAN (R 4.0.0)
#>  rpart           4.1-15     2019-04-12 [2] CRAN (R 4.0.0)
#>  rprojroot       1.3-2      2018-01-03 [2] CRAN (R 4.0.0)
#>  rsample       * 0.0.6      2020-03-31 [2] CRAN (R 4.0.0)
#>  rsconnect       0.8.16     2019-12-13 [2] CRAN (R 4.0.0)
#>  rstan           2.19.3     2020-02-11 [2] CRAN (R 4.0.0)
#>  rstanarm        2.19.3     2020-02-11 [2] CRAN (R 4.0.0)
#>  rstantools      2.0.0      2019-09-15 [2] CRAN (R 4.0.0)
#>  rstudioapi      0.11       2020-02-07 [2] CRAN (R 4.0.0)
#>  scales        * 1.1.0      2019-11-18 [2] CRAN (R 4.0.0)
#>  sessioninfo     1.1.1      2018-11-05 [1] CRAN (R 4.0.0)
#>  shiny           1.4.0.2    2020-03-13 [2] CRAN (R 4.0.0)
#>  shinyjs         1.1        2020-01-13 [2] CRAN (R 4.0.0)
#>  shinystan       2.5.0      2018-05-01 [2] CRAN (R 4.0.0)
#>  shinythemes     1.1.2      2018-11-06 [2] CRAN (R 4.0.0)
#>  SnowballC       0.7.0      2020-04-01 [2] CRAN (R 4.0.0)
#>  StanHeaders     2.19.2     2020-02-11 [2] CRAN (R 4.0.0)
#>  statmod         1.4.34     2020-02-17 [2] CRAN (R 4.0.0)
#>  stringi         1.4.6      2020-02-17 [2] CRAN (R 4.0.0)
#>  stringr         1.4.0      2019-02-10 [2] CRAN (R 4.0.0)
#>  survival        3.1-12     2020-04-10 [2] CRAN (R 4.0.0)
#>  testthat        2.3.2      2020-03-02 [2] CRAN (R 4.0.0)
#>  threejs         0.3.3      2020-01-21 [2] CRAN (R 4.0.0)
#>  tibble        * 3.0.0      2020-03-30 [2] CRAN (R 4.0.0)
#>  tidymodels    * 0.1.0      2020-02-16 [2] CRAN (R 4.0.0)
#>  tidyposterior   0.0.2      2018-11-15 [2] CRAN (R 4.0.0)
#>  tidypredict     0.4.5      2020-02-10 [2] CRAN (R 4.0.0)
#>  tidyr           1.0.2      2020-01-24 [2] CRAN (R 4.0.0)
#>  tidyselect      1.0.0      2020-01-27 [2] CRAN (R 4.0.0)
#>  tidytext        0.2.4      2020-04-17 [2] CRAN (R 4.0.0)
#>  timeDate        3043.102   2018-02-21 [2] CRAN (R 4.0.0)
#>  tokenizers      0.2.1      2018-03-29 [2] CRAN (R 4.0.0)
#>  tune          * 0.1.0      2020-04-02 [2] CRAN (R 4.0.0)
#>  usethis         1.6.0      2020-04-09 [2] CRAN (R 4.0.0)
#>  vctrs           0.2.4      2020-03-10 [2] CRAN (R 4.0.0)
#>  withr           2.2.0      2020-04-20 [2] CRAN (R 4.0.0)
#>  workflows     * 0.1.1      2020-03-17 [2] CRAN (R 4.0.0)
#>  xfun            0.13       2020-04-13 [2] CRAN (R 4.0.0)
#>  xtable          1.8-4      2019-04-21 [2] CRAN (R 4.0.0)
#>  xts             0.12-0     2020-01-19 [2] CRAN (R 4.0.0)
#>  yaml            2.2.1      2020-02-01 [2] CRAN (R 4.0.0)
#>  yardstick     * 0.0.6      2020-03-17 [2] CRAN (R 4.0.0)
#>  zoo             1.8-7      2020-01-10 [2] CRAN (R 4.0.0)
#>
#> [1] C:/Users/Jacob/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.0/library
```

topepo commented 4 years ago

I'll set up an R 4.0 install, since I can't reproduce this on R 3.6. The issue is that parameters() doesn't see that you are tuning num_comp. On 3.6 it does:

> svm_wflow <-
+   workflow() %>%
+   add_model(svm_mod) %>%
+   add_recipe(cell_pre_proc)
> 
> parameters(svm_wflow)
Collection of 3 parameters for tuning

        id parameter type object class
      cost           cost    nparam[+]
 rbf_sigma      rbf_sigma    nparam[+]
  num_comp       num_comp    nparam[+]

topepo commented 4 years ago

Verified on 4.0. I'll take a look today.

topepo commented 4 years ago

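The tunable() generic is not dispatching to the right S3 method. Calling tunable() on the PCA step returns nothing, while calling tunable.step_pca() directly works: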

> methods("tunable")
 [1] tunable.boost_tree*       tunable.linear_reg*       tunable.logistic_reg*    
 [4] tunable.model_spec*       tunable.multinomial_reg*  tunable.nearest_neighbor*
 [7] tunable.recipe*           tunable.step              tunable.step_bagimpute   
[10] tunable.step_bs           tunable.step_corr         tunable.step_discretize  
[13] tunable.step_downsample   tunable.step_embed*       tunable.step_ica         
[16] tunable.step_isomap       tunable.step_knnimpute    tunable.step_kpca_poly   
[19] tunable.step_kpca_rbf     tunable.step_meanimpute   tunable.step_nnmf        
[22] tunable.step_ns           tunable.step_nzv          tunable.step_other       
[25] tunable.step_pca          tunable.step_pls          tunable.step_poly        
[28] tunable.step_rollimpute   tunable.step_texthash*    tunable.step_tf*         
[31] tunable.step_tokenfilter* tunable.step_tokenize*    tunable.step_umap*       
[34] tunable.step_upsample     tunable.step_window       tunable.step_woe*        
[37] tunable.workflow*        
see '?methods' for accessing help and source code
> tunable(cell_pre_proc$steps[[3]])
# A tibble: 0 x 5
# … with 5 variables: name <chr>, call_info <list>, source <chr>, component <chr>,
#   component_id <chr>
> tunable.step_pca(cell_pre_proc$steps[[3]])
# A tibble: 1 x 5
  name     call_info        source component component_id
  <chr>    <list>           <chr>  <chr>     <chr>       
1 num_comp <named list [3]> recipe step_pca  pca_245AH   
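
The gap between the two calls above is consistent with ordinary S3 dispatch: a generic can only reach a method that is visible from the call site or registered for that generic, whereas calling the method function by name bypasses dispatch entirely. Below is a minimal base-R sketch of that behaviour. The names tunable_demo, step_pca_demo, and pkg_env are made up for illustration and are not the tune or recipes internals.

```r
# Illustrative only: a stand-in generic, not tune::tunable().
tunable_demo <- function(x, ...) UseMethod("tunable_demo")

# Fallback method: reports no tunable parameters (a zero-row result).
tunable_demo.default <- function(x, ...) {
  data.frame(name = character(0), stringsAsFactors = FALSE)
}

# A method that exists but lives in a separate environment, so dispatch
# cannot see it until it is registered (hypothetical stand-in for a
# method defined inside another package's namespace).
pkg_env <- new.env()
pkg_env$tunable_demo.step_pca_demo <- function(x, ...) {
  data.frame(name = "num_comp", stringsAsFactors = FALSE)
}

pca_step <- structure(list(), class = c("step_pca_demo", "step"))

# Before registration: dispatch falls through to the default method.
before <- tunable_demo(pca_step)

# Calling the method by name works regardless of registration.
direct <- pkg_env$tunable_demo.step_pca_demo(pca_step)

# Register the method for the generic; dispatch now finds it.
registerS3method(
  "tunable_demo", "step_pca_demo",
  pkg_env$tunable_demo.step_pca_demo,
  envir = environment(tunable_demo)
)
after <- tunable_demo(pca_step)

nrow(before)  # zero rows: method not found by dispatch
nrow(after)   # one row: method found after registration
```

In this sketch the only thing that changes between `before` and `after` is the registration step, which is the same kind of difference seen between tunable() and tunable.step_pca() in the output above.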

topepo commented 4 years ago

It turns out that using requireNamespace() to determine whether a package is installed does not give the right value when called inside .onLoad() (it works elsewhere).

This was preventing the S3 tunable methods from being registered for step functions.
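
For readers unfamiliar with the pattern, here is a rough sketch of conditional S3 registration from a load hook. It is an assumption-laden illustration, not the tune source and not necessarily the fix that was applied: register_if_installed() and the demo_step class are hypothetical, the stats::nobs generic is used only so the example runs with base R, and system.file() is just one possible way to test for an installed package.

```r
# Hypothetical helper: register `method` for `generic` (owned by `pkg`)
# only when `pkg` is installed. Returns TRUE/FALSE invisibly.
register_if_installed <- function(pkg, generic, class, method) {
  # system.file() returns "" when the package is not installed.
  # This check is an illustrative choice, not what the thread prescribes.
  installed <- nzchar(system.file(package = pkg))
  if (installed) {
    registerS3method(generic, class, method, envir = asNamespace(pkg))
  }
  invisible(installed)
}

# Stand-in for a package's load hook. In a real package, R calls
# .onLoad() automatically when the namespace is loaded.
.onLoad <- function(libname, pkgname) {
  register_if_installed(
    "stats", "nobs", "demo_step",
    function(object, ...) 0L
  )
}

# Simulate loading, then dispatch through the generic.
.onLoad(NULL, "demo")
demo_obj <- structure(list(), class = "demo_step")
result <- stats::nobs(demo_obj)

# A package that is not installed is skipped rather than raising an error.
skipped <- register_if_installed(
  "pkgThatDoesNotExist123", "nobs", "demo_step",
  function(object, ...) 1L
)
```

If the installed-check inside the hook wrongly reports FALSE, the registration call is silently skipped and the generic falls back to whatever other method applies, which matches the symptom described in this thread.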

JHucker commented 4 years ago

I reverted to R 3.6.3 and it works fine. I'll stick with this version; thanks for looking into that.

github-actions[bot] commented 3 years ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.