tidymodels / finetune

Additional functions for model tuning
https://finetune.tidymodels.org/

Errors with `mtry` parameter. #39

Closed nipnipj closed 7 months ago

nipnipj commented 2 years ago

When trying to use the tune_race_anova(), tune_race_win_loss(), or tune_sim_anneal() functions, the following error messages appear:

> at <- learner_xgboost %>% # finalize mtry doesn't work here.
+   tune_sim_anneal(
+     resamples = cv_folds,
+     iter = 2,
+     metrics = metric_set(roc_auc),
+     control = control_sim_anneal(verbose = F)
+   )
Error in `dials::grid_latin_hypercube()`:
! These arguments contains unknowns: `mtry`. See the `finalize()` function.
Run `rlang::last_error()` to see where the error occurred.
> at <- learner_xgboost %>% # finalize mtry doesn't work here.
+   tune_race_anova(
+     resamples = cv_folds,
+     grid = 2,
+     metrics = metric_set(roc_auc),
+     control = control_race(verbose = F)
+   )
i Creating pre-processing data to finalize unknown parameter: mtry
Error in `vec_slice()`:
! Column `splits` (size 1) must match the data frame (size 3).
ℹ In file slice.c at line 188.
ℹ Install the winch package to get additional debugging info the next time you
  get this error.
ℹ This is an internal error in the rlang package, please report it to the package
  authors.
Backtrace:
     ▆
  1. ├─learner_xgboost %>% ...
  2. ├─finetune::tune_race_anova(...)
  3. ├─finetune:::tune_race_anova.workflow(...)
  4. │ └─finetune:::tune_race_anova_workflow(...)
  5. │   └─object %>% ...
  6. ├─tune::tune_grid(...)
  7. ├─tune:::tune_grid.workflow(...)
  8. │ └─tune:::tune_grid_workflow(...)
  9. │   └─tune:::tune_grid_loop(...)
 10. │     └─tune:::pull_metrics(resamples, results, control)
 11. │       └─tune:::pulley(resamples, res, ".metrics")
 12. │         ├─dplyr::arrange(resamples, !!!syms(id_cols))
 13. │         └─dplyr:::arrange.data.frame(resamples, !!!syms(id_cols))
 14. │           ├─dplyr::dplyr_row_slice(.data, loc)
 15. │           └─dplyr:::dplyr_row_slice.data.frame(.data, loc)
 16. │             ├─dplyr::dplyr_reconstruct(vec_slice(data, i), data)
 17. │             │ └─dplyr:::dplyr_new_data_frame(data)
 18. │             │   ├─row.names %||% .row_names_info(x, type = 0L)
 19. │             │   └─base::.row_names_info(x, type = 0L)
 20. │             └─vctrs::vec_slice(data, i)
 21. └─rlang:::stop_internal_c_lib(...)
 22.   └─rlang::abort(message, call = call, .internal = TRUE)
 > at <- learner_xgboost %>% # finalize mtry doesn't work here.
+   tune_race_win_loss(
+     resamples = cv_folds,
+     grid = 2,
+     metrics = metric_set(roc_auc),
+     control = control_race(verbose = F)
+   )
i Creating pre-processing data to finalize unknown parameter: mtry
Error in `mutate()`:
! Problem while computing `col = purrr::map(splits, ~NULL)`.
✖ `col` must be size 1, not 0.
Run `rlang::last_error()` to see where the error occurred.

Everything works fine when using tune_grid(), though.
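
The first error message points at finalize(). One way to act on that hint, shown here as a sketch only (learner_xgboost and cv_folds are the reporter's objects from above; your_training_data is a hypothetical stand-in for the data the workflow was built on), is to finalize the data-dependent mtry range up front and pass the completed parameter set via the param_info argument:

library(tidymodels)
library(finetune)

# Finalize the data-dependent mtry range against the training data, then
# hand the completed parameter set to the tuner via `param_info`.
xgb_params <- learner_xgboost %>%
  extract_parameter_set_dials() %>%
  finalize(your_training_data)

at <- learner_xgboost %>%
  tune_sim_anneal(
    resamples = cv_folds,
    iter = 2,
    param_info = xgb_params,
    metrics = metric_set(roc_auc),
    control = control_sim_anneal(verbose = FALSE)
  )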

juliasilge commented 2 years ago

I can't quite reproduce this (with the current CRAN version of finetune):

library(tidymodels)
library(finetune)

data(cells, package = "modeldata")
cells <- cells %>% select(-case)

set.seed(123)
folds <- bootstraps(cells, times = 5)

xgb_spec <-
    boost_tree(mtry = tune(), trees = 1000) %>%
    set_engine("xgboost") %>%
    set_mode("classification")

xgb_wf <- workflow(class ~ ., xgb_spec)

tune_race_anova(xgb_wf, resamples = folds, grid = 3)
#> i Creating pre-processing data to finalize unknown parameter: mtry
#> # Tuning results
#> # Bootstrap sampling 
#> # A tibble: 5 × 5
#>   splits             id         .order .metrics         .notes          
#>   <list>             <chr>       <int> <list>           <list>          
#> 1 <split [2019/747]> Bootstrap2      3 <tibble [6 × 5]> <tibble [0 × 3]>
#> 2 <split [2019/733]> Bootstrap4      1 <tibble [6 × 5]> <tibble [0 × 3]>
#> 3 <split [2019/745]> Bootstrap5      2 <tibble [6 × 5]> <tibble [0 × 3]>
#> 4 <split [2019/737]> Bootstrap1      4 <tibble [6 × 5]> <tibble [0 × 3]>
#> 5 <split [2019/743]> Bootstrap3      5 <tibble [6 × 5]> <tibble [0 × 3]>

Created on 2022-05-18 by the reprex package (v2.0.1)

Can you create a reprex (a minimal reproducible example) for this? The goal of a reprex is to make it easier for us to recreate your problem so that we can understand it and/or fix it.

If you've never heard of a reprex before, you may want to start with the tidyverse.org help page. You may already have reprex installed (it comes with the tidyverse package), but if not you can install it with:

install.packages("reprex")

Thanks! 🙌

nipnipj commented 2 years ago

It works when using bootstraps() instead of vfold_cv(), at least for me.
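
For reference, a minimal sketch of the failing variant being described here, assuming the same cells setup used in the reprexes in this thread, with vfold_cv() swapped in for bootstraps():

library(tidymodels)
library(finetune)

data(cells, package = "modeldata")
cells <- cells %>% select(-case)

set.seed(123)
folds <- vfold_cv(cells, v = 5)  # vfold_cv() where bootstraps() succeeded

xgb_spec <-
  boost_tree(mtry = tune(), trees = 500) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

xgb_wf <- workflow(class ~ ., xgb_spec)

# With cross-validation resamples, tune_race_anova() reportedly errors
# where the bootstrap version runs cleanly.
tune_race_anova(xgb_wf, resamples = folds, grid = 3)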


juliasilge commented 2 years ago

Can you create a reprex (a minimal reproducible example) for how you generated this error? I was unable to follow what you had here to create the error. Like I said before, the goal of a reprex is to make it easier (or even possible) for us to recreate your problem so that we can understand it and/or fix it. If you've never heard of a reprex before, you may want to start with the tidyverse.org help page or this documentation from the reprex package.

nipnipj commented 2 years ago
library(tidymodels)
library(finetune)

data(cells, package = "modeldata")
cells <- cells %>% select(-case)

set.seed(123)
folds <- bootstraps(cells, times = 5)

xgb_spec <-
  boost_tree(mtry = tune(), trees = 500) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

xgb_wf <- workflow(class ~ ., xgb_spec)

tune_sim_anneal(xgb_wf, resamples = folds, iter = 3)
#> Error in `dials::grid_latin_hypercube()`:
#> ! These arguments contains unknowns: `mtry`. See the `finalize()` function.
And when pre-finalizing mtry directly in the model specification:

library(tidymodels)
library(finetune)

data(cells, package = "modeldata")
cells <- cells %>% select(-case)

set.seed(123)
folds <- bootstraps(cells, times = 5)

xgb_spec <-
  boost_tree(mtry = finalize(mtry(), cells), trees = tune()) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

xgb_wf <- workflow(class ~ ., xgb_spec)

tune_sim_anneal(xgb_wf, resamples = folds, iter = 3)
#> 
#> ❯  Generating a set of 1 initial parameter results
#> x Bootstrap1: preprocessor 1/1, model 1/1: Error in maybe_proportion(x, nm): 'list' ob...
#> x Bootstrap2: preprocessor 1/1, model 1/1: Error in maybe_proportion(x, nm): 'list' ob...
#> x Bootstrap3: preprocessor 1/1, model 1/1: Error in maybe_proportion(x, nm): 'list' ob...
#> x Bootstrap4: preprocessor 1/1, model 1/1: Error in maybe_proportion(x, nm): 'list' ob...
#> x Bootstrap5: preprocessor 1/1, model 1/1: Error in maybe_proportion(x, nm): 'list' ob...
#> Warning: All models failed. See the `.notes` column.
#> ✓ Initialization complete
#> 
#> Error in UseMethod("mutate"): no applicable method for 'mutate' applied to an object of class "NULL"

Created on 2022-05-18 by the reprex package (v2.0.1)
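
A plausible reading of the second failure (an assumption, not confirmed in the thread): boost_tree(mtry = ...) expects an integer or tune(), not a dials parameter object, so finalize(mtry(), cells) hands the xgboost engine something it cannot treat as a number. A sketch of supplying a concrete value instead:

# Sketch: pass mtry as a plain integer (here the number of predictor
# columns left in cells after dropping case and the class outcome),
# rather than as a dials parameter object.
xgb_spec <-
  boost_tree(mtry = ncol(cells) - 1, trees = tune()) %>%
  set_engine("xgboost") %>%
  set_mode("classification")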

juliasilge commented 2 years ago

Thank you so much @nipnipj! 🙌

This looks like a bug to me (I can reproduce this), related to #19 and #30. Maybe since this has been a problem a couple of times, we can add some new tests specifically for a parameter that needs finalization, either here or in extratests.

juliasilge commented 2 years ago

In the meantime, a workaround is to use a result from tune_grid() for initial, since tune_grid() can successfully finalize the parameters:

library(tidymodels)
library(finetune)

data(cells, package = "modeldata")
cells <- cells %>% select(-case)

set.seed(123)
folds <- bootstraps(cells, times = 5)

xgb_spec <-
  boost_tree(mtry = tune(), trees = 500) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

xgb_wf <- workflow(class ~ ., xgb_spec)
xgb_rs <- tune_grid(xgb_wf, resamples = folds, grid = 3)
#> i Creating pre-processing data to finalize unknown parameter: mtry
tune_sim_anneal(xgb_wf, resamples = folds, iter = 3, initial = xgb_rs)
#> Optimizing roc_auc
#> Initial best: 0.89264
#> 1 ◯ accept suboptimal  roc_auc=0.89234   (+/-0.00535)
#> 2 ♥ new best           roc_auc=0.8953    (+/-0.006194)
#> 3 ◯ accept suboptimal  roc_auc=0.89515   (+/-0.005769)
#> # Tuning results
#> # Bootstrap sampling 
#> # A tibble: 20 × 5
#>    splits             id         .metrics         .notes           .iter
#>    <list>             <chr>      <list>           <list>           <int>
#>  1 <split [2019/737]> Bootstrap1 <tibble [6 × 5]> <tibble [0 × 3]>     0
#>  2 <split [2019/747]> Bootstrap2 <tibble [6 × 5]> <tibble [0 × 3]>     0
#>  3 <split [2019/743]> Bootstrap3 <tibble [6 × 5]> <tibble [0 × 3]>     0
#>  4 <split [2019/733]> Bootstrap4 <tibble [6 × 5]> <tibble [0 × 3]>     0
#>  5 <split [2019/745]> Bootstrap5 <tibble [6 × 5]> <tibble [0 × 3]>     0
#>  6 <split [2019/737]> Bootstrap1 <tibble [2 × 5]> <tibble [0 × 3]>     1
#>  7 <split [2019/747]> Bootstrap2 <tibble [2 × 5]> <tibble [0 × 3]>     1
#>  8 <split [2019/743]> Bootstrap3 <tibble [2 × 5]> <tibble [0 × 3]>     1
#>  9 <split [2019/733]> Bootstrap4 <tibble [2 × 5]> <tibble [0 × 3]>     1
#> 10 <split [2019/745]> Bootstrap5 <tibble [2 × 5]> <tibble [0 × 3]>     1
#> 11 <split [2019/737]> Bootstrap1 <tibble [2 × 5]> <tibble [0 × 3]>     2
#> 12 <split [2019/747]> Bootstrap2 <tibble [2 × 5]> <tibble [0 × 3]>     2
#> 13 <split [2019/743]> Bootstrap3 <tibble [2 × 5]> <tibble [0 × 3]>     2
#> 14 <split [2019/733]> Bootstrap4 <tibble [2 × 5]> <tibble [0 × 3]>     2
#> 15 <split [2019/745]> Bootstrap5 <tibble [2 × 5]> <tibble [0 × 3]>     2
#> 16 <split [2019/737]> Bootstrap1 <tibble [2 × 5]> <tibble [0 × 3]>     3
#> 17 <split [2019/747]> Bootstrap2 <tibble [2 × 5]> <tibble [0 × 3]>     3
#> 18 <split [2019/743]> Bootstrap3 <tibble [2 × 5]> <tibble [0 × 3]>     3
#> 19 <split [2019/733]> Bootstrap4 <tibble [2 × 5]> <tibble [0 × 3]>     3
#> 20 <split [2019/745]> Bootstrap5 <tibble [2 × 5]> <tibble [0 × 3]>     3

Created on 2022-05-19 by the reprex package (v2.0.1)
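
From there the usual tune post-processing applies; a short sketch, continuing from the objects above:

# Pick the best candidate from the annealing results and lock it into
# the workflow for a final fit.
xgb_res <- tune_sim_anneal(xgb_wf, resamples = folds, iter = 3, initial = xgb_rs)
best_params <- select_best(xgb_res, metric = "roc_auc")
final_xgb_wf <- finalize_workflow(xgb_wf, best_params)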

juliasilge commented 2 years ago

I'm going to reopen this so we can track this bug and fix it. 👍

topepo commented 2 years ago

I can fix this pretty easily but it requires a currently unexported function from tune. I'll export that and, after the next tune release, update this issue.