Closed amazongodman closed 11 months ago
This looks like you need to finalize your filter_rec and pca_rec recipes before you can tune them in a workflow. It's hard to say anything specific because your example isn't reproducible, but maybe ?parameters.recipe will already set you on the right course. If not, please provide a minimal reproducible example (a reprex). The reprex package is very helpful for that and has additional advice on how to create a good reprex at https://reprex.tidyverse.org/articles/reprex-dos-and-donts.html
How about this?
library(tidymodels)
library(workflowsets)

data(Chicago)
Chicago <- Chicago %>% slice(1:365)

base_recipe <-
  recipe(ridership ~ ., data = Chicago) %>%
  step_date(date) %>%
  step_holiday(date) %>%
  update_role(date, new_role = "id") %>%
  step_dummy(all_nominal()) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_predictors())

filter_rec <-
  base_recipe %>%
  step_corr(all_of(stations), threshold = tune())

pca_rec <-
  base_recipe %>%
  step_pca(all_of(stations), num_comp = tune()) %>%
  step_normalize(all_predictors())

regularized_spec <-
  linear_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

cart_spec <-
  decision_tree(cost_complexity = tune(), min_n = tune()) %>%
  set_engine("rpart") %>%
  set_mode("regression")

rf_spec <-
  rand_forest(mtry = tune(), trees = 50, min_n = tune()) %>%
  set_engine("ranger") %>%
  set_mode("regression")

chi_models <-
  workflow_set(
    preproc = list(simple = base_recipe,
                   filter = filter_rec,
                   pca = pca_rec),
    models = list(glmnet = regularized_spec,
                  cart = cart_spec,
                  rf = rf_spec),
    cross = TRUE
  )

chi_models <-
  chi_models %>%
  anti_join(tibble(wflow_id = c("pca_glmnet", "filter_glmnet")),
            by = "wflow_id")

splits <-
  sliding_period(
    Chicago,
    date,
    "day",
    lookback = 300,
    assess_stop = 7,
    step = 7
  )

set.seed(123)
chi_models <-
  chi_models %>%
  workflow_map("tune_grid", resamples = splits, grid = 5,
               metrics = metric_set(mae), verbose = TRUE)

autoplot(chi_models)
i 1 of 7 tuning: simple_glmnet
√ 1 of 7 tuning: simple_glmnet (16.6s)
i 2 of 7 tuning: simple_cart
√ 2 of 7 tuning: simple_cart (17.3s)
i 3 of 7 tuning: simple_rf
i Creating pre-processing data to finalize unknown parameter: mtry
√ 3 of 7 tuning: simple_rf (18.2s)
i 4 of 7 tuning: filter_cart
√ 4 of 7 tuning: filter_cart (29s)
i 5 of 7 tuning: filter_rf
x 5 of 7 tuning: filter_rf failed with: Some tuning parameters require finalization but there are recipe parameters that require tuning. Please use `parameters()` to finalize the parameter ranges.
i 6 of 7 tuning: pca_cart
√ 6 of 7 tuning: pca_cart (23.9s)
i 7 of 7 tuning: pca_rf
x 7 of 7 tuning: pca_rf failed with: Some tuning parameters require finalization but there are recipe parameters that require tuning. Please use `parameters()` to finalize the parameter ranges.
The problem is that mtry is based on the number of predictor columns. tune_grid() tries to figure this out and set the range for mtry.
For simple_rf, it can do this.
For the other cases, it cannot because the recipe has tuning parameters. For example, tune_grid() would need to know the number of PCA components to be able to set mtry. It can't, since that is being tuned.
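To make the "unknown range" idea concrete, here is a minimal sketch using dials directly. It is only an illustration: mtcars stands in for the preprocessed predictors, and the object names (p, predictors, p_final) are made up for this example and are not part of the code above.

```r
library(dials)

# mtry() ships with an upper bound that depends on the data,
# so it is flagged as unknown until it is finalized.
p <- mtry()
range_get(p)
has_unknowns(p)

# finalize() fills in the upper bound from the number of predictor columns.
# Assumption: mtcars without the outcome (mpg) plays the role of the
# predictors that come out of a recipe with no tuning parameters.
predictors <- mtcars[, names(mtcars) != "mpg"]
p_final <- finalize(p, x = predictors)

has_unknowns(p_final)
range_get(p_final)$upper
```

When the recipe itself has a tune() placeholder, the number of columns it will produce is not fixed, so there is nothing to pass as x, which is why the range has to be supplied by hand.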
The error message is not great here (and we'll fix that). The solution is to create your own grid for those two workflows and pass them in using option_add(). For example:
# Get the ones that failed
chi_models_fixed <-
  chi_models %>%
  filter(wflow_id %in% c("filter_rf", "pca_rf"))

# Make grids by declaring parameter ranges
set.seed(1)
filter_grid <-
  chi_models %>%
  pull_workflow("filter_rf") %>%
  parameters() %>%
  # Set a range for mtry:
  update(mtry = mtry(c(1, 20))) %>%
  grid_latin_hypercube(size = 10)

set.seed(1)
pca_grid <-
  chi_models %>%
  pull_workflow("pca_rf") %>%
  parameters() %>%
  # Set a range for num_comp and mtry:
  update(
    num_comp = num_comp(c(1, 10)),
    mtry = mtry(c(1, 20))
  ) %>%
  grid_latin_hypercube(size = 10)

# Run the modified grids
chi_models_fixed <-
  chi_models_fixed %>%
  option_add(grid = filter_grid, id = "filter_rf") %>%
  option_add(grid = pca_grid, id = "pca_rf") %>%
  workflow_map("tune_grid", resamples = splits,
               metrics = metric_set(mae), verbose = TRUE)

# Put them back together:
chi_models <-
  chi_models %>%
  filter(!(wflow_id %in% c("filter_rf", "pca_rf"))) %>%
  bind_rows(chi_models_fixed)
I'll transfer this to the tune repo and add more documentation.
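For readers on more recent tidymodels releases: as far as I know, pull_workflow() and parameters() have since been deprecated in favor of extract_workflow() and extract_parameter_set_dials(). The sketch below shows the same "declare the ranges yourself" idea on a single standalone workflow. It is a hedged illustration, not the code from this thread: mtcars, the object names (rec, spec, wflow, params, grid), and the ranges are all assumptions, and grid_random() is swapped in for grid_latin_hypercube() only to keep the example short.

```r
library(tidymodels)

# A recipe with a tunable step, so mtry cannot be finalized automatically.
rec <-
  recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_predictors()) %>%
  step_pca(all_predictors(), num_comp = tune())

spec <-
  rand_forest(mtry = tune(), min_n = tune(), trees = 50) %>%
  set_engine("ranger") %>%
  set_mode("regression")

wflow <-
  workflow() %>%
  add_recipe(rec) %>%
  add_model(spec)

# Declare explicit ranges for the recipe parameter and for mtry.
params <-
  wflow %>%
  extract_parameter_set_dials() %>%
  update(
    num_comp = num_comp(c(1, 5)),
    mtry = mtry(c(1, 5))
  )

# Build a grid from the now fully specified parameter set.
set.seed(1)
grid <- grid_random(params, size = 10)
grid
```

A grid built this way can be passed to tune_grid() through its grid argument, or attached to one workflow in a workflow set with option_add() as in the code above.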
I am using the sample code from the markdown on the top page. When I try to tune ranger, I get the following error.
The error code is as follows