tidymodels / tune

Tools for tidy parameter tuning
https://tune.tidymodels.org

I need a solution. error:'Some tuning parameters require finalization but there are recipe parameters that require tuning' #387

Closed amazongodman closed 11 months ago

amazongodman commented 3 years ago

I am using the sample code from the README on the package's top page. When I try to tune the ranger model, I get the following error.


regularized_spec <- 
  linear_reg(penalty = tune(), mixture = tune()) %>% 
  set_engine("glmnet")

cart_spec <- 
  decision_tree(cost_complexity = tune(), min_n = tune()) %>% 
  set_engine("rpart") %>% 
  set_mode("regression")

rf_spec <- 
  rand_forest(mtry = tune(), trees = 50, min_n = tune()) %>% 
  set_engine("ranger") %>% 
  set_mode("regression")

chi_models <- 
  workflow_set(
    preproc = list(simple = base_recipe, 
                   filter = filter_rec, 
                   pca = pca_rec),
    models = list(glmnet = regularized_spec, 
                  cart = cart_spec, 
                  rf = rf_spec),
    cross = TRUE
  )

The output, including the error messages, is as follows:

i 1 of 7 tuning:     simple_glmnet
√ 1 of 7 tuning:     simple_glmnet (15.4s)
i 2 of 7 tuning:     simple_cart
√ 2 of 7 tuning:     simple_cart (17.4s)
i 3 of 7 tuning:     simple_rf
i Creating pre-processing data to finalize unknown parameter: mtry
√ 3 of 7 tuning:     simple_rf (4m 11.9s)
i 4 of 7 tuning:     filter_cart
√ 4 of 7 tuning:     filter_cart (27.3s)
i 5 of 7 tuning:     filter_rf
x 5 of 7 tuning:     filter_rf failed with: Some tuning parameters require finalization but there are recipe parameters that require tuning. Please use `parameters()` to finalize the parameter ranges.
i 6 of 7 tuning:     pca_cart
√ 6 of 7 tuning:     pca_cart (22.7s)
i 7 of 7 tuning:     pca_rf
x 7 of 7 tuning:     pca_rf failed with: Some tuning parameters require finalization but there are recipe parameters that require tuning. Please use `parameters()` to finalize the parameter ranges.
> rand_forest
function (mode = "unknown", mtry = NULL, trees = NULL, 
    min_n = NULL) 
{
    args <- list(mtry = enquo(mtry), trees = enquo(trees), min_n = enquo(min_n))
    new_model_spec("rand_forest", args = args, eng_args = NULL, 
        mode = mode, method = NULL, engine = NULL)
}
hfrick commented 3 years ago

It looks like you need to finalize your filter_rec and pca_rec recipes before you can tune them in a workflow. It's hard to say anything specific because your example isn't reproducible, but ?parameters.recipe may already set you on the right course. If not, please provide a minimal reproducible example (a reprex). The reprex package is very helpful for that, and it has additional advice on how to create a good reprex at https://reprex.tidyverse.org/articles/reprex-dos-and-donts.html
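
As a rough illustration of what inspecting and adjusting a recipe's parameters can look like, here is a minimal sketch. It assumes a recent tidymodels release, where extract_parameter_set_dials() plays the role of the older parameters() generic. The toy recipe on mtcars and the range c(1, 5) are hypothetical and are not taken from the code in this thread.

```r
library(tidymodels)

# Hypothetical toy recipe on mtcars (not the Chicago data from this thread)
toy_rec <-
  recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_predictors()) %>%
  step_pca(all_predictors(), num_comp = tune())

# List the tuning parameters the recipe declares
toy_params <- extract_parameter_set_dials(toy_rec)
toy_params

# Replace the default num_comp range with one chosen for this data
# (the bounds 1 and 5 are an arbitrary assumption)
toy_params <-
  toy_params %>%
  update(num_comp = num_comp(c(1, 5)))
```

The updated parameter set can then be passed to a grid function such as grid_regular() to build an explicit grid.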

amazongodman commented 3 years ago

How about this?

library(tidymodels)
library(workflowsets)

data(Chicago)

Chicago <- Chicago %>% slice(1:365)

base_recipe <- 
  recipe(ridership ~ ., data = Chicago) %>% 
  step_date(date) %>% 
  step_holiday(date) %>% 
  update_role(date, new_role = "id") %>% 
  step_dummy(all_nominal()) %>% 
  step_zv(all_predictors()) %>% 
  step_normalize(all_predictors())

filter_rec <- 
  base_recipe %>% 
  step_corr(all_of(stations), threshold = tune())

pca_rec <- 
  base_recipe %>% 
  step_pca(all_of(stations), num_comp = tune()) %>% 
  step_normalize(all_predictors())

regularized_spec <- 
  linear_reg(penalty = tune(), mixture = tune()) %>% 
  set_engine("glmnet")

cart_spec <- 
  decision_tree(cost_complexity = tune(), min_n = tune()) %>% 
  set_engine("rpart") %>% 
  set_mode("regression")

rf_spec <- 
  rand_forest(mtry = tune(), trees = 50, min_n = tune()) %>% 
  set_engine("ranger") %>% 
  set_mode("regression")

chi_models <- 
  workflow_set(
    preproc = list(simple = base_recipe, 
                   filter = filter_rec, 
                   pca = pca_rec),
    models = list(glmnet = regularized_spec, 
                  cart = cart_spec, 
                  rf = rf_spec),
    cross = TRUE
  )

chi_models <- 
  chi_models %>% 
  anti_join(tibble(wflow_id = c("pca_glmnet", "filter_glmnet")), 
            by = "wflow_id")

splits <- 
  sliding_period(
    Chicago,
    date,
    "day",
    lookback = 300,   
    assess_stop = 7, 
    step = 7 
  )

set.seed(123)
chi_models <- 
  chi_models %>% 
  workflow_map("tune_grid", resamples = splits, grid = 5, 
               metrics = metric_set(mae), verbose = TRUE)

autoplot(chi_models)
i 1 of 7 tuning:     simple_glmnet
√ 1 of 7 tuning:     simple_glmnet (16.6s)
i 2 of 7 tuning:     simple_cart
√ 2 of 7 tuning:     simple_cart (17.3s)
i 3 of 7 tuning:     simple_rf
i Creating pre-processing data to finalize unknown parameter: mtry
√ 3 of 7 tuning:     simple_rf (18.2s)
i 4 of 7 tuning:     filter_cart
√ 4 of 7 tuning:     filter_cart (29s)
i 5 of 7 tuning:     filter_rf
x 5 of 7 tuning:     filter_rf failed with: Some tuning parameters require finalization but there are recipe parameters that require tuning. Please use `parameters()` to finalize the parameter ranges.
i 6 of 7 tuning:     pca_cart
√ 6 of 7 tuning:     pca_cart (23.9s)
i 7 of 7 tuning:     pca_rf
x 7 of 7 tuning:     pca_rf failed with: Some tuning parameters require finalization but there are recipe parameters that require tuning. Please use `parameters()` to finalize the parameter ranges.
topepo commented 3 years ago

The problem is that the range of mtry depends on the number of predictor columns. tune_grid() tries to figure this out and set the range for mtry.

For simple_rf, it can do this.

For the other cases, it cannot because the recipe has tuning parameters. For example, tune_grid() would need to know the number of PCA components to be able to set mtry. It can't, since that is being tuned.
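
To make the "unknown" part concrete, here is a small sketch using dials directly. It only shows how mtry() carries an unknown upper bound until data are supplied; using mtcars without its first column as the predictor set is a hypothetical stand-in, not the data from this thread.

```r
library(dials)

# mtry() ships with an unknown upper bound because it depends on the data
rf_mtry <- mtry()
has_unknowns(rf_mtry)

# finalize() fills in the upper bound from the number of predictor columns.
# mtcars minus its first column is a hypothetical stand-in for the predictors.
rf_mtry_final <- finalize(rf_mtry, x = mtcars[, -1])
range_get(rf_mtry_final)$upper
has_unknowns(rf_mtry_final)
```

When a recipe step such as step_pca(num_comp = tune()) changes the number of columns, there is no single predictor set to hand to finalize(), which is why the range has to be declared by hand.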

The error message is not great here (and we'll fix that). The solution is to create your own grids for those two workflows and pass them in using option_add(). For example:

# Get the ones that failed
chi_models_fixed <- 
  chi_models %>% 
  filter(wflow_id %in% c("filter_rf", "pca_rf"))

# Make grids by declaring parameter ranges
set.seed(1)
filter_grid <- 
  chi_models %>% 
  pull_workflow("filter_rf") %>% 
  parameters() %>% 
  # Set a range for mtry: 
  update(mtry = mtry(c(1, 20))) %>% 
  grid_latin_hypercube(size = 10)

set.seed(1)
pca_grid <- 
  chi_models %>% 
  pull_workflow("pca_rf") %>% 
  parameters() %>% 
  # Set a range for num_comp and mtry: 
  update(
    num_comp = num_comp(c(1, 10)),
    mtry = mtry(c(1, 20))
  ) %>% 
  grid_latin_hypercube(size = 10)

# Run the modified grids
chi_models_fixed <- 
  chi_models_fixed %>% 
  option_add(grid = filter_grid, id = "filter_rf") %>% 
  option_add(grid = pca_grid, id = "pca_rf") %>%
  workflow_map("tune_grid", resamples = splits, 
               metrics = metric_set(mae), verbose = TRUE)

# put them back together: 
chi_models <- 
  chi_models %>% 
  filter(!(wflow_id %in% c("filter_rf", "pca_rf"))) %>% 
  bind_rows(chi_models_fixed)

I'll transfer this to the tune repo and add more documentation.
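
The same idea can be tried on a single workflow outside of a workflow set. The sketch below is an assumption-laden toy version: it uses mtcars, a hypothetical workflow named toy_wflow, arbitrary ranges of c(1, 5), and swaps in grid_regular() for grid_latin_hypercube() simply because it is deterministic. It also assumes a recent tidymodels release that provides extract_parameter_set_dials().

```r
library(tidymodels)

# Hypothetical toy recipe and model; both contain parameters marked with tune()
toy_rec <-
  recipe(mpg ~ ., data = mtcars) %>%
  step_pca(all_predictors(), num_comp = tune())

toy_rf <-
  rand_forest(mtry = tune(), min_n = tune(), trees = 50) %>%
  set_engine("ranger") %>%
  set_mode("regression")

toy_wflow <-
  workflow() %>%
  add_recipe(toy_rec) %>%
  add_model(toy_rf)

# Declare ranges by hand so that no parameter is left unknown,
# then build an explicit grid (the ranges here are arbitrary assumptions)
toy_grid <-
  toy_wflow %>%
  extract_parameter_set_dials() %>%
  update(
    num_comp = num_comp(c(1, 5)),
    mtry = mtry(c(1, 5))
  ) %>%
  grid_regular(levels = 3)

toy_grid
```

A grid built this way can be passed to tune_grid() through its grid argument, or attached to one member of a workflow set with option_add() as shown above.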

github-actions[bot] commented 10 months ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.