tidymodels / stacks

An R package for tidy stacked ensemble modeling
https://stacks.tidymodels.org

I tried to stack 3 models. The first 2 can stack, but adding the last, a cubist, gave me this error: Error in { : task 1 failed - "data.frame_ is unknown.". #106

Closed amcmahon17 closed 2 years ago

amcmahon17 commented 2 years ago

I tried to stack 3 models. The first 2 can stack, but adding the last, a cubist, gave me this error: Error in { : task 1 failed - "data.frame_ is unknown.". Also: Warning message: The ... are not used in this function but one or more objects were passed: 'parallel_over'

library(chemometrics)
library(tidymodels)
library(stacks)

library(doParallel)

ctrl_grid <- control_stack_grid()

cl <- makeCluster(detectCores())
registerDoParallel(cl)

data("NIR")
specsampledata <- bind_cols(NIR$yGlcEtOH, NIR$xNIR)
regmetrics <- metric_set(yardstick::rmse, yardstick::rsq, yardstick::mae)
set.seed(565)

specsampledata_split <- initial_split(specsampledata, prop = .5, strata = Ethanol)
specsampledata_train_data <- training(specsampledata_split)
specsampledata_test_data  <- testing(specsampledata_split)

train_data_cv <- vfold_cv(specsampledata_train_data, repeats = 3, strata = Ethanol)

xgb_spec <- boost_tree(
  trees = tune(), 
  tree_depth = tune(), min_n = tune(), 
  loss_reduction = tune(),                     
  sample_size = tune(), mtry = tune(),         
  learn_rate = tune()
) %>% 
  set_engine("xgboost") %>% 
  set_mode("regression")

xgb_rec <- 
  recipe(Ethanol ~ ., data = specsampledata_train_data ) %>% 
  step_corr(all_numeric_predictors())

xgb_workflow <- 
  workflow() %>% 
  add_model(xgb_spec) %>% 
  add_recipe(xgb_rec)

xgb_grid <- grid_latin_hypercube(
  tree_depth(),
  min_n(),
  trees(),
  loss_reduction(),
  sample_size = sample_prop(),
  finalize(mtry(), specsampledata_train_data ),
  learn_rate(),
  size = 30
)

xgb_grid_results <- xgb_workflow %>%
  tune_grid(resamples = train_data_cv, 
            grid = xgb_grid,
            metrics = regmetrics, 
            control = ctrl_grid,
            parallel_over = "resamples"
  )

bagmars_spec <- bag_mars(
  num_terms = tune(),
  prod_degree = tune()                      ## highest interaction degree
) %>% 
  set_mode("regression")

library(baguette)

bagmars_grid <- grid_latin_hypercube(
  finalize(num_terms(), specsampledata_train_data ),
  prod_degree(),
  size = 30
)

bagmars_rec <- 
  recipe(Ethanol ~ ., data = specsampledata_train_data ) 

bagmars_workflow <- 
  workflow() %>% add_model(bagmars_spec) %>% add_recipe(bagmars_rec)

bagmars_tune_grid_results <- bagmars_workflow %>%
  tune_grid(resamples = train_data_cv, 
            grid = bagmars_grid,
            metrics = regmetrics, 
            control = ctrl_grid,
            parallel_over = "resamples"
  )

bagmars_tune_grid_results %>% collect_metrics()

library(rules)

cubist_spec <-
  cubist_rules(
    committees = tune(),
    neighbors = tune(),
    max_rules = tune()
  )

cubist_rec <- 
  recipe(Ethanol ~ ., data = specsampledata_train_data )

cubist_workflow <- 
  workflow() %>% 
  add_model(cubist_spec) %>% 
  add_recipe(cubist_rec)

cubist_grid <- grid_latin_hypercube(
  max_rules(),
  committees(),
  neighbors(),
  size = 30
)

cubist_tune_grid_results <- cubist_workflow %>%
  tune_grid(resamples = train_data_cv, 
            grid = cubist_grid,
            metrics = regmetrics, 
            control = ctrl_grid,
            parallel_over = "resamples"
  )

library(stacks)
stackedmodel <- 
  stacks() %>%
  add_candidates(bagmars_tune_grid_results) %>%
  add_candidates(cubist_tune_grid_results) %>%
  add_candidates(xgb_grid_results) %>%
  blend_predictions(penalty = c(.5, 1), metric = metric_set(rmse)) %>% 
  fit_members()

bind_cols(specsampledata_test_data, stackedmodel %>% predict(specsampledata_test_data)) %>% 
  select(Ethanol, .pred) %>% 
  regmetrics(truth = Ethanol, estimate = .pred)

stopCluster(cl)
amcmahon17 commented 2 years ago

Quick update: I tried this again and it worked. I think stacks is blameless. It's the parallel steps that seem to generate the errors, and only sometimes. Perhaps my machine is the problem.
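For anyone hitting the same intermittent failures, one way to rule out the parallel backend is to tear down the workers and re-run the tuning sequentially (a sketch, assuming a cluster `cl` was registered as in the reprex above; `registerDoSEQ()` is from {foreach}):

```r
library(doParallel)
library(foreach)

# Stop the explicit worker cluster and fall back to sequential
# execution, so tune_grid() runs entirely in the main R process.
stopCluster(cl)
registerDoSEQ()

# Re-running the cubist tuning now exercises the same code path
# without worker-side serialization, which helps isolate whether
# the error comes from the parallel backend or from the model.
```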

simonpcouch commented 2 years ago

Thanks for the issue! :)

I will indeed close this issue, but I do think {stacks} ought to fail more gracefully here. Related to #105 in that {stacks} fails to point out that a candidate failed to train / had some sort of issue pre-stacking and instead surfaces some other, uninformative error. I'll work a change into the next release with an eye to erroring informatively in this situation.
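In the meantime, a possible workaround (a sketch, assuming tuning results shaped like the reprex above; `.notes` is the list-column {tune} attaches to each resample to record warnings and errors) is to check each set of tuning results for problems before passing it to `add_candidates()`:

```r
library(purrr)

# Count the notes (warnings/errors) recorded for each resample of a
# tuning result; a nonzero count flags a candidate that had trouble
# training and may later trigger an uninformative stacking error.
note_counts <- map_int(cubist_tune_grid_results$.notes, nrow)
any(note_counts > 0)
```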

github-actions[bot] commented 2 years ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.