tidymodels / multilevelmod

Parsnip wrappers for mixed-level and hierarchical models
https://multilevelmod.tidymodels.org/

Error when using the model with fit_resamples() #4

Closed: meenakshi-kushwaha closed this issue 3 years ago

meenakshi-kushwaha commented 3 years ago

Hello, I am trying to perform k-fold cross-validation on an lmer (linear mixed-effects) model using this package and the tidymodels approach. However, I keep getting the following error from fit_resamples():

"Error: The first argument to [fit_resamples()] should be either a model or workflow."

Here is a reprex:

library(multilevelmod)
library(rsample)  # provides vfold_cv()
data(sleepstudy, package = "lme4")

mixed_model_spec <- linear_reg() %>% set_engine("lmer")

mixed_model_fit <-
  mixed_model_spec %>%
  fit(Reaction ~ Days + (Days | Subject), data = sleepstudy)

set.seed(345)
folds <- vfold_cv(sleepstudy, v = 5)
folds

library(tune)
set.seed(456)
# This call raises the error: mixed_model_fit is an already-fitted model
fit_rs <- fit_resamples(mixed_model_fit, folds)
juliasilge commented 3 years ago

One thing to note: as the error message says, the first argument to fit_resamples() must be either an (unfitted) model specification or a workflow, whereas the reprex above passes the already-fitted mixed_model_fit. However, something is still not working right even when that is done correctly:

library(tidymodels)
library(multilevelmod)
data(sleepstudy, package = "lme4")
set.seed(345)
# note: vfold_cv() has no `group` argument, so these folds are not grouped by Subject
sleep_folds <- vfold_cv(sleepstudy, group = Subject, v = 3)
sleep_folds
#> #  3-fold cross-validation 
#> # A tibble: 3 x 2
#>   splits           id   
#>   <list>           <chr>
#> 1 <split [120/60]> Fold1
#> 2 <split [120/60]> Fold2
#> 3 <split [120/60]> Fold3

mixed_model_spec <- linear_reg() %>% set_engine("lmer")
mixed_model_wf <- workflow() %>%
  add_model(mixed_model_spec, formula = Reaction ~ Days + (Days | Subject)) %>%
  add_variables(outcomes = Reaction, predictors = c(Days, Subject))

## workflow will fit one time just fine
fit(mixed_model_wf, sleepstudy)
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Variables
#> Model: linear_reg()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> Outcomes: Reaction
#> Predictors: c(Days, Subject)
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: Reaction ~ Days + (Days | Subject)
#>    Data: data
#> REML criterion at convergence: 1743.628
#> Random effects:
#>  Groups   Name        Std.Dev. Corr
#>  Subject  (Intercept) 24.741       
#>           Days         5.922   0.07
#>  Residual             25.592       
#> Number of obs: 180, groups:  Subject, 18
#> Fixed Effects:
#> (Intercept)         Days  
#>      251.41        10.47

## workflow will *not* fit to resamples
fit_resamples(mixed_model_wf, sleep_folds)
#> 
#> Attaching package: 'rlang'
#> The following objects are masked from 'package:purrr':
#> 
#>     %@%, as_function, flatten, flatten_chr, flatten_dbl, flatten_int,
#>     flatten_lgl, flatten_raw, invoke, list_along, modify, prepend,
#>     splice
#> 
#> Attaching package: 'vctrs'
#> The following object is masked from 'package:tibble':
#> 
#>     data_frame
#> The following object is masked from 'package:dplyr':
#> 
#>     data_frame
#> Loading required package: Matrix
#> 
#> Attaching package: 'Matrix'
#> The following objects are masked from 'package:tidyr':
#> 
#>     expand, pack, unpack
#> x Fold1: preprocessor 1/1, model 1/1 (predictions): Error in if (remove_intercept...
#> x Fold2: preprocessor 1/1, model 1/1 (predictions): Error in if (remove_intercept...
#> x Fold3: preprocessor 1/1, model 1/1 (predictions): Error in if (remove_intercept...
#> Warning: All models failed. See the `.notes` column.
#> Error in glue_data(.x = NULL, ..., .sep = .sep, .envir = .envir, .open = .open, : Expecting '}'

Created on 2020-12-04 by the reprex package (v0.3.0.9001)
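To see why the original reprex fails, here is a minimal check, assuming parsnip, multilevelmod, and lme4 are installed: parsnip's fit() turns a `model_spec` into a `model_fit` object, and fit_resamples() only accepts the former (or a workflow). The object names `spec` and `spec_fit` are illustrative.

```r
library(parsnip)
library(multilevelmod)  # registers the "lmer" engine for linear_reg()
data(sleepstudy, package = "lme4")

# Unfitted model specification: the kind of object fit_resamples() accepts
spec <- set_engine(linear_reg(), "lmer")

# Fitting the specification returns a different class of object
spec_fit <- fit(spec, Reaction ~ Days + (Days | Subject), data = sleepstudy)

inherits(spec, "model_spec")      # TRUE
inherits(spec_fit, "model_spec")  # FALSE
inherits(spec_fit, "model_fit")   # TRUE
```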

juliasilge commented 3 years ago

I stepped through this today. The problem happens at prediction time inside tune::fit_resamples() (you can see it with debugonce(predict_model)), at this point:

Browse[2]> predict(model, x_vals, type = type_iter)
Error in if (remove_intercept & any(grepl("Intercept", names(new_data)))) { : 
  argument is of length zero
Backtrace:
  1. tune::fit_resamples(mixed_model_wf, sleep_folds)
 21. tune:::safely_iterate(...) R/grid_code_paths.R:344:2
 27. tune:::fn(...) R/grid_code_paths.R:414:4
 36. tune::predict_model(split, workflow, iter_grid, metrics, iter_submodels) R/grid_code_paths.R:282:6
 38. parsnip::predict.model_fit(model, x_vals, type = type_iter)
 40. parsnip::predict_numeric.model_fit(...)
 41. parsnip::prepare_data(object, new_data)
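The message `argument is of length zero` comes from base R's `if`. One likely reading is that `remove_intercept` was a zero-length value (for example `NULL`) when parsnip::prepare_data() ran, because `&` with a zero-length operand returns `logical(0)` rather than `TRUE` or `FALSE`. The sketch below reproduces only that base-R behaviour; the `NULL` value and the toy `new_data` are assumptions standing in for whatever parsnip actually looked up.

```r
# Hypothetical stand-in: assume the intercept setting was missing (NULL)
remove_intercept <- NULL
new_data <- data.frame(Days = 1, Subject = "308")

# `&` with a zero-length operand yields logical(0)
cond <- remove_intercept & any(grepl("Intercept", names(new_data)))
length(cond)  # 0

# `if` refuses a zero-length condition; capture the error message
msg <- tryCatch(
  if (cond) "intercept column dropped" else "kept",
  error = function(e) conditionMessage(e)
)
msg  # "argument is of length zero"
```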

In fact, this fitted workflow cannot predict at all, even outside of resampling:

library(tidymodels)
library(multilevelmod)
data(sleepstudy, package = "lme4")
set.seed(345)
# note: vfold_cv() has no `group` argument, so these folds are not grouped by Subject
sleep_folds <- vfold_cv(sleepstudy, group = Subject, v = 3)
sleep_folds
#> #  3-fold cross-validation 
#> # A tibble: 3 x 2
#>   splits           id   
#>   <list>           <chr>
#> 1 <split [120/60]> Fold1
#> 2 <split [120/60]> Fold2
#> 3 <split [120/60]> Fold3

mixed_model_spec <- linear_reg() %>% set_engine("lmer")
mixed_model_wf <- workflow() %>%
  add_model(mixed_model_spec, formula = Reaction ~ Days + (Days | Subject)) %>%
  add_variables(outcomes = Reaction, predictors = c(Days, Subject))

## workflow will fit one time just fine
mixed_fit <- fit(mixed_model_wf, sleepstudy)
mixed_fit
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Variables
#> Model: linear_reg()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> Outcomes: Reaction
#> Predictors: c(Days, Subject)
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: Reaction ~ Days + (Days | Subject)
#>    Data: data
#> REML criterion at convergence: 1743.628
#> Random effects:
#>  Groups   Name        Std.Dev. Corr
#>  Subject  (Intercept) 24.741       
#>           Days         5.922   0.07
#>  Residual             25.592       
#> Number of obs: 180, groups:  Subject, 18
#> Fixed Effects:
#> (Intercept)         Days  
#>      251.41        10.47

## will *not* predict
predict(mixed_fit, sleepstudy[2,])
#> Error in if (remove_intercept & any(grepl("Intercept", names(new_data)))) {: argument is of length zero

Created on 2020-12-08 by the reprex package (v0.3.0.9001)
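While predict() on the workflow fails, one possible workaround is to pull out the underlying lme4 fit and call lme4's own predict() method on it. This is a sketch assuming a workflows version that provides extract_fit_engine() (older releases used pull_workflow_fit() instead). By default lme4 returns conditional predictions that include the Subject random effects, which may not match what multilevelmod's predict() method returns.

```r
library(parsnip)
library(workflows)
library(multilevelmod)
data(sleepstudy, package = "lme4")

# Same workflow as in the reprex, built without the pipe
mixed_model_wf <- workflow()
mixed_model_wf <- add_variables(mixed_model_wf, outcomes = Reaction,
                                predictors = c(Days, Subject))
mixed_model_wf <- add_model(mixed_model_wf, set_engine(linear_reg(), "lmer"),
                            formula = Reaction ~ Days + (Days | Subject))
mixed_fit <- fit(mixed_model_wf, sleepstudy)

# The underlying lme4 "lmerMod" object
lmer_fit <- extract_fit_engine(mixed_fit)

# lme4's predict method; by default it includes the Subject random effects
preds <- predict(lmer_fit, newdata = sleepstudy[1:3, ])
preds
```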

juliasilge commented 3 years ago

With the current development version of multilevelmod, this does work. 🎉

library(tidymodels)
library(multilevelmod)
data(sleepstudy, package = "lme4")
set.seed(345)
sleep_folds <- group_vfold_cv(sleepstudy, group = Subject, v = 3)
sleep_folds
#> # Group 3-fold cross-validation 
#> # A tibble: 3 x 2
#>   splits           id       
#>   <list>           <chr>    
#> 1 <split [120/60]> Resample1
#> 2 <split [120/60]> Resample2
#> 3 <split [120/60]> Resample3

mixed_model_spec <- linear_reg() %>% set_engine("lmer")
mixed_model_wf <- workflow() %>%
  add_model(mixed_model_spec, formula = Reaction ~ Days + (Days | Subject)) %>%
  add_variables(outcomes = Reaction, predictors = c(Days, Subject))

fit(mixed_model_wf, sleepstudy)
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Variables
#> Model: linear_reg()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> Outcomes: Reaction
#> Predictors: c(Days, Subject)
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: Reaction ~ Days + (Days | Subject)
#>    Data: data
#> REML criterion at convergence: 1743.628
#> Random effects:
#>  Groups   Name        Std.Dev. Corr
#>  Subject  (Intercept) 24.741       
#>           Days         5.922   0.07
#>  Residual             25.592       
#> Number of obs: 180, groups:  Subject, 18
#> Fixed Effects:
#> (Intercept)         Days  
#>      251.41        10.47
fit_resamples(mixed_model_wf, sleep_folds)
#> # Resampling results
#> # Group 3-fold cross-validation 
#> # A tibble: 3 x 4
#>   splits           id        .metrics         .notes          
#>   <list>           <chr>     <list>           <list>          
#> 1 <split [120/60]> Resample1 <tibble [2 × 4]> <tibble [0 × 1]>
#> 2 <split [120/60]> Resample2 <tibble [2 × 4]> <tibble [0 × 1]>
#> 3 <split [120/60]> Resample3 <tibble [2 × 4]> <tibble [0 × 1]>

Created on 2020-12-10 by the reprex package (v0.3.0.9001)
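This reprex also switches from vfold_cv() to group_vfold_cv(), so that all observations for a given Subject land in the same fold. A quick sketch, assuming rsample is installed, that confirms no Subject appears in both the analysis and the assessment set of any split:

```r
library(rsample)
data(sleepstudy, package = "lme4")

set.seed(345)
sleep_folds <- group_vfold_cv(sleepstudy, group = Subject, v = 3)

# TRUE for a split when its analysis and assessment sets share no Subject
no_overlap <- vapply(
  sleep_folds$splits,
  function(split) {
    shared <- intersect(analysis(split)$Subject, assessment(split)$Subject)
    length(shared) == 0
  },
  logical(1)
)
all(no_overlap)  # TRUE
```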

You can install this via:

devtools::install_github("tidymodels/multilevelmod")
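Once the resamples fit, the per-fold results can be summarized with tune's collect_metrics(). A minimal sketch that repeats the workflow above, assuming tidymodels, multilevelmod, and lme4 are installed; for a regression model fit_resamples() computes RMSE and R² by default.

```r
library(tidymodels)
library(multilevelmod)
data(sleepstudy, package = "lme4")

set.seed(345)
sleep_folds <- group_vfold_cv(sleepstudy, group = Subject, v = 3)

mixed_model_spec <- linear_reg() %>% set_engine("lmer")
mixed_model_wf <- workflow() %>%
  add_model(mixed_model_spec, formula = Reaction ~ Days + (Days | Subject)) %>%
  add_variables(outcomes = Reaction, predictors = c(Days, Subject))

mixed_rs <- fit_resamples(mixed_model_wf, sleep_folds)

# One row per metric, averaged over the 3 folds
mixed_metrics <- collect_metrics(mixed_rs)
mixed_metrics
```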
github-actions[bot] commented 3 years ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.