AlbertoImg closed this issue 4 months ago.
Thanks for including an example! I can't run it, though, so this is a bit of a general reply. If your feature selection step changes the names of the predictors (e.g., through transformations) or drops some of them, you can't use the original predictor names in the model formula (the one you pass to `add_model()`), because they will no longer be there after preprocessing.

You could use the dot notation in the formula instead, i.e., something like `Disease ~ . + (1|ID) - ID`. You'd need to make sure that only the variables you want to use in the model are left after preprocessing, and remove the fixed effect for the ID variable (the `- ID` part). Here is an illustration of that idea:
```r
library(tidymodels)
library(multilevelmod)
data(sleepstudy, package = "lme4")

# we want to use the formula Reaction ~ Days + (1|Subject)
lmer_spec <-
  linear_reg() %>%
  set_engine("lmer")

# recipe here without any further preprocessing/feature engineering
# because the data already only contains the 3 variables we are going to use
rec <- recipe(Reaction ~ ., sleepstudy)

wflow <- workflow() %>%
  add_recipe(rec) %>%
  add_model(lmer_spec, formula = Reaction ~ . -Subject + (1|Subject))

fit(wflow, data = sleepstudy)
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 0 Recipe Steps
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: Reaction ~ . - Subject + (1 | Subject)
#> Data: data
#> REML criterion at convergence: 1786.465
#> Random effects:
#> Groups Name Std.Dev.
#> Subject (Intercept) 37.12
#> Residual 30.99
#> Number of obs: 180, groups: Subject, 18
#> Fixed Effects:
#> (Intercept) Days
#> 251.41 10.47

# same fit as with Reaction ~ Days + (1|Subject)
lmer_spec %>%
  fit(Reaction ~ Days + (1|Subject), data = sleepstudy)
#> parsnip model object
#>
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: Reaction ~ Days + (1 | Subject)
#> Data: data
#> REML criterion at convergence: 1786.465
#> Random effects:
#> Groups Name Std.Dev.
#> Subject (Intercept) 37.12
#> Residual 30.99
#> Number of obs: 180, groups: Subject, 18
#> Fixed Effects:
#> (Intercept) Days
#> 251.41 10.47
```
Created on 2024-01-24 with reprex v2.0.2
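To see what the dot actually expands to, base R's `terms()` can be applied to the same formula together with a small stand-in data frame. The `toy` data below is made up for illustration and is not `sleepstudy`. This sketch shows the formula mechanics only, not what workflows or multilevelmod do internally. It also shows that `terms()` refuses to expand `.` when no data is supplied, which produces the same message reported in the reply below. Presumably some code path at prediction time evaluates the formula without its data, but that is an assumption.

```r
# Made-up stand-in data with the same three columns as sleepstudy
toy <- data.frame(
  Reaction = c(250, 262, 241, 270),
  Days     = c(0, 1, 0, 1),
  Subject  = factor(c("s1", "s1", "s2", "s2"))
)

f <- Reaction ~ . - Subject + (1 | Subject)

# With data, '.' expands to every non-outcome column (Days, Subject);
# '- Subject' then drops the fixed effect, and (1 | Subject) stays as one term
expanded_labels <- attr(terms(f, data = toy), "term.labels")
print(expanded_labels)

# Without data, terms() cannot know what '.' stands for and stops with an error
no_data_msg <- tryCatch(terms(f), error = function(e) conditionMessage(e))
print(no_data_msg)
```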
Hi hfrick,

Thanks for your answer! I tried it, and the fit function works with my dataset and workflow (like in your example). However, when I then use augment or predict, I get the following error:

```r
fitted_model <- fit(current_workflow, data = data_training)
prediction_fold <- fitted_model %>% augment(new_data_fold)
```

```
Error in terms.formula(ff) : '.' in formula and no 'data' argument
```

I also tried this, but I get the same error:

```r
fitted_model %>% augment(data = data_training, new_data = new_data_fold)
```

Thanks again,
Best,
Alberto
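One way to avoid the `.` altogether is to write the fixed effects out explicitly, using the column names that are actually left after preprocessing. The helper `mixed_formula()` below is hypothetical and is not part of workflows, recipes or multilevelmod. It only builds the formula, using base R's `reformulate()`. The names `Disease`, `Var2` and `ID` are placeholders based on this thread, not real data.

```r
# Hypothetical helper (not part of tidymodels): build an explicit
# random-intercept formula from the columns that survive preprocessing,
# so the model formula never contains '.'
mixed_formula <- function(outcome, columns, group) {
  # fixed effects: every surviving column except the outcome and the grouping variable
  fixed <- setdiff(columns, c(outcome, group))
  # random intercept for the grouping variable
  random <- sprintf("(1 | %s)", group)
  reformulate(c(fixed, random), response = outcome)
}

# Example: feature selection dropped Var1 and kept Var2 (placeholder names)
kept_columns <- c("Disease", "Var2", "ID")
f <- mixed_formula("Disease", kept_columns, group = "ID")
print(f)
```

In a workflow, `kept_columns` could come from the prepped recipe, e.g. `names(bake(prep(rec, training = data_training), new_data = NULL))`. The result could then be passed as `add_model(spec, formula = f)`. This is an untested sketch. Because the formula is fixed once from a single prepped recipe, it will not adapt if the feature selection keeps different columns in different resamples.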
I can't help you with this one without a proper reprex. Please check out the reprex package, which makes those easy to create, and the article on dos and don'ts. GitHub issues are best used for bug reports and feature requests. For general help with getting a piece of code to run, Posit Community is the best place, also because more people will see your question there and can chime in.
Thanks a lot for your support! I will see whether I can get some tips on Posit Community as well.
Best,
Alberto
Hi developers,
I would like to know how to update the formula used in a workflow object at the fitting step, after a feature selection (FS) preprocessing step has been performed. The current issue is that when running the fit function I get "Error in eval(predvars, data, env) : object 'Var1' not found". This happens because the FS step removed that predictor, but the formula still references it. I had to add the formula via add_model(), since I am working with a linear mixed-effects model for classification, and so far I have not found a way to specify the random effect ("ID") in a recipe object.
Case example:
Thanks in advance; any help is really appreciated.
Best,
Alberto
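The `object 'Var1' not found` error in this question can be reproduced without tidymodels. A formula written before preprocessing still names `Var1`, but the data it is evaluated on no longer contains that column, so `model.frame()` fails. This is a minimal base-R sketch with made-up data. `Var1`, `Var2`, `ID` and `Disease` are placeholders. It illustrates the cause of the message, not the exact code path inside workflows.

```r
# Made-up data as it might look after a feature-selection step removed Var1
after_fs <- data.frame(
  Disease = c(0, 1, 0, 1),
  Var2    = c(1.2, 0.4, 2.1, 0.9),
  ID      = factor(c("a", "a", "b", "b"))
)

# Formula written before preprocessing; it still refers to Var1
old_formula <- Disease ~ Var1 + Var2

# model.frame() evaluates every variable in the formula and cannot find Var1
fs_error <- tryCatch(
  model.frame(old_formula, data = after_fs),
  error = function(e) conditionMessage(e)
)
print(fs_error)
```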