SlowMo24 closed 3 years ago
In this example, you could create the function you pass to step_mutate_at() ahead of time instead of defining it in place, which avoids this issue. The reason for your error is that you technically didn't pass search_str to the step; you used it in an anonymous function, which then has to look it up in the environment at bake time. Defining the functions fully and passing them in will be more robust.
library(dplyr)
library(recipes)

input <- data.frame(x = c("foo", "bar"), y = c(1, 2))

foo_fun <- function(x) {
  grepl("foo", x, fixed = TRUE)
}

model_recipe <- recipe(y ~ ., data = input) %>%
  step_mutate_at(x, fn = list(bool = foo_fun))

trained_recipe <- model_recipe %>%
  prep()

trained_recipe %>%
  bake(new_data = NULL)
#> # A tibble: 2 x 3
#> x y x_bool
#> <fct> <dbl> <lgl>
#> 1 foo 1 TRUE
#> 2 bar 2 FALSE
input_two <- data.frame(x = c("foo", "bar"), y = c(1, 2))

trained_recipe %>%
  bake(new_data = input_two)
#> # A tibble: 2 x 3
#> x y x_bool
#> <fct> <dbl> <lgl>
#> 1 foo 1 TRUE
#> 2 bar 2 FALSE
Created on 2021-07-16 by the reprex package (v2.0.0)
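If the search string still needs to come from a variable, one option is a small function factory that captures the value at the moment the function is created. This is a minimal sketch in base R; make_matcher() is a hypothetical helper name and not part of recipes:

```r
# Hypothetical helper: builds a matcher that carries its own copy of `pattern`.
make_matcher <- function(pattern) {
  force(pattern)  # evaluate now, so the value is stored in the closure
  function(x) grepl(pattern, x, fixed = TRUE)
}

search_str <- "foo"
foo_fun <- make_matcher(search_str)

# The matcher no longer depends on the variable in the calling environment.
rm(search_str)
result <- foo_fun(c("foo", "bar"))
print(result)
```

A function built this way could be passed as fn = list(bool = foo_fun) in the step_mutate_at() call above, in place of the hard-coded foo_fun.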
You can use quasiquotation to embed variables into a prepped recipe for many steps, such as step_mutate():
library(recipes)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#>
#> step
input <- data.frame(x=c("foo","bar"),y=c(1,2))
search_str <- "foo"
model_recipe <- recipe(y ~ ., data = input) %>%
step_mutate(bool = grepl(!!search_str, x, fixed=TRUE))
recipe_prep <- prep(model_recipe)
recipe_prep %>% bake(new_data = NULL)
#> # A tibble: 2 x 3
#> x y bool
#> <fct> <dbl> <lgl>
#> 1 foo 1 TRUE
#> 2 bar 2 FALSE
remove(search_str)
input_two <- data.frame(x=c("foo","bar"),y=c(1,2))
recipe_prep %>% bake(new_data = input_two)
#> # A tibble: 2 x 3
#> x y bool
#> <fct> <dbl> <lgl>
#> 1 foo 1 TRUE
#> 2 bar 2 FALSE
Created on 2021-07-16 by the reprex package (v2.0.0)
This may be a way for you to meet your analysis needs as well.
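To see why this works, it can help to look at what !! does to an expression at the moment it is captured. The following is a small sketch using rlang::expr() outside of any recipe (rlang is a dependency of recipes; the variable names simply mirror the example above):

```r
library(rlang)

search_str <- "foo"

# `!!` replaces the symbol with its current value while the expression is captured.
injected <- expr(grepl(!!search_str, x, fixed = TRUE))
print(injected)

# Without `!!`, the expression still refers to the variable by name.
by_name <- expr(grepl(search_str, x, fixed = TRUE))
print(by_name)

# The injected expression evaluates even after the variable is gone.
rm(search_str)
result <- eval(injected, list(x = c("foo", "bar")))
print(result)
```

The injected version contains the literal string, so nothing needs to be looked up later; the by-name version would still need search_str to exist when it is evaluated.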
Great, thank you for your answers! I don't see any need to act further on this issue.
In situations where a local variable used in a recipe (e.g. for cleaner code) is not available during data processing, bake() will fail.

I see no advantage in this flexibility, as recipes is about consistency. Using local variables that may have a different format or content during prep and bake may lead to undesired side effects. Wouldn't it be better to 'codify' the variables' content in the trained recipe for maximum transferability? It may even be a new feature to offer variable injection during bake if desired.
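The failing example from the original report is not preserved in this copy of the thread. The following is a plausible reconstruction, assuming the anonymous-function pattern described in the first reply; it is an assumption and not the reporter's actual code:

```r
library(recipes)

input <- data.frame(x = c("foo", "bar"), y = c(1, 2))
search_str <- "foo"

# The anonymous function refers to `search_str` by name only.
model_recipe <- recipe(y ~ ., data = input) %>%
  step_mutate_at(x, fn = list(bool = function(x) grepl(search_str, x, fixed = TRUE)))

# prep() works because `search_str` still exists at this point.
recipe_prep <- prep(model_recipe)

rm(search_str)

# bake() on new data is expected to fail, since the lookup now finds nothing.
input_two <- data.frame(x = c("foo", "bar"), y = c(1, 2))
outcome <- tryCatch(
  bake(recipe_prep, new_data = input_two),
  error = function(e) e
)
print(inherits(outcome, "error"))
```

The error is caught with tryCatch() only so that the sketch runs to completion; the exact wording of the message may differ between recipes versions.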