Closed: AshesITR closed this 2 years ago
This seems like a really good idea for butchering a recipe, for example by replacing the template with vctrs::vec_ptype(template):
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
data(concrete)
concrete <-
  concrete %>%
  group_by(across(-compressive_strength)) %>%
  summarize(compressive_strength = mean(compressive_strength),
            .groups = "drop")
set.seed(1501)
concrete_split <- initial_split(concrete, strata = compressive_strength)
concrete_train <- training(concrete_split)
concrete_test <- testing(concrete_split)
rec <- recipe(compressive_strength ~ ., data = concrete_train) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_poly(all_predictors()) %>%
  step_interact(~ all_predictors():all_predictors())
prepped <- prep(rec)
bake(prepped, new_data = concrete_test)
#> # A tibble: 249 × 137
#> compressive_strength cement_poly_1 cement_poly_2 blast_furnace_slag_poly_1
#> <dbl> <dbl> <dbl> <dbl>
#> 1 4.57 -0.0632 0.0967 0.0354
#> 2 7.68 -0.0632 0.0967 0.0354
#> 3 7.72 -0.0609 0.0887 0.0395
#> 4 20.6 -0.0609 0.0887 0.0395
#> 5 6.28 -0.0581 0.0794 0.0440
#> 6 31.0 -0.0581 0.0794 0.0440
#> 7 10.4 -0.0558 0.0716 0.0487
#> 8 33.3 -0.0524 0.0611 0.0584
#> 9 13.7 -0.0521 0.0600 0.0556
#> 10 7.51 -0.0511 0.0571 0.0571
#> # … with 239 more rows, and 133 more variables:
#> # blast_furnace_slag_poly_2 <dbl>, fly_ash_poly_1 <dbl>,
#> # fly_ash_poly_2 <dbl>, water_poly_1 <dbl>, water_poly_2 <dbl>,
#> # superplasticizer_poly_1 <dbl>, superplasticizer_poly_2 <dbl>,
#> # coarse_aggregate_poly_1 <dbl>, coarse_aggregate_poly_2 <dbl>,
#> # fine_aggregate_poly_1 <dbl>, fine_aggregate_poly_2 <dbl>, age_poly_1 <dbl>,
#> # age_poly_2 <dbl>, cement_poly_1_x_cement_poly_2 <dbl>, …
prepped$template <- prepped$template[integer(), ]
juice(prepped)
#> # A tibble: 0 × 137
#> # … with 137 variables: compressive_strength <dbl>, cement_poly_1 <dbl>,
#> # cement_poly_2 <dbl>, blast_furnace_slag_poly_1 <dbl>,
#> # blast_furnace_slag_poly_2 <dbl>, fly_ash_poly_1 <dbl>,
#> # fly_ash_poly_2 <dbl>, water_poly_1 <dbl>, water_poly_2 <dbl>,
#> # superplasticizer_poly_1 <dbl>, superplasticizer_poly_2 <dbl>,
#> # coarse_aggregate_poly_1 <dbl>, coarse_aggregate_poly_2 <dbl>,
#> # fine_aggregate_poly_1 <dbl>, fine_aggregate_poly_2 <dbl>, …
bake(prepped, new_data = concrete_test)
#> # A tibble: 249 × 137
#> compressive_strength cement_poly_1 cement_poly_2 blast_furnace_slag_poly_1
#> <dbl> <dbl> <dbl> <dbl>
#> 1 4.57 -0.0632 0.0967 0.0354
#> 2 7.68 -0.0632 0.0967 0.0354
#> 3 7.72 -0.0609 0.0887 0.0395
#> 4 20.6 -0.0609 0.0887 0.0395
#> 5 6.28 -0.0581 0.0794 0.0440
#> 6 31.0 -0.0581 0.0794 0.0440
#> 7 10.4 -0.0558 0.0716 0.0487
#> 8 33.3 -0.0524 0.0611 0.0584
#> 9 13.7 -0.0521 0.0600 0.0556
#> 10 7.51 -0.0511 0.0571 0.0571
#> # … with 239 more rows, and 133 more variables:
#> # blast_furnace_slag_poly_2 <dbl>, fly_ash_poly_1 <dbl>,
#> # fly_ash_poly_2 <dbl>, water_poly_1 <dbl>, water_poly_2 <dbl>,
#> # superplasticizer_poly_1 <dbl>, superplasticizer_poly_2 <dbl>,
#> # coarse_aggregate_poly_1 <dbl>, coarse_aggregate_poly_2 <dbl>,
#> # fine_aggregate_poly_1 <dbl>, fine_aggregate_poly_2 <dbl>, age_poly_1 <dbl>,
#> # age_poly_2 <dbl>, cement_poly_1_x_cement_poly_2 <dbl>, …
Created on 2021-11-29 by the reprex package (v2.0.1)
@AshesITR would you be interested in contributing a PR to butcher to implement this method for a recipe? We have an article here with some advice on contributing to butcher, but as you have probably already discovered, the method would go in this file.
Sure, I'll make this a PR. Regarding df[integer(), ] vs. vctrs::vec_ptype(df): do you have an opinion on either of these alternatives?
Do you have an opinion on that @DavisVaughan? butcher doesn't currently import vctrs but does import tibble, which imports vctrs.
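For comparison, here is a small sketch of the two alternatives side by side. The data frame df is made up for illustration, and this only assumes that vctrs is installed. Both calls give a zero-row data frame with the same columns and column classes.

```r
library(vctrs)

# Made-up stand-in for a recipe template (illustration only).
df <- data.frame(x = c(1.5, 2.5), f = factor(c("a", "b")))

# Alternative 1: base subsetting with an empty row index.
# drop = FALSE guards against a one-column base data.frame collapsing to a vector.
by_index <- df[integer(), , drop = FALSE]

# Alternative 2: the vctrs prototype of the data frame.
by_ptype <- vec_ptype(df)

# Both keep the columns and their classes while dropping all rows.
col_class <- function(d) vapply(d, function(col) class(col)[1], character(1))
nrow(by_index)
nrow(by_ptype)
identical(col_class(by_index), col_class(by_ptype))
```

One practical difference: for a base data.frame with a single column, df[integer(), ] without drop = FALSE returns a bare vector, whereas tibbles never drop. The template in the reprex above is a tibble, so the plain form works there.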
It should remove x$template, which contains the prepped data of the training set.
reprex stolen and adapted from tidymodels/recipes#859
The proposed implementation is quite simple, if I'm not missing anything:
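A minimal sketch of what the idea might boil down to, under stated assumptions and not the code from the eventual PR: strip_template() is a hypothetical helper (it is not part of butcher or recipes), and toy is a plain list standing in for a prepped recipe. It keeps the column names and types of template but drops every row, mirroring the prepped$template[integer(), ] line in the reprex.

```r
# Hypothetical helper for illustration; not part of butcher or recipes.
strip_template <- function(x) {
  stopifnot(is.data.frame(x$template))
  # Keep column names and types, drop all rows.
  x$template <- x$template[integer(), , drop = FALSE]
  x
}

# Toy stand-in for a prepped recipe: a list carrying a `template` data frame.
toy <- list(template = data.frame(a = c(1.5, 2.5), b = c("u", "v")))

slim <- strip_template(toy)
nrow(slim$template)   # number of rows left in the stripped template
names(slim$template)  # column names are preserved
```

In the reprex above, bake() on new data returned the same result after the template was emptied, while juice() returned zero rows, so a real method would presumably need to document that juice() no longer returns the training set.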