TylerGrantSmith closed this issue 2 years ago.
You can use extract_mold() to pull out the mold, like so:
library(tidymodels)
rec_spec <- recipe(Sepal.Length ~ ., data = iris) %>% step_dummy(all_nominal_predictors())
xgb_spec <- boost_tree() %>% set_mode("regression")
xgb_wf <- workflow(rec_spec, xgb_spec)
xgb_fit <- fit(xgb_wf, data = iris)
xgb_fit %>% extract_mold() %>% pluck("predictors")
#> # A tibble: 150 × 5
#> Sepal.Width Petal.Length Petal.Width Species_versicolor Species_virginica
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 3.5 1.4 0.2 0 0
#> 2 3 1.4 0.2 0 0
#> 3 3.2 1.3 0.2 0 0
#> 4 3.1 1.5 0.2 0 0
#> 5 3.6 1.4 0.2 0 0
#> 6 3.9 1.7 0.4 0 0
#> 7 3.4 1.4 0.3 0 0
#> 8 3.4 1.5 0.2 0 0
#> 9 2.9 1.4 0.2 0 0
#> 10 3.1 1.5 0.1 0 0
#> # … with 140 more rows
Created on 2022-07-15 by the reprex package (v2.0.1)
Notice that we have dummy variables for Species, because these are the transformed predictors.
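The mold holds more than the predictors. The following minimal sketch uses the same iris setup and, like the reprex, assumes the xgboost package is installed. It lists the mold's components and pulls out the processed outcome and the blueprint, which records the preprocessing so it can be reapplied later:

```r
library(tidymodels)

# Same setup as above: dummy-encode Species, then fit an xgboost regression
rec_spec <- recipe(Sepal.Length ~ ., data = iris) %>%
  step_dummy(all_nominal_predictors())
xgb_spec <- boost_tree() %>% set_mode("regression")
xgb_fit <- fit(workflow(rec_spec, xgb_spec), data = iris)

# The mold is the list created by hardhat::mold() when the workflow was fit
xgb_mold <- extract_mold(xgb_fit)
names(xgb_mold)

# Processed outcome used for training (a one-column tibble of Sepal.Length)
xgb_mold$outcomes

# The blueprint describes the preprocessing and can be reused on new data
class(xgb_mold$blueprint)
```

Note that these are the training-time results. New data has to go back through the blueprint rather than being read from the mold.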
Thanks @juliasilge! Applying the same process to new data is a bit more verbose, though, especially if you also want the final "interface" conversion that prepare_data provides. It seems like this should be an exported function that takes a fitted workflow and new data and returns the fully formed dataset that is passed into the model. Perhaps I am alone with this issue?
library(tidymodels)
rec_spec <- recipe(Sepal.Length ~ ., data = iris) %>% step_dummy(all_nominal_predictors())
xgb_spec <- boost_tree() %>% set_mode("regression")
xgb_wf <- workflow(rec_spec, xgb_spec)
xgb_fit <- fit(xgb_wf, data = iris)
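form_data <- function(object, new_data) {
  fit_parsnip <- extract_fit_parsnip(object)
  # Forge the predictors from new data, then apply the model's interface conversion
  prepare_data(fit_parsnip, forge_predictors(new_data, object))
}
# Evaluate in the workflows namespace so unexported helpers such as forge_predictors() are found
environment(form_data) <- getNamespace("workflows")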
xgb_fit %>% form_data(head(iris))
#> Sepal.Width Petal.Length Petal.Width Species_versicolor Species_virginica
#> [1,] 3.5 1.4 0.2 0 0
#> [2,] 3.0 1.4 0.2 0 0
#> [3,] 3.2 1.3 0.2 0 0
#> [4,] 3.1 1.5 0.2 0 0
#> [5,] 3.6 1.4 0.2 0 0
#> [6,] 3.9 1.7 0.4 0 0
I think the function you are really looking for is hardhat::forge(). You can do it in two lines of code when combined with extract_mold().
I don't think we want to make this any easier, because we want people to treat preprocessing + model fitting as a single workflow, and this short-circuits the workflow at the end of the preprocessing step (which isn't a bad thing for debugging, but isn't something we want to be super visible).
library(tidymodels)
rec_spec <- recipe(Sepal.Length ~ ., data = iris) %>% step_dummy(all_nominal_predictors())
xgb_spec <- boost_tree() %>% set_mode("regression")
xgb_wf <- workflow(rec_spec, xgb_spec)
xgb_fit <- fit(xgb_wf, data = iris)
xgb_mold <- extract_mold(xgb_fit)
hardhat::forge(
  new_data = head(iris),
  blueprint = xgb_mold$blueprint
)
#> $predictors
#> # A tibble: 6 × 5
#> Sepal.Width Petal.Length Petal.Width Species_versicolor Species_virginica
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 3.5 1.4 0.2 0 0
#> 2 3 1.4 0.2 0 0
#> 3 3.2 1.3 0.2 0 0
#> 4 3.1 1.5 0.2 0 0
#> 5 3.6 1.4 0.2 0 0
#> 6 3.9 1.7 0.4 0 0
#>
#> $outcomes
#> NULL
#>
#> $extras
#> $extras$roles
#> NULL
Created on 2022-07-20 by the reprex package (v2.0.1)
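The two lines above can be wrapped in a small helper that returns only the forged predictors. This is a minimal sketch: forge_workflow_predictors() is a hypothetical name, not a function in workflows or hardhat. Like the reprexes, it assumes the xgboost package is installed.

```r
library(tidymodels)

rec_spec <- recipe(Sepal.Length ~ ., data = iris) %>%
  step_dummy(all_nominal_predictors())
xgb_spec <- boost_tree() %>% set_mode("regression")
xgb_fit <- fit(workflow(rec_spec, xgb_spec), data = iris)

# Hypothetical helper (not part of workflows or hardhat): reapply a fitted
# workflow's preprocessing to new data using only exported functions.
forge_workflow_predictors <- function(wf_fit, new_data) {
  # extract_mold() errors if the workflow has not been fit yet
  blueprint <- extract_mold(wf_fit)$blueprint
  forged <- hardhat::forge(new_data, blueprint = blueprint)
  forged$predictors
}

forge_workflow_predictors(xgb_fit, head(iris))
```

This stops at the end of preprocessing, which fits the debugging use case described above without exposing a separate exported entry point.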
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
There are many times when I might want to inspect or use the transformed data that gets passed directly into one of the model fitting functions (e.g. xgboost::xgboost), but there does not seem to be a defined way to get this "final" transformed dataset from the workflow. What I do currently is basically emulate what happens in predict.workflow, but it seems like this should be encapsulated in its own exported function, because it requires the use of forge_predictors, which is not currently exported.
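One way to get closer to the "final" dataset with exported functions alone is to forge the predictors and then convert them yourself. The sketch below assumes that boost_tree() with the xgboost engine expects a numeric matrix, which the [1,] row labels in the form_data() output above suggest. It also assumes the xgboost package is installed. parsnip's own xgboost wrapper may convert the matrix further (for example to an xgb.DMatrix) before xgboost is called, so treat this as an approximation.

```r
library(tidymodels)

rec_spec <- recipe(Sepal.Length ~ ., data = iris) %>%
  step_dummy(all_nominal_predictors())
xgb_spec <- boost_tree() %>% set_mode("regression")
xgb_fit <- fit(workflow(rec_spec, xgb_spec), data = iris)

# Reapply the fitted preprocessing to new data (exported functions only)
forged <- hardhat::forge(head(iris), blueprint = extract_mold(xgb_fit)$blueprint)

# Assumption: the engine's fit interface is a matrix, so convert the tibble
x_final <- as.matrix(forged$predictors)
x_final
```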