Closed Athospd closed 4 years ago
You likely need to use skip = TRUE in the recipe step that affects the outcome.
See https://github.com/tidymodels/workflows/issues/37
And https://tidymodels.github.io/hardhat/articles/forge.html#a-note-on-recipes
Just some feedback: this has become a major obstacle for me. In my opinion, it breaks the smoothness of the workflow, and I don't know how to deal with it yet. I had to apply the outcome's steps outside of the recipe.
I would like to use resamples + tune + workflow + recipe + parsnip seamlessly, but it seems that I have to choose between tuning and predicting.
Please guide me if I'm doing it all wrong. I'd also be glad to contribute; let me know how.
Sorry for the frustration. It would help to know more.
Does skip solve the issue?
We can't make the assumption that the outcome data is always available (outside of the original training set). Theoretically, the outcome should only be required during model training.
skip tries to solve this issue by using the step during training (via juice()) but not inside of bake().
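To make the skip behaviour concrete, here is a minimal sketch. It assumes the recipes package and uses the built-in mtcars data with step_log() purely for illustration; neither comes from this thread.

```r
library(recipes)

# An outcome transformation marked skip = TRUE:
# applied when the recipe is trained, skipped by bake() on new data.
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_log(mpg, skip = TRUE)

prepped <- prep(rec, training = mtcars)

# juice() returns the processed training set, so mpg is log-transformed here.
train_processed <- juice(prepped)

# bake() skips the step, so mpg comes back on its original scale.
new_processed <- bake(prepped, new_data = mtcars)
```

Comparing train_processed$mpg with new_processed$mpg shows the two scales side by side, which is the asymmetry that skip introduces by design.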
Oh no problem, Max and Davis. I apologize if I sounded rude! I am more enthusiastic than frustrated with your work and tidymodels. OK, to the pain point.
I am being forced to do outcome transformations outside the recipe.
Why:
When skip = TRUE -> tune_grid() and the resamples give me wrong results, because the metrics are calculated with the outcome's raw form on the validation set against predictions on the transformed scale.
When skip = FALSE -> predict() returns an error saying that there is no outcome column, even though I'm not asking for it.
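The scale mismatch described for skip = TRUE can be illustrated in base R. This is only a sketch: the numbers are hypothetical, and rmse() is a small helper defined here, not a tidymodels function.

```r
# Hypothetical values chosen only for illustration.
truth_raw   <- c(5.1, 4.9, 6.3, 5.8)              # outcome as stored in the assessment set
pred_scaled <- c(0.0050, 0.0049, 0.0062, 0.0059)  # predictions on the /1000 scale

# Root mean squared error between two numeric vectors.
rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))

# Raw truth against transformed-scale predictions: the error is dominated
# by the difference in scale, not by model quality.
rmse_mismatched <- rmse(truth_raw, pred_scaled)

# Both vectors on the transformed scale: the error reflects the model.
rmse_matched <- rmse(truth_raw / 1000, pred_scaled)
```

The first value is several orders of magnitude larger than the second, even though the predictions are identical.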
The workflow below would be convenient because one would not need to revisit any part of the code at any step.
# Reprex: switch skip = TRUE/FALSE to compare the results
library(tidymodels)

iris_split <- initial_split(iris %>% select(starts_with("Sepal"), starts_with("Petal")))
iris_train <- training(iris_split)
iris_test <- testing(iris_split)

mod <- linear_reg(penalty = tune()) %>% set_engine("glmnet")

# The outcome is transformed inside the recipe
rec <- recipe(Sepal.Length ~ ., iris_train) %>%
  step_mutate(Sepal.Length = Sepal.Length / 1000, skip = TRUE)

wf <- workflow() %>%
  add_model(mod) %>%
  add_recipe(rec)

iris_resample <- vfold_cv(iris_train)
iris_tune_grid <- tune_grid(wf, iris_resample)
autoplot(iris_tune_grid)

mod_fit <- wf %>%
  finalize_workflow(select_best(iris_tune_grid, metric = "rmse")) %>%
  fit(iris_train)

predict(mod_fit, iris_test)
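For comparison, here is a rough sketch of the workaround mentioned earlier in the thread, where the outcome is transformed outside the recipe. It is not a recommendation from the maintainers. The "lm" engine, step_normalize(), the seed, and the .pred_original column name are assumptions made to keep the example small; tuning is left out.

```r
library(tidymodels)

set.seed(123)

# Transform the outcome up front so training, resamples, and test data share one scale.
iris_scaled <- iris %>%
  select(starts_with("Sepal"), starts_with("Petal")) %>%
  mutate(Sepal.Length = Sepal.Length / 1000)

iris_split <- initial_split(iris_scaled)
iris_train <- training(iris_split)
iris_test <- testing(iris_split)

# The recipe touches predictors only, so predict() never needs the outcome column.
rec <- recipe(Sepal.Length ~ ., data = iris_train) %>%
  step_normalize(all_predictors())

wf <- workflow() %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  add_recipe(rec)

wf_fit <- fit(wf, data = iris_train)

# Predict on data without the outcome, then undo the scaling by hand.
preds <- predict(wf_fit, new_data = select(iris_test, -Sepal.Length)) %>%
  mutate(.pred_original = .pred * 1000)
```

The cost of this approach is the one raised above: the back-transformation lives outside the workflow and has to be remembered at prediction time.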
PS: Be aware that I may be misunderstanding some ML fundamentals. I'm sorry if that is the case; please let me know.
And congratulations on all the amazing work on tidymodels!
hardhat::forge() returns error when recipes transforms outcomes