tidymodels / hardhat

Construct Modeling Packages
https://hardhat.tidymodels.org

forge returns error when recipes transforms outcomes. #129

Closed: Athospd closed this issue 4 years ago

Athospd commented 4 years ago

hardhat::forge() returns an error when a recipe step transforms the outcome.

library(tidymodels)
final_model <- parsnip::boost_tree(mode = "regression", trees = 5)

rec <- recipes::recipe(mpg ~ ., mtcars) %>%
  recipes::step_center(recipes::all_outcomes()) ## <------- the code works without this step

wf <- workflows::workflow() %>%
  workflows::add_model(final_model) %>%
  workflows::add_recipe(rec)

wf_fit <- fit(wf, mtcars)

predict(wf_fit, mtcars) # fails here

### Isolating the failing chunk inside workflows:::predict.workflow() ---------------
new_data <- mtcars
blueprint <- wf_fit$pre$mold$blueprint
forged <- hardhat::forge(new_data, blueprint)

sessioninfo::session_info()

─ Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.6.3 (2020-02-29)
 os       Ubuntu 18.04.4 LTS          
 system   x86_64, linux-gnu           
 ui       RStudio                     
 language pt_BR:pt:en                 
 collate  pt_BR.UTF-8                 
 ctype    pt_BR.UTF-8                 
 tz       America/Sao_Paulo           
 date     2020-04-22                  

─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package       * version    date       lib source        
 assertthat      0.2.1      2019-03-21 [1] CRAN (R 3.6.1)
 backports       1.1.5      2019-10-02 [1] CRAN (R 3.6.1)
 base64enc       0.1-3      2015-07-28 [1] CRAN (R 3.6.1)
 bayesplot       1.7.0      2019-05-23 [1] CRAN (R 3.6.1)
 boot            1.3-24     2019-12-20 [4] CRAN (R 3.6.2)
 broom         * 0.5.5      2020-02-29 [1] CRAN (R 3.6.3)
 callr           3.4.3      2020-03-28 [1] CRAN (R 3.6.3)
 class           7.3-16     2020-03-25 [4] CRAN (R 3.6.3)
 cli             2.0.2      2020-02-28 [1] CRAN (R 3.6.1)
 clipr           0.7.0      2019-07-23 [1] CRAN (R 3.6.1)
 codetools       0.2-16     2018-12-24 [4] CRAN (R 3.5.2)
 colorspace      1.4-1      2019-03-18 [1] CRAN (R 3.6.1)
 colourpicker    1.0        2017-09-27 [1] CRAN (R 3.6.1)
 crayon          1.3.4      2017-09-16 [1] CRAN (R 3.6.1)
 crosstalk       1.0.0      2016-12-21 [1] CRAN (R 3.6.1)
 data.table      1.12.8     2019-12-09 [1] CRAN (R 3.6.3)
 dials         * 0.0.4      2019-12-02 [1] CRAN (R 3.6.3)
 DiceDesign      1.8-1      2019-07-31 [1] CRAN (R 3.6.1)
 digest          0.6.25     2020-02-23 [1] CRAN (R 3.6.1)
 dplyr         * 0.8.5      2020-03-07 [1] CRAN (R 3.6.3)
 DT              0.13       2020-03-23 [1] CRAN (R 3.6.3)
 dygraphs        1.1.1.6    2018-07-11 [1] CRAN (R 3.6.1)
 ellipsis        0.3.0      2019-09-20 [1] CRAN (R 3.6.1)
 evaluate        0.14       2019-05-28 [1] CRAN (R 3.6.1)
 fansi           0.4.1      2020-01-08 [1] CRAN (R 3.6.1)
 foreach         1.4.7      2019-07-27 [1] CRAN (R 3.6.1)
 fs              1.3.1      2019-05-06 [1] CRAN (R 3.6.1)
 furrr           0.1.0      2018-05-16 [1] CRAN (R 3.6.1)
 future          1.14.0     2019-07-02 [1] CRAN (R 3.6.1)
 generics        0.0.2      2018-11-29 [1] CRAN (R 3.6.1)
 ggplot2       * 3.3.0      2020-03-05 [1] CRAN (R 3.6.3)
 ggridges        0.5.1      2018-09-27 [1] CRAN (R 3.6.1)
 globals         0.12.5     2019-12-07 [1] CRAN (R 3.6.1)
 glue            1.4.0      2020-04-03 [1] CRAN (R 3.6.3)
 gower           0.2.1      2019-05-14 [1] CRAN (R 3.6.1)
 GPfit           1.0-8      2019-02-08 [1] CRAN (R 3.6.1)
 gridExtra       2.3        2017-09-09 [1] CRAN (R 3.6.1)
 gtable          0.3.0      2019-03-25 [1] CRAN (R 3.6.1)
 gtools          3.8.1      2018-06-26 [1] CRAN (R 3.6.1)
 hardhat       * 0.1.2      2020-02-28 [1] CRAN (R 3.6.3)
 htmltools       0.4.0      2019-10-04 [1] CRAN (R 3.6.1)
 htmlwidgets     1.3        2018-09-30 [1] CRAN (R 3.6.1)
 httpuv          1.5.2      2019-09-11 [1] CRAN (R 3.6.3)
 igraph          1.2.4.1    2019-04-22 [1] CRAN (R 3.6.1)
 infer         * 0.5.1      2019-11-19 [1] CRAN (R 3.6.1)
 inline          0.3.15     2018-05-18 [1] CRAN (R 3.6.1)
 ipred           0.9-9      2019-04-28 [1] CRAN (R 3.6.1)
 iterators       1.0.12     2019-07-26 [1] CRAN (R 3.6.1)
 janeaustenr     0.1.5      2017-06-10 [1] CRAN (R 3.6.1)
 knitr           1.28       2020-02-06 [1] CRAN (R 3.6.3)
 later           1.0.0      2019-10-04 [1] CRAN (R 3.6.3)
 lattice         0.20-41    2020-04-02 [4] CRAN (R 3.6.3)
 lava            1.6.6      2019-08-01 [1] CRAN (R 3.6.1)
 lhs             1.0.1      2019-02-03 [1] CRAN (R 3.6.1)
 lifecycle       0.2.0      2020-03-06 [1] CRAN (R 3.6.1)
 listenv         0.7.0      2018-01-21 [1] CRAN (R 3.6.1)
 lme4            1.1-21     2019-03-05 [1] CRAN (R 3.6.1)
 loo             2.1.0      2019-03-13 [1] CRAN (R 3.6.1)
 lubridate       1.7.4      2018-04-11 [1] CRAN (R 3.6.1)
 magrittr        1.5        2014-11-22 [1] CRAN (R 3.6.1)
 markdown        1.1        2019-08-07 [1] CRAN (R 3.6.1)
 MASS            7.3-51.5   2019-12-20 [1] CRAN (R 3.6.1)
 Matrix          1.2-18     2019-11-27 [4] CRAN (R 3.6.1)
 matrixStats     0.55.0     2019-09-07 [1] CRAN (R 3.6.1)
 mime            0.9        2020-02-04 [1] CRAN (R 3.6.3)
 miniUI          0.1.1.1    2018-05-18 [1] CRAN (R 3.6.1)
 minqa           1.2.4      2014-10-09 [1] CRAN (R 3.6.1)
 munsell         0.5.0      2018-06-12 [1] CRAN (R 3.6.1)
 nlme            3.1-144    2020-02-06 [4] CRAN (R 3.6.2)
 nloptr          1.2.1      2018-10-03 [1] CRAN (R 3.6.1)
 nnet            7.3-13     2020-02-25 [4] CRAN (R 3.6.3)
 parsnip       * 0.0.5      2020-01-07 [1] CRAN (R 3.6.3)
 pillar          1.4.3      2019-12-20 [1] CRAN (R 3.6.1)
 pkgbuild        1.0.6      2019-10-09 [1] CRAN (R 3.6.3)
 pkgconfig       2.0.3      2019-09-22 [1] CRAN (R 3.6.1)
 plyr            1.8.4      2016-06-08 [1] CRAN (R 3.6.1)
 prettyunits     1.1.1      2020-01-24 [1] CRAN (R 3.6.1)
 pROC            1.15.3     2019-07-21 [1] CRAN (R 3.6.1)
 processx        3.4.1      2019-07-18 [1] CRAN (R 3.6.1)
 prodlim         2018.04.18 2018-04-18 [1] CRAN (R 3.6.1)
 promises        1.1.0      2019-10-04 [1] CRAN (R 3.6.3)
 ps              1.3.0      2018-12-21 [1] CRAN (R 3.6.1)
 purrr         * 0.3.3      2019-10-18 [1] CRAN (R 3.6.1)
 R6              2.4.1      2019-11-12 [1] CRAN (R 3.6.1)
 Rcpp            1.0.4      2020-03-17 [1] CRAN (R 3.6.1)
 recipes       * 0.1.9      2020-01-07 [1] CRAN (R 3.6.1)
 reprex          0.3.0      2019-05-16 [1] CRAN (R 3.6.1)
 reshape2        1.4.3      2017-12-11 [1] CRAN (R 3.6.1)
 rlang           0.4.5      2020-03-01 [1] CRAN (R 3.6.1)
 rmarkdown       2.1        2020-01-20 [1] CRAN (R 3.6.3)
 rpart           4.1-15     2019-04-12 [4] CRAN (R 3.6.1)
 rsample       * 0.0.5      2019-07-12 [1] CRAN (R 3.6.3)
 rsconnect       0.8.15     2019-07-22 [1] CRAN (R 3.6.1)
 rstan           2.19.2     2019-07-09 [1] CRAN (R 3.6.1)
 rstanarm        2.18.2     2018-11-10 [1] CRAN (R 3.6.1)
 rstantools      1.5.1      2018-08-22 [1] CRAN (R 3.6.1)
 rstudioapi      0.11       2020-02-07 [1] CRAN (R 3.6.1)
 scales        * 1.1.0      2019-11-18 [1] CRAN (R 3.6.3)
 sessioninfo     1.1.1      2018-11-05 [1] CRAN (R 3.6.1)
 shiny           1.3.2      2019-04-22 [1] CRAN (R 3.6.1)
 shinyjs         1.0        2018-01-08 [1] CRAN (R 3.6.1)
 shinystan       2.5.0      2018-05-01 [1] CRAN (R 3.6.1)
 shinythemes     1.1.2      2018-11-06 [1] CRAN (R 3.6.1)
 SnowballC       0.6.0      2019-01-15 [1] CRAN (R 3.6.1)
 StanHeaders     2.19.0     2019-09-07 [1] CRAN (R 3.6.1)
 stringi         1.4.6      2020-02-17 [1] CRAN (R 3.6.1)
 stringr         1.4.0      2019-02-10 [1] CRAN (R 3.6.1)
 survival        3.1-11     2020-03-07 [4] CRAN (R 3.6.3)
 threejs         0.3.1      2017-08-13 [1] CRAN (R 3.6.1)
 tibble        * 2.1.3      2019-06-06 [1] CRAN (R 3.6.1)
 tidymodels    * 0.1.0      2020-02-16 [1] CRAN (R 3.6.3)
 tidyposterior   0.0.2      2018-11-15 [1] CRAN (R 3.6.1)
 tidypredict     0.4.3      2019-09-03 [1] CRAN (R 3.6.1)
 tidyr         * 1.0.2      2020-01-24 [1] CRAN (R 3.6.1)
 tidyselect      1.0.0      2020-01-27 [1] CRAN (R 3.6.1)
 tidytext        0.2.2      2019-07-29 [1] CRAN (R 3.6.1)
 timeDate        3043.102   2018-02-21 [1] CRAN (R 3.6.1)
 tokenizers      0.2.1      2018-03-29 [1] CRAN (R 3.6.1)
 tune          * 0.0.1      2020-02-11 [1] CRAN (R 3.6.3)
 vctrs           0.2.4      2020-03-10 [1] CRAN (R 3.6.1)
 whisker         0.4        2019-08-28 [1] CRAN (R 3.6.1)
 withr           2.1.2      2018-03-15 [1] CRAN (R 3.6.1)
 workflows     * 0.1.1      2020-03-17 [1] CRAN (R 3.6.3)
 xfun            0.12       2020-01-13 [1] CRAN (R 3.6.3)
 xgboost         0.90.0.2   2019-08-01 [1] CRAN (R 3.6.1)
 xtable          1.8-4      2019-04-21 [1] CRAN (R 3.6.1)
 xts             0.11-2     2018-11-05 [1] CRAN (R 3.6.1)
 yardstick     * 0.0.6      2020-03-17 [1] CRAN (R 3.6.3)
 zoo             1.8-6      2019-05-28 [1] CRAN (R 3.6.1)

[1] /home/athos/R/x86_64-pc-linux-gnu-library/3.6
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library
DavisVaughan commented 4 years ago

You likely need to use skip = TRUE in the recipe step that affects the outcome.

See https://github.com/tidymodels/workflows/issues/37

And https://tidymodels.github.io/hardhat/articles/forge.html#a-note-on-recipes
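A minimal sketch of that suggestion applied to the original reprex, assuming recent versions of recipes, parsnip, and workflows. To keep dependencies light, the model is swapped from `boost_tree()` to `linear_reg()` with the `lm` engine; that swap is an illustrative assumption, not part of the thread. Because the centering step is skipped at prediction time, the `.pred` values stay on the centered scale of `mpg`, so adding the training mean back is left to the user.

```r
library(recipes)
library(parsnip)
library(workflows)

# skip = TRUE: center mpg while fitting, but skip the step when baking new data
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_center(all_outcomes(), skip = TRUE)

# Model swapped to lm (assumption) to avoid the xgboost dependency
mod <- linear_reg() %>% set_engine("lm")

wf <- workflow() %>%
  add_model(mod) %>%
  add_recipe(rec)

wf_fit <- fit(wf, data = mtcars)

# forge() no longer needs the outcome column, so prediction succeeds
preds <- predict(wf_fit, new_data = mtcars)

# Predictions are on the centered scale; add the training mean back by hand
preds_mpg <- preds$.pred + mean(mtcars$mpg)
```

The back-transformation is manual because a skipped step is never inverted automatically.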

Athospd commented 4 years ago

Just some feedback: this has become a major obstacle for me. In my opinion, this issue breaks the smoothness of the workflow. I don't know how to deal with it yet; I had to do the outcome transformations outside of the recipe.

I would like to use resamples + tune + workflow + recipe + parsnip seamlessly, but it seems that I have to choose between tuning and predicting.

Please let me know if I'm doing it all wrong. I'd also be glad to contribute; let me know how.
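For reference, a minimal base-R sketch of the workaround described above (transforming the outcome outside the recipe), assuming a simple row-based train/test split of `mtcars` and plain `lm()` in place of the tidymodels stack. The split and the model are illustrative choices, not taken from the thread. The key detail is that the training statistic must be stored so predictions can be mapped back to the original scale.

```r
# Illustrative split (assumption): first 24 rows train, last 8 rows test
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

# Outcome statistic estimated on the training set only
mpg_mean <- mean(train$mpg)

# Transform the outcome outside any recipe
train_c <- train
train_c$mpg <- train_c$mpg - mpg_mean

fit_c <- lm(mpg ~ ., data = train_c)

# New data carries predictors only; no outcome is needed to predict
test_x <- test[, setdiff(names(test), "mpg")]
pred_centered <- predict(fit_c, newdata = test_x)

# Undo the centering to return to the original mpg scale
pred_mpg <- pred_centered + mpg_mean
```

Since centering the outcome only shifts the intercept, these predictions match an `lm()` fit on the untransformed outcome.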

topepo commented 4 years ago

Sorry for the frustration. It would help to know more.

Does skip solve the issue?

We can't make the assumption that the outcome data is always available (outside of the original training set). Theoretically, the outcome should only be required during model training.

skip addresses this by applying the step during training (via juice()) but not inside bake().
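A minimal sketch of that behaviour using recipes on its own, assuming the `prep()`/`juice()`/`bake()` interface (newer releases of recipes also accept `bake(prepped, new_data = NULL)` in place of `juice()`). The `mtcars` data and the centering step mirror the original reprex.

```r
library(recipes)

rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_center(all_outcomes(), skip = TRUE)

# Estimate the step (the mpg mean); retain = TRUE keeps the processed
# training data so juice() can return it
prepped <- prep(rec, training = mtcars, retain = TRUE)

# Training data: the skipped step IS applied, so mpg is centered
train_processed <- juice(prepped)

# New data without the outcome: the skipped step is NOT applied,
# so the missing mpg column does not trigger an error
new_x <- mtcars[, setdiff(names(mtcars), "mpg")]
new_processed <- bake(prepped, new_data = new_x)
```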

Athospd commented 4 years ago

Oh, no problem, Max and Davis. I apologize if I sounded rude! I am more enthusiastic than frustrated about your work on tidymodels. OK, on to the pain point.

I have been forced to do outcome transformations outside the recipe.

Why:

The workflow below would be convenient because one would not need to revisit any part of the code at any step.

# Reprex: switch skip = TRUE/FALSE to compare the results
library(tidymodels)

iris_split <- initial_split(iris %>% select(starts_with("Sepal"), starts_with("Petal")))
iris_train <- training(iris_split)
iris_test <- testing(iris_split)

mod <- linear_reg(penalty = tune()) %>% set_engine("glmnet")
rec <- recipe(Sepal.Length ~ ., iris_train) %>%
  step_mutate(Sepal.Length = Sepal.Length/1000, skip = TRUE)
wf <- workflow() %>% 
  add_model(mod) %>% 
  add_recipe(rec)

iris_resample <- vfold_cv(iris_train)
iris_tune_grid <- tune_grid(wf, resamples = iris_resample)

autoplot(iris_tune_grid)

mod_fit <- wf %>% 
  finalize_workflow(select_best(iris_tune_grid, "rmse")) %>% 
  fit(iris_train)

predict(mod_fit, iris_test)

PS: Be aware that I may be misunderstanding some ML fundamentals. If that is the case, I'm sorry; please let me know.

And congratulations on all the amazing work on tidymodels!

github-actions[bot] commented 3 years ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.