tidymodels / butcher

Reduce the size of model objects saved to disk
https://butcher.tidymodels.org/

consider warning if specific `axe_*` methods are not available #214

Open dpprdan opened 2 years ago

dpprdan commented 2 years ago

Please consider adding a warning to the axe_*.default() methods for the case where the package that created the object (and that might provide more specific axe_*() methods) is not loaded.
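To illustrate the idea, here is a minimal sketch of a default method that warns when a likely provider package is not loaded. It is not butcher's actual implementation: the generic `axe_demo()`, the lookup table `known_providers`, and the `providers` argument are all hypothetical names made up for this example; only base R functions are used.

```r
# Hypothetical generic standing in for one of butcher's axe_*() generics.
axe_demo <- function(x, ...) UseMethod("axe_demo")

# Hypothetical lookup table (an assumption for illustration only):
# object class -> package that might register more specific methods.
known_providers <- c(workflow = "workflows", model_fit = "parsnip")

# Default method: returns the object unchanged, but warns if the object's
# class is associated with a package whose namespace is not loaded.
axe_demo.default <- function(x, ..., providers = known_providers) {
  for (cls in intersect(class(x), names(providers))) {
    pkg <- providers[[cls]]
    if (!isNamespaceLoaded(pkg)) {
      warning(
        sprintf(
          "No specific method found for class '%s'. Is package '%s' loaded?",
          cls, pkg
        ),
        call. = FALSE
      )
    }
  }
  x
}
```

With something like this, calling the generic on a workflow object without {workflows} loaded would fall through to the default method and emit a warning instead of silently returning the object unchanged. A hard-coded lookup table is only one option; it is used here to keep the sketch short.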

I ran into this while trying to butcher() a workflow after loading it back in. (I am not butcher()ing before saving, due to #147)

The xgb_mod.rds is the workflow used in https://github.com/tidymodels/workflows/issues/138

library(butcher)
xgb_mod <- readRDS("xgb_mod.rds")

## No message or warning without `verbose = TRUE` even though nothing is happening
xgb_btchrd <- butcher(xgb_mod)
all.equal(xgb_btchrd, xgb_mod)
#> [1] TRUE

## Cannot butcher if method is not available (but `verbose` message does not say that)
xgb_btchrd <- butcher(xgb_mod, TRUE)
#> x No memory released. Do not butcher.

library(workflows)

## Now we're talking
xgb_btchrd <- butcher(xgb_mod, TRUE)
#> Warning in as.function.default(c(formals(x), body(x)), env = rlang::base_env()):
#> partial argument match of 'env' to 'envir'
#> v Memory released: '943,080 B'

Created on 2022-02-25 by the reprex package (v2.0.1)

Session info

``` r
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Windows 10 x64 (build 19043)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language en
#>  collate  German_Germany.1252
#>  ctype    German_Germany.1252
#>  tz       Europe/Berlin
#>  date     2022-02-25
#>  pandoc   2.17.1.1 @ C:/Program Files/RStudio/bin/quarto/bin/ (via rmarkdown)
#>
#> - Packages -------------------------------------------------------------------
#>  package      * version    date (UTC) lib source
#>  assertthat     0.2.1      2019-03-21 [1] CRAN (R 4.1.0)
#>  backports      1.4.1      2021-12-13 [1] CRAN (R 4.1.2)
#>  butcher      * 0.1.5.9000 2022-02-25 [1] Github (tidymodels/butcher@d7ed75f)
#>  class          7.3-19     2021-05-03 [2] CRAN (R 4.1.2)
#>  cli            3.2.0      2022-02-14 [1] CRAN (R 4.1.2)
#>  codetools      0.2-18     2020-11-04 [2] CRAN (R 4.1.2)
#>  colorspace     2.0-3      2022-02-21 [1] CRAN (R 4.1.2)
#>  crayon         1.5.0      2022-02-14 [1] CRAN (R 4.1.2)
#>  data.table     1.14.2     2021-09-27 [1] CRAN (R 4.1.1)
#>  DBI            1.1.2      2021-12-20 [1] CRAN (R 4.1.2)
#>  digest         0.6.29     2021-12-01 [1] CRAN (R 4.1.2)
#>  dplyr          1.0.8      2022-02-08 [1] CRAN (R 4.1.2)
#>  ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate       0.15       2022-02-18 [1] CRAN (R 4.1.2)
#>  fansi          1.0.2      2022-01-14 [1] CRAN (R 4.1.2)
#>  fastmap        1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
#>  fs             1.5.2.9000 2022-02-03 [1] Github (r-lib/fs@6d1182f)
#>  future         1.24.0     2022-02-19 [1] CRAN (R 4.1.2)
#>  future.apply   1.8.1      2021-08-10 [1] CRAN (R 4.1.1)
#>  generics       0.1.2      2022-01-31 [1] CRAN (R 4.1.2)
#>  ggplot2        3.3.5      2021-06-25 [1] CRAN (R 4.1.0)
#>  globals        0.14.0     2020-11-22 [1] CRAN (R 4.1.0)
#>  glue           1.6.2      2022-02-24 [1] CRAN (R 4.1.2)
#>  gower          1.0.0      2022-02-03 [1] CRAN (R 4.1.2)
#>  gtable         0.3.0      2019-03-25 [1] CRAN (R 4.1.0)
#>  hardhat        0.2.0      2022-01-24 [1] CRAN (R 4.1.2)
#>  highr          0.9        2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools      0.5.2      2021-08-25 [1] CRAN (R 4.1.1)
#>  ipred          0.9-12     2021-09-15 [1] CRAN (R 4.1.1)
#>  jsonlite       1.8.0      2022-02-22 [1] CRAN (R 4.1.2)
#>  knitr          1.37       2021-12-16 [1] CRAN (R 4.1.2)
#>  lattice        0.20-45    2021-09-22 [2] CRAN (R 4.1.2)
#>  lava           1.6.10     2021-09-02 [1] CRAN (R 4.1.1)
#>  lifecycle      1.0.1      2021-09-24 [1] CRAN (R 4.1.1)
#>  listenv        0.8.0      2019-12-05 [1] CRAN (R 4.1.0)
#>  lobstr         1.1.1      2019-07-02 [1] CRAN (R 4.1.1)
#>  lubridate      1.8.0      2021-10-07 [1] CRAN (R 4.1.1)
#>  magrittr       2.0.2      2022-01-26 [1] CRAN (R 4.1.2)
#>  MASS           7.3-54     2021-05-03 [2] CRAN (R 4.1.2)
#>  Matrix         1.3-4      2021-06-01 [2] CRAN (R 4.1.2)
#>  munsell        0.5.0      2018-06-12 [1] CRAN (R 4.1.0)
#>  nnet           7.3-16     2021-05-03 [2] CRAN (R 4.1.2)
#>  parallelly     1.30.0     2021-12-17 [1] CRAN (R 4.1.2)
#>  parsnip        0.1.7.9006 2022-02-25 [1] Github (tidymodels/parsnip@3e2447c)
#>  pillar         1.7.0      2022-02-01 [1] CRAN (R 4.1.2)
#>  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
#>  prodlim        2019.11.13 2019-11-17 [1] CRAN (R 4.1.1)
#>  purrr          0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
#>  R.cache        0.15.0     2021-04-30 [1] CRAN (R 4.1.0)
#>  R.methodsS3    1.8.1      2020-08-26 [1] CRAN (R 4.1.0)
#>  R.oo           1.24.0     2020-08-26 [1] CRAN (R 4.1.0)
#>  R.utils        2.11.0     2021-09-26 [1] CRAN (R 4.1.1)
#>  R6             2.5.1      2021-08-19 [1] CRAN (R 4.1.1)
#>  Rcpp           1.0.8      2022-01-13 [1] CRAN (R 4.1.2)
#>  recipes        0.2.0      2022-02-18 [1] CRAN (R 4.1.2)
#>  reprex         2.0.1      2021-08-05 [1] CRAN (R 4.1.0)
#>  rlang          1.0.1      2022-02-03 [1] CRAN (R 4.1.2)
#>  rmarkdown      2.11       2021-09-14 [1] CRAN (R 4.1.1)
#>  rpart          4.1-15     2019-04-12 [2] CRAN (R 4.1.2)
#>  rstudioapi     0.13       2020-11-12 [1] CRAN (R 4.1.0)
#>  scales         1.1.1      2020-05-11 [1] CRAN (R 4.1.0)
#>  sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.1.2)
#>  stringi        1.7.6      2021-11-29 [1] CRAN (R 4.1.2)
#>  stringr        1.4.0      2019-02-10 [1] CRAN (R 4.1.0)
#>  styler         1.6.2      2021-09-23 [1] CRAN (R 4.1.1)
#>  survival       3.2-13     2021-08-24 [2] CRAN (R 4.1.2)
#>  tibble         3.1.6      2021-11-07 [1] CRAN (R 4.1.2)
#>  tidyr          1.2.0      2022-02-01 [1] CRAN (R 4.1.2)
#>  tidyselect     1.1.2      2022-02-21 [1] CRAN (R 4.1.2)
#>  timeDate       3043.102   2018-02-21 [1] CRAN (R 4.1.1)
#>  usethis        2.1.5      2021-12-09 [1] CRAN (R 4.1.2)
#>  utf8           1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs          0.3.8      2021-04-29 [1] CRAN (R 4.1.0)
#>  withr          2.4.3      2021-11-30 [1] CRAN (R 4.1.2)
#>  workflows    * 0.2.4.9002 2022-02-25 [1] Github (tidymodels/workflows@4e348f8)
#>  xfun           0.29       2021-12-14 [1] CRAN (R 4.1.2)
#>  xgboost        1.5.2.1    2022-02-21 [1] CRAN (R 4.1.2)
#>  yaml           2.3.5      2022-02-21 [1] CRAN (R 4.1.2)
#>
#>  [1] C:/Users/Daniel.AK-HAMBURG/Documents/R/win-library/4.1
#>  [2] C:/Program Files/R/R-4.1.2/library
#>
#> ------------------------------------------------------------------------------
```

amazongodman commented 2 years ago

There are similar reports here as well. https://github.com/tidymodels/parsnip/issues/459

I ran into a similar error when using butcher to reduce the size of an xgboost model. Reproducible code and package versions are below.

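library(tidymodels)
library(butcher)

# Treat the outcome as a factor so this is a classification problem
df <- mtcars
df$am <- as.factor(df$am)

# Model specification (not yet fitted at this point, despite the name)
fitted_model <- boost_tree(trees = 15) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

rec <- recipe(am ~ ., data = df) %>%
  step_dummy(all_nominal_predictors())

# Fit the workflow on the full data
wfl <- workflow() %>%
  add_model(fitted_model) %>%
  add_recipe(rec) %>%
  fit(df)

# Butcher the fitted workflow, then try to predict with it
temp <- wfl %>% butcher()
predict(temp, new_data = df)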
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
 [1] rlang_1.0.2        lobstr_1.1.1       sparklyr_1.7.5    
 [4] rpart_4.1.16       butcher_0.1.5      yardstick_0.0.9   
 [7] workflowsets_0.2.1 workflows_0.2.6    tune_0.2.0        
[10] tidyr_1.2.0        tibble_3.1.6       rsample_0.1.1     
[13] recipes_0.2.0      purrr_0.3.4        parsnip_0.2.1     
[16] modeldata_0.1.1    infer_1.0.0        ggplot2_3.3.5     
[19] dplyr_1.0.8        dials_0.1.0        scales_1.1.1      
[22] broom_0.7.12       tidymodels_0.2.0  

loaded via a namespace (and not attached):
 [1] httr_1.4.2         jsonlite_1.8.0     splines_4.1.2     
 [4] foreach_1.5.2      prodlim_2019.11.13 assertthat_0.2.1  
 [7] GPfit_1.0-8        yaml_2.3.5         r2d3_0.2.6        
[10] globals_0.14.0     ipred_0.9-12       pillar_1.7.0      
[13] backports_1.4.1    lattice_0.20-45    glue_1.6.2        
[16] pROC_1.18.0        digest_0.6.29      hardhat_0.2.0     
[19] colorspace_2.0-3   htmltools_0.5.2    Matrix_1.4-1      
[22] plyr_1.8.6         timeDate_3043.102  pkgconfig_2.0.3   
[25] lhs_1.1.5          DiceDesign_1.9     listenv_0.8.0     
[28] config_0.3.1       gower_1.0.0        lava_1.6.10       
[31] generics_0.1.2     usethis_2.1.5      xgboost_1.6.0.1   
[34] ellipsis_0.3.2     withr_2.5.0        furrr_0.2.3       
[37] nnet_7.3-17        cli_3.2.0          survival_3.3-1    
[40] magrittr_2.0.2     crayon_1.5.0       fs_1.5.2          
[43] future_1.24.0      fansi_1.0.2        parallelly_1.30.0 
[46] MASS_7.3-56        class_7.3-20       tools_4.1.2       
[49] data.table_1.14.2  lifecycle_1.0.1    munsell_0.5.0     
[52] compiler_4.1.2     signal_0.7-7       forge_0.2.0       
[55] grid_4.1.2         iterators_1.0.14   rstudioapi_0.13   
[58] rappdirs_0.3.3     htmlwidgets_1.5.4  base64enc_0.1-3   
[61] gtable_0.3.0       codetools_0.2-18   DBI_1.1.2         
[64] R6_2.5.1           lubridate_1.8.0    fastmap_1.1.0     
[67] future.apply_1.8.1 utf8_1.2.2         rprojroot_2.0.3   
[70] parallel_4.1.2     Rcpp_1.0.8.3       vctrs_0.3.8       
[73] dbplyr_2.1.1       tidyselect_1.1.2  
dpprdan commented 2 years ago

@amazongodman Looking at your example, I am not sure it applies to this issue (or to the one reported over at parsnip, for that matter): you must have loaded {workflows}, so workflows' axe_*() methods should be available. Anyway, my initial example is not a very good one, because one shouldn't use saveRDS()/readRDS() directly on {xgboost} models. That doesn't change the issue I reported here, though.