rstudio / vetiver-r

Version, share, deploy, and monitor models
https://rstudio.github.io/vetiver-r/

Stripping training data when preparing to deploy a model #251

Closed mattwarkentin closed 11 months ago

mattwarkentin commented 11 months ago

Hi,

I am really not sure whether this belongs here or in butcher/workflows/recipes. Happy to move this issue accordingly. Anyway, when preparing a model, one really important consideration is making sure any training data is removed. It is my understanding that vetiver calls butcher under the hood to trim down the workflow object, but the butcher methods for recipe objects don't seem to strip the training data.

Some reading into ?recipes::prep suggests that if you use retain = FALSE then this data won't be included, but prep() is generally called internally during training and so I'm not sure how to avoid bringing the data along for the ride.

Am I using this wrong? Do I need to remove this data manually? I am a bit confused about the best way to ensure training data is stripped from a model/workflow before deployment.

Here is a reprex showing the training data is embedded in the vetiver model.

library(tidymodels)
library(vetiver)
#> 
#> Attaching package: 'vetiver'
#> The following object is masked from 'package:tune':
#> 
#>     load_pkgs

wflow <-
  workflow(
    preprocessor = recipe(mpg ~ ., mtcars),
    spec = linear_reg()
  )

fit <- fit(wflow, mtcars)

vet <- vetiver_model(fit, 'foo')

vet$model$object$pre$actions$recipe$recipe$template
#> # A tibble: 32 × 11
#>      cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb   mpg
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1     6  160    110  3.9   2.62  16.5     0     1     4     4  21  
#>  2     6  160    110  3.9   2.88  17.0     0     1     4     4  21  
#>  3     4  108     93  3.85  2.32  18.6     1     1     4     1  22.8
#>  4     6  258    110  3.08  3.22  19.4     1     0     3     1  21.4
#>  5     8  360    175  3.15  3.44  17.0     0     0     3     2  18.7
#>  6     6  225    105  2.76  3.46  20.2     1     0     3     1  18.1
#>  7     8  360    245  3.21  3.57  15.8     0     0     3     4  14.3
#>  8     4  147.    62  3.69  3.19  20       1     0     4     2  24.4
#>  9     4  141.    95  3.92  3.15  22.9     1     0     4     2  22.8
#> 10     6  168.   123  3.92  3.44  18.3     1     0     4     4  19.2
#> # ℹ 22 more rows

mattwarkentin commented 11 months ago

Seems like maybe I just needed to update some packages... I will close this but leave it as a relic to remind others to update their packages before filing issues 😬
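
For anyone landing here later, a minimal sketch of how one might re-check this after updating. It reuses the reprex above and only counts the rows left in the recipe template. The `vet$model$object$pre$actions$recipe$recipe$template` path is copied from the reprex and may differ across package versions; the thread does not say which release changed the behavior, so the versions are printed rather than assumed. The name `wf_fit` is just an illustrative rename to avoid shadowing `fit()`.

```r
library(tidymodels)
library(vetiver)

# Record the installed versions alongside the result; which release
# changed the behavior is not stated in this thread.
packageVersion("vetiver")
packageVersion("butcher")
packageVersion("workflows")

# Same workflow as in the reprex above.
wflow <-
  workflow(
    preprocessor = recipe(mpg ~ ., mtcars),
    spec = linear_reg()
  )

wf_fit <- fit(wflow, mtcars)

vet <- vetiver_model(wf_fit, "foo")

# Path taken from the reprex; treat it as an assumption about internals.
template <- vet$model$object$pre$actions$recipe$recipe$template

# NROW() also handles NULL, so this works whether the template was
# dropped entirely or reduced to a zero-row prototype.
NROW(template)
```

If this still reports 32 rows, the training data is still being carried along with the installed versions.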