rstudio / vetiver-r

Version, share, deploy, and monitor models
https://rstudio.github.io/vetiver-r/

Questions: Using models trained outside R with vetiver #214

Closed by aminadibi 8 months ago

aminadibi commented 1 year ago

Thanks for the great work on this package.

Question: vetiver_model() expects a model object as a parameter. In our organization we have a large number of legacy models trained in SAS. Suppose we have an equation that came from fitting a model outside R, and we have an R function that makes predictions based on that equation. Is it possible to manually create a model object based on that function, so that the model trained outside R can be deployed and monitored with vetiver?

Thanks

juliasilge commented 1 year ago

You can definitely do this, yes! To make it work, you'll need to wrap your head around how vetiver dispatches off of classes. You might want to take a look at the code for lm() and the code for tidymodels to see more details.

To get started, you need to add a class to your function:

my_ported_model <- function(df) {
    df$x + 2 * df$y + abs(df$z)
}

class(my_ported_model) <- c("ported_model", class(my_ported_model))
class(my_ported_model)
#> [1] "ported_model" "function"

You probably also want to make a predict method:

predict.ported_model <- function(object, newdata, ...) {
    # `object` is the classed function itself, so call it directly
    object(newdata)
}
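
As a quick sanity check before involving vetiver at all, the dispatch can be tried in base R. This is a minimal sketch: the data frame `new_obs` is made up for illustration, and the method calls `object(newdata)` (an assumption that the classed function is the thing being predicted from) rather than referring to `my_ported_model` by name.

```r
# A ported "model": a plain function with an extra S3 class on top
my_ported_model <- function(df) {
    df$x + 2 * df$y + abs(df$z)
}
class(my_ported_model) <- c("ported_model", class(my_ported_model))

# predict() dispatches on the class; `object` is the function itself
predict.ported_model <- function(object, newdata, ...) {
    object(newdata)
}

# Illustrative input only
new_obs <- data.frame(x = c(1, 2), y = c(3, 4), z = c(-5, 6))
preds <- predict(my_ported_model, newdata = new_obs)
preds
```

Because the method only uses `object`, it keeps working when the function is read back from a pin under a different variable name.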

Next you need to make the methods for your new model type. At a minimum you need these three (but you might also look at the links I included above for things like how to track needed packages):

library(vetiver)

vetiver_create_description.ported_model <- function(model) {
    "A model that I ported from SAS"
}

vetiver_ptype.ported_model <- function(model, ...) {
    vctrs::vec_ptype(tibble::tibble(x = integer(), y = integer(), z = integer()))
}

handler_predict.ported_model <- function(vetiver_model, ...) {

    ptype <- vetiver_model$prototype

    function(req) {
        newdata <- req$body
        newdata <- vetiver_type_convert(newdata, ptype)
        newdata <- hardhat::scream(newdata, ptype)
        ret <- predict(vetiver_model$model, newdata = newdata, ...)
        list(.pred = ret)
    }

}

You can put that all together and make your vetiver model object:

v <- vetiver_model(my_ported_model, "my-simple-model")
v
#> 
#> ── my-simple-model ─ <ported_model> model for deployment 
#> A model that I ported from SAS using 3 features

You can interact with the API this created like so:

library(plumber)
pr() |> 
    vetiver_api(v)
#> # Plumber router with 3 endpoints, 4 filters, and 1 sub-router.
#> # Use `pr_run()` on this object to start the API.
#> ├──[queryString]
#> ├──[body]
#> ├──[cookieParser]
#> ├──[sharedSecret]
#> ├──/logo
#> │  │ # Plumber static router serving from directory: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/vetiver
#> ├──/metadata (GET)
#> ├──/ping (GET)
#> └──/predict (POST)

Created on 2023-06-06 with reprex v2.0.2

If you then pipe to pr_run() you can interact with it locally. (Screenshot of the running API, 2023-06-06, not reproduced here.)
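
Once the API is running, it can also be queried from a second R session. The sketch below assumes the API was started with something like `pr_run(port = 8080)`; the port, the URL, and the helper name `query_local_api()` are assumptions for illustration, not something stated in this thread.

```r
# Hypothetical helper: send new data to a locally running vetiver API.
# The default URL assumes the API listens on port 8080 (an assumption).
query_local_api <- function(new_data, url = "http://127.0.0.1:8080/predict") {
    endpoint <- vetiver::vetiver_endpoint(url)
    predict(endpoint, new_data)
}

# Illustrative input matching the prototype (integer x, y, z)
new_obs <- data.frame(x = 1L, y = 2L, z = -3L)

# Run this only while the API is up:
# query_local_api(new_obs)
```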

All that being said, it might be easier to not use vetiver at all and write your function into a plumber file directly. I think it depends on how helpful the other pieces of vetiver are in your use case, like versioning as a pin (this would work fine using what I outlined above) and monitoring.

aminadibi commented 1 year ago

Thanks so much @juliasilge, this is super helpful!

I managed to create the vetiver model object and interact with the API locally. I also managed to pin it to S3, build the Docker image, and push it to ECR. But when I tried creating the SageMaker endpoint, I got the error "There is no method available to build a prediction handler for `x`". I dug a bit deeper and noticed that when I port the model and make my vetiver model object, I can call handler_predict() on it with no issues. When I pin the model object and read the pin again, I can still call handler_predict() with no issues. But when I clear my environment, re-read the model object from the pin, and call handler_predict() on it, I get the same error:

Error in `handler_predict()`:
! There is no method available to build a prediction handler for `x`.

This essentially prevents me from running the docker container and deploying the ported model.

Here's a reprex:

library(vetiver)
library(pins)
my_ported_model <- function(df) {
  df$x + 2 * df$y + abs(df$z)
}

class(my_ported_model) <- c("ported_model", class(my_ported_model))
class(my_ported_model)
#> [1] "ported_model" "function"

predict.ported_model <- function(object, newdata) {
  my_ported_model(newdata)
}

vetiver_create_description.ported_model <- function(model) {
  "A model that I ported from SAS"
}

vetiver_ptype.ported_model <- function(model, ...) {
  vctrs::vec_ptype(tibble::tibble(x = integer(), y = integer(), z = integer()))
}

handler_predict.ported_model <- function(vetiver_model, ...) {
  ptype <- vetiver_model$prototype

  function(req) {
    newdata <- req$body
    newdata <- vetiver_type_convert(newdata, ptype)
    newdata <- hardhat::scream(newdata, ptype)
    ret <- predict(vetiver_model$model, newdata = newdata, ...)
    list(.pred = ret)
  }
}

v <- vetiver_model(my_ported_model, "my-simple-model")
board <- board_local()

handler_predict(v)
#> function(req) {
#>     newdata <- req$body
#>     newdata <- vetiver_type_convert(newdata, ptype)
#>     newdata <- hardhat::scream(newdata, ptype)
#>     ret <- predict(vetiver_model$model, newdata = newdata, ...)
#>     list(.pred = ret)
#>   }
#> <environment: 0x12085ab58>
vetiver_pin_write(board, v)
#> Replacing version '20230608T013737Z-e77d7' with '20230608T013821Z-a04bc'
#> Writing to pin 'my-simple-model'
#> 
#> Create a Model Card for your published model
#> • Model Cards provide a framework for transparent, responsible reporting
#> • Use the vetiver `.Rmd` template as a place to start

v <- vetiver_pin_read(board, "my-simple-model")
handler_predict(v)
#> function(req) {
#>     newdata <- req$body
#>     newdata <- vetiver_type_convert(newdata, ptype)
#>     newdata <- hardhat::scream(newdata, ptype)
#>     ret <- predict(vetiver_model$model, newdata = newdata, ...)
#>     list(.pred = ret)
#>   }
#> <bytecode: 0x120f0aa80>
#> <environment: 0x12425bc08>

rm(list = ls(all.names = TRUE))

library(pins)
board <- board_local()
v <- vetiver_pin_read(board, "my-simple-model")
handler_predict(v)
#> Error in `handler_predict()`:
#> ! There is no method available to build a prediction handler for `x`.
#> Backtrace:
#>     ▆
#>  1. ├─vetiver::handler_predict(v)
#>  2. └─vetiver:::handler_predict.default(v)
#>  3.   └─rlang::abort("There is no method available to build a prediction handler for `x`.")

Created on 2023-06-07 with reprex v2.0.2

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.0 (2023-04-21)
#>  os       macOS Ventura 13.4
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Vancouver
#>  date     2023-06-07
#>  pandoc   2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
#>  digest        0.6.31  2022-12-11 [1] CRAN (R 4.3.0)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.3.0)
#>  evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
#>  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
#>  fs            1.6.2   2023-04-25 [1] CRAN (R 4.3.0)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
#>  htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.3.0)
#>  knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
#>  pins        * 1.2.0   2023-05-18 [1] CRAN (R 4.3.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
#>  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.3.0)
#>  rappdirs      0.3.3   2021-01-31 [1] CRAN (R 4.3.0)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
#>  rmarkdown     2.21    2023-03-26 [1] CRAN (R 4.3.0)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.3.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
#>  styler        1.10.1  2023-06-05 [1] CRAN (R 4.3.0)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
#>  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
#>  vctrs         0.6.2   2023-04-19 [1] CRAN (R 4.3.0)
#>  vetiver     * 0.2.1   2023-05-16 [1] CRAN (R 4.3.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
#>  xfun          0.39    2023-04-20 [1] CRAN (R 4.3.0)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
#>
#>  [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

juliasilge commented 1 year ago

Ah, yep! So the description and ptype are stored with the model (i.e. in the pin) because those are typical pieces of metadata that we expect all models to need.

The predict method and the handler_predict method are both built into packages for typical models that we support; the predict method usually lives in whatever package makes the model and the handler_predict methods for vetiver models live in vetiver. You have two options for what to do here:

  1. You can inline these functions into your plumber file, so your plumber file will look something like this:

# Generated by the vetiver package; edit with care

library(pins)
library(plumber)
library(rapidoc)
library(vetiver)
b <- board_s3(bucket = "<redacted>")
v <- vetiver_pin_read(b, "cars1")

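predict.ported_model <- function(object, newdata, ...) {
    # `my_ported_model` is not defined in this file; `object` is the
    # pinned function itself, so call it directly
    object(newdata)
}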

handler_predict.ported_model <- function(vetiver_model, ...) {
  ptype <- vetiver_model$prototype

  function(req) {
    newdata <- req$body
    newdata <- vetiver_type_convert(newdata, ptype)
    newdata <- hardhat::scream(newdata, ptype)
    ret <- predict(vetiver_model$model, newdata = newdata, ...)
    list(.pred = ret)
  }
}

#* @plumber
function(pr) {
    pr %>% vetiver_api(v)
}

If your model can use the same prediction handler as, say, lm(), then you could do this:

# Generated by the vetiver package; edit with care

library(pins)
library(plumber)
library(rapidoc)
library(vetiver)
b <- board_s3(bucket = "<redacted>")
v <- vetiver_pin_read(b, "cars1")

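predict.ported_model <- function(object, newdata, ...) {
    # `my_ported_model` is not defined in this file; `object` is the
    # pinned function itself, so call it directly
    object(newdata)
}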

handler_predict.ported_model <- handler_predict.lm

#* @plumber
function(pr) {
    pr %>% vetiver_api(v)
}

To be extra clear, I mean literally go into the generated plumber file and edit it, before you deploy to Connect or build a Docker container or anything.

  2. You can package these functions up into an internal R package for your org, and then declare that as a dependency of your model. If you do this, your methods will look something like:

vetiver_create_description.ported_model <- function(model) {
    "A model that I ported from SAS"
}

vetiver_ptype.ported_model <- function(model, ...) {
    vctrs::vec_ptype(tibble::tibble(x = integer(), y = integer(), z = integer()))
}

vetiver_create_meta.ported_model <- function(model, metadata) {
    vetiver_meta(metadata, required_pkgs = "yourinternalpackagename")
}

handler_predict.ported_model <- handler_predict.lm

If your ported SAS model already lives in an R package, this would be a pretty good option, I think, and it would let you use, say, the automated workflows offered by vetiver_sm_build() and such for SageMaker in a straightforward way.
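
The failure after clearing the environment comes down to S3 method lookup, which can be reproduced in base R without vetiver. The sketch below uses a made-up generic, `handler_demo()`, standing in for `handler_predict()`; it is an illustration of the mechanism, not vetiver's actual code.

```r
# Stand-in generic with a default that errors, like handler_predict()
handler_demo <- function(x, ...) UseMethod("handler_demo")
handler_demo.default <- function(x, ...) {
    stop("no method available", call. = FALSE)
}

# An object that only carries a class label, like a model read from a pin
model <- structure(list(), class = "ported_model")

# While the method is defined in the session, dispatch finds it
handler_demo.ported_model <- function(x, ...) "custom handler"
with_method <- handler_demo(model)

# Remove the method (as clearing the environment does): dispatch falls
# through to the default, even though the object's class is unchanged
rm(handler_demo.ported_model)
without_method <- tryCatch(
    handler_demo(model),
    error = function(e) conditionMessage(e)
)
```

The pin stores the object and its class, but not the methods. Both options above are ways of making the methods exist again in the session that serves the API.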

aminadibi commented 1 year ago

Thanks so much @juliasilge 🙏 And apologies that I keep asking questions. I do think there are many who would benefit from being able to port and deploy already-trained models with vetiver in a straightforward manner.

It seems that the second solution would not work, because vetiver functions will refer to vetiver's methods instead of those of my package. For instance, vetiver_model() invokes vetiver::vetiver_create_description(model) instead of mypackage::vetiver_create_description(model). Am I missing something?

Now I cloned my own version of vetiver and added required methods, so I am good, but do you think it's feasible/useful to add a ported_model class to vetiver itself, for all ported models?

Also related, I am curious why glue_required_packages puts loading of required packages inside an if (FALSE) statement? https://github.com/rstudio/vetiver-r/blob/fd9dbf82745529d3c5363b370ce9361d70cd49fd/R/write-plumber.R#LL93C1-L108C1

juliasilge commented 1 year ago

No worries at all about questions; we appreciate the feedback so much!

I think you're right that we should consider making it easier for folks to extend vetiver to custom models. In Python, this already works because of how the handlers are set up. In R, we are using S3 methods so it's not as straightforward right now. In the longer term, we could think about how to expose an easier way to support custom models. In the shorter term, it's worth documenting what you need to do to extend vetiver for now. Some similar documentation exists for butcher and broom. Maybe we could make something like butcher's new_model_butcher() that generates a file with example code to start from.

As for your question, "do you think it's feasible/useful to add a ported_model class to vetiver itself, for all ported models?": this unfortunately won't work, because everyone will need different implementations.

You can actually create methods for a generic in another package, i.e. make your own vetiver_create_description.ported_model method even though the generic for vetiver_create_description lives in vetiver. You can read more about that in R Packages but the general idea is that your own package will need to import the vetiver package (like in DESCRIPTION) and then specifically import the generic you want to use (like in a roxygen chunk):

#' @importFrom vetiver handler_predict
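
Put together, a method file in the internal package might look like the sketch below. The file name and package are hypothetical, and the handler body is the one from earlier in this thread with explicit `vetiver::` and `hardhat::` prefixes. With roxygen2, `@export` on an S3 method is expected to produce an `S3method()` entry in NAMESPACE, which is what registers the method when the package is loaded.

```r
# R/vetiver-methods.R in a hypothetical internal package

#' @importFrom vetiver handler_predict
#' @export
handler_predict.ported_model <- function(vetiver_model, ...) {
    ptype <- vetiver_model$prototype

    # The returned function is what plumber calls for each request
    function(req) {
        newdata <- req$body
        newdata <- vetiver::vetiver_type_convert(newdata, ptype)
        newdata <- hardhat::scream(newdata, ptype)
        ret <- predict(vetiver_model$model, newdata = newdata, ...)
        list(.pred = ret)
    }
}
```

The package would also list vetiver (and hardhat, for `scream()`) under Imports in DESCRIPTION.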

Also related, I am curious why glue_required_packages puts loading of required packages inside a if (FALSE) statement?

This is specifically related to how Posit Connect does automatic dependency discovery. It doesn't hurt anything but you can turn that "off" by using rsconnect = FALSE in vetiver_write_plumber(). The function vetiver_prepare_docker() uses rsconnect = FALSE since those models aren't headed to Posit Connect.
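
As a small sketch of that option, the wrapper below writes a plumber file with `rsconnect = FALSE`. The wrapper name is hypothetical, and `board` and `name` are assumed to be the board and pin name used with `vetiver_pin_write()`.

```r
# Hypothetical wrapper: write a plumber file without the `if (FALSE)`
# guard that exists for Posit Connect's dependency discovery
write_plumber_unguarded <- function(board, name, file = "plumber.R") {
    vetiver::vetiver_write_plumber(
        board,
        name,
        file = file,
        rsconnect = FALSE
    )
}

# Example call, assuming a pin named "my-simple-model" on a local board:
# write_plumber_unguarded(pins::board_local(), "my-simple-model")
```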

juliasilge commented 8 months ago

Thank you again so much for these questions; we are tracking how to make this easier in #219 and I believe we have answered the other questions that surfaced. Let us know if anything else comes up! 🙌

rsh52 commented 5 months ago

Hi! This thread has been extremely helpful in walking through how to set up a custom model we have to work with vetiver. Thank you both for explaining these steps.

I'm running into one last hurdle in getting my vetiver API to deploy successfully to Connect. Locally, everything is set up and running well out of a package the model is housed in with custom vetiver methods. However, I keep getting that same error at the deployment location:

There is no method available to build a prediction handler for `x`.

I'm using vetiver_create_rsconnect_bundle() in combination with connectapi to bundle and deploy the API, and I'm curious if there's something in particular I need to do to enable the deployment to see/prioritize the custom methods.

client <- connectapi::connect()

# Vetiver preparation
board <- pins::board_connect()
v <- vetiver::vetiver_pin_read(board, name = "user.name/model")
bundle_vetiver <- vetiver::vetiver_create_rsconnect_bundle(
  board,
  name = "user.name/model"
)

rsconnect::writeManifest()

# Bundle the application
bundle <- connectapi::bundle_path(bundle_vetiver)

# Deploy to Connect
client |>
  connectapi::deploy(bundle, guid = guid, name = "MyModelAPI")

I ran into the same issue with vetiver_deploy_rsconnect().

juliasilge commented 4 months ago

Ah, I believe that the predict method is not getting packaged up so that it is available/installed in the bundle on Connect.

Locally, everything is set up and running well out of a package

Do I understand correctly that a) the new predict method is in an R package and b) you have done something like this to record that you need that package?

vetiver_create_meta.ported_model <- function(model, metadata) {
    vetiver_meta(metadata, required_pkgs = "yourinternalpackagename")
}

rsh52 commented 4 months ago

Ah, I definitely didn't consider this and it makes total sense. You are correct, the way we have things set up right now is probably non-kosher, but we have a package housing our model and custom prediction functions and we are setting up the vetiver logic with the custom methods there as well. I believe all of the imports are set up correctly.

Unfortunately even after attaching the custom internal package and verifying it in the model from vetiver_pin_read()$metadata$required_pkgs, I'm still getting the same Connect error message.

  • When you look at the plumber file that gets generated (you can look in the bundle, or see what you get with vetiver_write_plumber()) do you see library(yourinternalpackagename) in the plumber file?

Our package is in the if (FALSE) { } portion of the plumber.R output from vetiver::vetiver_create_rsconnect_bundle().

  • What does the manifest.json look like in your bundle? Does it have your internal package in it?

Our package is in the manifest.json that comes out of the vetiver::vetiver_create_rsconnect_bundle() output (not if I run rsconnect::writeManifest(), but I don't think we need that when using the former).

Wondering if we just need to redesign and take the vetiver components out of the existing package.

juliasilge commented 4 months ago

The one other thing that comes to mind is to look at how Connect is trying to install your package. For example, if you make a dead simple .Rmd with something like:

```r
library(yourpackagename)
library(vetiver)
sloop::s3_methods_generic("predict")
```

can you publish it to Connect such that the right `predict` method shows up as available, when executed on Connect (i.e. not only published as HTML)? Since your internal package is not on CRAN, how do you typically use it for, say, Shiny apps and other executable code on Connect? Do you use Posit Package Manager for your private/internal packages?

As a temporary workaround, you can inline your `predict` method in your plumber file, and then use `rsconnect::deployAPI("plumber.R")` to deploy it. Your plumber file will look something like:

```r
# Generated by the vetiver package; edit with care

library(pins)
library(plumber)
library(rapidoc)
library(vetiver)

# Packages needed to generate model predictions
if (FALSE) {
    library(importantpackagehere)     
}
b <- board_connect()
v <- vetiver_pin_read(b, "cars1")

predict.your_model_type <- function(object, newdata) {
    ## get your predictions from your `object`
}

handler_predict.your_model_type <- handler_predict.lm ## or whatever you have for your handler

#* @plumber
function(pr) {
    pr %>% vetiver_api(v)
}
```

rsh52 commented 4 months ago

Thanks for the pointers! I am relying on handler_predict.my_model over predict.my_model after following the tidymodels example, and confirmed that when making a simple .Rmd, handler_predict.my_model shows up in the s3_methods_generic() output (though visible is FALSE).

When I try the plumber.R method, it still doesn't seem to pick up the new methods even when I inline them. I stepped through vetiver_api() with s3_methods_generic() and confirmed it doesn't pick up the new classes unless I load them into the global environment (which doesn't help with deployment). Very odd, but I'll keep at trying to figure out what's going on there.

Our package is hosted on GitHub enterprise, we don't use RSPM. For these things we point renv at the GHE installation location.

juliasilge commented 4 months ago

@rsh52 Would you be able to make an example that I could run to try to see what is going on? I am thinking something like a dummy model with its own handler_predict (very simple) that works locally but fails deploying to Connect? If you can do that, can you open a new issue with this sort-of-a-reprex so I can dig into it more?

rsh52 commented 4 months ago

@juliasilge, I think we found a solution for our set up and I confirmed it helps resolve identical behavior that was in this simple repo.

What we found was that moving our package's library() call outside of:

# Packages needed to generate model predictions
if (FALSE) {
    library(importantpackagehere)     
}

allowed the deployment to see the custom methods. In my simple repo, adding the library() call, which wasn't in the initially generated file, also resolved the issue I was having on Connect.

Perhaps there could be an option to edit the infra_pkgs in vetiver_write_plumber()?

https://github.com/rstudio/vetiver-r/blob/581a4e98d9673013a386a9715f180f025fc3f03f/R/write-plumber.R#L96

If you think it's worthwhile I can open an issue.
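
Until such an option exists, the workaround can be scripted as a post-processing step on the generated file. The helper below is a hypothetical sketch in base R: it assumes the generated file contains a line starting with `if (FALSE) {`, as in the output shown in this thread, and inserts an unguarded `library()` call just before it.

```r
# Hypothetical helper: add an unguarded library() call ahead of the
# `if (FALSE) {` block in the lines of a generated plumber file
attach_pkg_in_plumber <- function(lines, pkg) {
    call_line <- sprintf("library(%s)", pkg)
    guard <- grep("^if \\(FALSE\\) \\{", lines)

    # Nothing to do if there is no guard or the call is already above it
    if (length(guard) == 0 ||
        call_line %in% trimws(lines[seq_len(guard[1] - 1)])) {
        return(lines)
    }
    append(lines, call_line, after = guard[1] - 1)
}

# Illustrative input resembling the generated file
plumber_lines <- c(
    "library(vetiver)",
    "if (FALSE) {",
    "    library(importantpackagehere)",
    "}",
    "v <- vetiver_pin_read(b, \"cars1\")"
)
patched <- attach_pkg_in_plumber(plumber_lines, "importantpackagehere")
```

In practice this would be wrapped with `readLines("plumber.R")` and `writeLines(patched, "plumber.R")` before deploying.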

juliasilge commented 4 months ago

Oh, that is kind of mystifying to me 🤯 but must have something to do with method registration???

I do think let's open an issue to provide additional infra_pkgs. Sounds like it would be necessary to really support #219 well.