rstudio / vetiver-r

Version, share, deploy, and monitor models
https://rstudio.github.io/vetiver-r/

Does vetiver::vetiver_sm_build() need an open firewall to work? #247

Closed ldominguez50 closed 1 year ago

ldominguez50 commented 1 year ago

Hi!

I have been experimenting with deploying a small model within my organization's SageMaker platform.

I'm running into an issue when trying to build the Docker image.

new_image_uri <- vetiver_sm_build(board = board,
                                  name = "my_model",
                                  docker_args = list(base_image = "FROM our base image"))
#> - Lockfile written to "vetiver_renv.lock".
#> Error in curl::curl_fetch_memory(req_url) : 
#>   transfer closed with 1007 bytes remaining to read

I'm not sure whether the message is telling me that the function is trying to call an external website for some information, or that there are permissions issues with building the Docker container. If either is the case, do you have alternatives to building a Docker container and saving the SageMaker model?

I know this is not a reprex, but let me know if you need more information.

Thanks!

juliasilge commented 1 year ago

Hmmm, that doesn't look super familiar to me, but people's SageMaker instances have quite a bit of variety in terms of auth and such. So we can figure out next steps for you, can you share whether you are working locally or from inside the SageMaker infrastructure?

ldominguez50 commented 1 year ago

I am using RStudio on SageMaker.

I got some information from our Infrastructure analyst about the permissions we have.

Our studio team/project roles have Create/Update/Delete permissions for Models and Model Endpoints. However, our roles will not have permission for AWS CodeBuild. Our org uses GitHub Actions to build Docker images. What would be the best approach then?

I appreciate your help!

juliasilge commented 1 year ago

Ah OK, here is what I would do in that situation:

The vetiver package has these more modular, lower-level functions for use cases like yours that are not as straightforward.
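As a rough sketch of what a modular route could look like when the image is built outside SageMaker (for example by GitHub Actions): write the plumber file and Dockerfile with vetiver, build and push the image with your own tooling, then register that image and create the endpoint. The helper names `write_deploy_files()` and `deploy_prebuilt_image()` are hypothetical, and the instance type is only a placeholder; `vetiver_write_plumber()`, `vetiver_write_docker()`, `vetiver_sm_model()`, and `vetiver_sm_endpoint()` are the vetiver functions being wrapped. This assumes your role can create models and endpoints but cannot use CodeBuild.

```r
# Sketch only: defining these helpers runs nothing against AWS.

# Step 1: write the files your CI system needs to build the image.
# path = "/invocations" and port = 8080 follow SageMaker's serving conventions.
write_deploy_files <- function(board, name, v_model) {
  vetiver::vetiver_write_plumber(board, name,
                                 path = "/invocations",
                                 file = "plumber.R")
  vetiver::vetiver_write_docker(v_model, port = 8080)
}

# Step 2 happens outside R: build the image from the Dockerfile and push it
# to a registry that SageMaker can pull from.

# Step 3: register the pushed image as a SageMaker model and create an endpoint.
# `image_uri` is the URI of the image your CI pushed (an assumption here).
deploy_prebuilt_image <- function(image_uri, instance_type = "ml.t2.medium") {
  model_name <- vetiver::vetiver_sm_model(image_uri)
  vetiver::vetiver_sm_endpoint(model_name, instance_type)
}
```

Splitting the work this way keeps the Docker build in whatever system your org already trusts, and leaves only the model and endpoint calls to the SageMaker role.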

ldominguez50 commented 1 year ago

Got it! I will follow your instructions and see if that works

Thanks for your help!

ldominguez50 commented 1 year ago

I think I have isolated the problem I am having. I will put a reprex here

library(tidyverse)
library(tidymodels)
library(vetiver)
library(pins)

diamonds <- ggplot2::diamonds %>% 
  mutate(price = log(price))

lm_spec <- linear_reg() 
lm_wf <- workflow(price ~ ., lm_spec)
diamonds_fit <- lm_wf %>% fit(data = diamonds)

# create the vetiver model
v_model <- vetiver_model(diamonds_fit, "diamonds_test_model")

# versioning to S3
bucket <- "my_bucket"
board <- board_s3(bucket)
vetiver_pin_write(board, v_model)

# write plumber file
# (this part worked no problem)
vetiver_write_plumber(board,
                      "diamonds_test_model",
                      path = "/invocations",
                      file = "plumber.R")

# write docker file
vetiver_write_docker(v_model,
                     # we use artifactory as our repo
                     rspm = FALSE,
                     port = 8080,
                     base_image = "FROM our base image")

#> - Lockfile written to "vetiver_renv.lock".
#> Error in curl::curl_fetch_memory(req_url) : 
#>   transfer closed with 784 bytes remaining to read

I think the problem comes when vetiver_write_docker() tries to get the system requirements from the Posit Public Package Manager. I was looking at the lower-level function glue_sys_reqs() and I think that might be where it fails, as our AWS environment is behind a firewall.
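One way to test that theory without involving vetiver at all is a small reachability check, sketched here in base R. `can_reach()` is a hypothetical helper, and the Package Manager URL in the comment is an assumption about which host the function calls; swap in whatever host your firewall logs point at.

```r
# Returns TRUE if the URL can be opened and read from, FALSE otherwise.
# Errors and warnings (e.g. a blocked or truncated transfer) both count as FALSE.
can_reach <- function(u, timeout = 10) {
  old <- options(timeout = timeout)
  on.exit(options(old), add = TRUE)
  con <- NULL
  ok <- tryCatch(
    {
      con <- url(u, open = "rb")
      readBin(con, what = "raw", n = 1L)
      TRUE
    },
    error = function(e) FALSE,
    warning = function(w) FALSE
  )
  if (!is.null(con)) close(con)
  ok
}

# Example call (host is an assumption; needs network access to be meaningful):
# can_reach("https://packagemanager.posit.co")
```

If this returns FALSE from inside RStudio on SageMaker but TRUE from a machine outside the firewall, that would point at the network rather than at Docker permissions.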

Any others having this issue?

juliasilge commented 1 year ago

Ah interesting, yep, thanks for helping diagnose the problem.

Do you have access to a different Package Manager, like the paid, professional version? If so, you should be able to set these environment variables to tell it to look in a different place:

https://github.com/rstudio/vetiver-r/blob/fc8b34980553bac403dc43385dedeae18cbfec36/R/write-docker.R#L143-L144
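As a hedged illustration of setting those variables from R before calling vetiver_write_docker(): the names `RSPM_ROOT` and `RSPM_REPO_ID` below are an assumption about what the linked lines read, so please confirm them against the source, and the URL and repo id are placeholders for your own Package Manager instance.

```r
# Assumed variable names; check them against the linked lines of write-docker.R.
# The URL and repo id are placeholders, not real values.
Sys.setenv(
  RSPM_ROOT = "https://packagemanager.example.internal",
  RSPM_REPO_ID = "1"
)

# Confirm what the current R session will see
Sys.getenv(c("RSPM_ROOT", "RSPM_REPO_ID"))
```

Setting them with Sys.setenv() only affects the current R session; putting them in .Renviron would make them persist across sessions.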

Another option is to explore #240. Are you able to use pak::pkg_sysreqs() behind your firewall?

res <- pak::pkg_sysreqs(
    c("parsnip", "xgboost", "workflows"), 
    ## use whatever is appropriate for your Docker container here:
    sysreqs_platform = "x86_64-pc-linux-gnu-debian-10"
)
#> ℹ Loading metadata database
#> ✔ Loading metadata database ... done
#> 

res$install_scripts
#> [1] "apt-get -y install libicu-dev zlib1g-dev make"

Created on 2023-10-02 with reprex v2.0.2
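If pak::pkg_sysreqs() does work behind your firewall, one possible way to use the result is to turn `install_scripts` into `RUN` lines and add them to the generated Dockerfile by hand. `sysreqs_to_run()` is a hypothetical helper, not part of vetiver or pak, and it assumes a Debian or Ubuntu base image where apt-get is available.

```r
# Turn install scripts (as in res$install_scripts) into Dockerfile RUN lines.
# Empty strings are dropped; no scripts means no lines.
sysreqs_to_run <- function(install_scripts) {
  scripts <- install_scripts[nzchar(install_scripts)]
  if (length(scripts) == 0) {
    return(character())
  }
  # Refresh the package index in the same layer as the install
  paste("RUN apt-get update -qq &&", scripts)
}

sysreqs_to_run("apt-get -y install libicu-dev zlib1g-dev make")
#> [1] "RUN apt-get update -qq && apt-get -y install libicu-dev zlib1g-dev make"
```

With the result above, that would be `sysreqs_to_run(res$install_scripts)`.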

Another option of course would be to set a rule in your firewall to allow it to access the Public Package Manager. This will have a lot of benefits for you because without it, it will take a loooooooong time for your Docker image to build (building all packages from source) and some packages may be difficult to build from source at all.

ldominguez50 commented 1 year ago

Thanks for all the help. I will work something out with the architecture team. There seem to be lots of pros to using the public package manager knowing that building packages from source can be problematic.

Thanks again!

(not sure if you want to close the issue or leave it open)

juliasilge commented 1 year ago

We definitely find that using the Public Package Manager is a big quality of life improvement for folks! I'll close this now, but definitely feel free to ask more questions or chime in on #240 if that would solve your problem.