rstudio / vetiver-r

Version, share, deploy, and monitor models
https://rstudio.github.io/vetiver-r/

Support deployment through AWS Lambda #128

Closed: mdneuzerling closed this issue 1 year ago

mdneuzerling commented 2 years ago

I'm following up on a tweet from a long time ago about implementing this. All the rstudio::conf tweets make me want to build something!

The end goal of this feature would be to simplify and automate the creation of a Dockerfile, the image of which can be deployed as an AWS Lambda function. Users could then access the model through any of the supported AWS integrations (including a HTTP API Gateway).

Like vetiver_write_docker, the goal is not to do the actual deployment and configuration of the endpoint, but to produce a Dockerfile that can be built into an image which is deployed:

| Deployment | First use... | Then create a Dockerfile with... |
| --- | --- | --- |
| Docker | vetiver_write_plumber | vetiver_write_docker |
| AWS Lambda | vetiver_write_lambda_runtime | vetiver_write_lambda |

If the maintainers are okay with this proposal I'm happy to implement this (with an unknown time frame!). The following changes would need to be made:

juliasilge commented 2 years ago

This would be fantastic @mdneuzerling and we would love to work together with you to get this implemented. I would want to use renv for package management instead of install.packages() to ensure the model has the right versions for predictions.

mdneuzerling commented 2 years ago

Absolutely. The Dockerfile will be very similar to the one for plumber, except that it will use a different parent image. There are some additional complexities there, like installing R and using CentOS commands instead of Debian ones.

I'll put something together!

jonthegeek commented 2 years ago

Have you made progress on this @mdneuzerling? I'd very much like to help/test!

mdneuzerling commented 2 years ago

Sorry, I've been held up with personal stuff (lots of corgis).

I've made a start on the Dockerfile work (https://github.com/mdneuzerling/vetiver-r/tree/lambda) and I expect to finish this over the weekend. However, this feature will be 20% code and 80% testing. If you have a Vetiver model handy then I would truly appreciate you trying to deploy when I'm done with the vetiver_write_lambda function.

jonthegeek commented 2 years ago

@mdneuzerling unfortunately my REAL use cases are going to need some other work before {vetiver} will be happy with them, I think... but right now I'm learning how things work, so I plan to prep a couple simple models that work with {vetiver} to use as part of my testing/learning process working toward my real models... which is all to say, ya, I think I can do that! My goal today is to deploy a model like the vetiver deploy example but without actual {vetiver}, using {lambdr}, so I can grok how the basics of predict on lambda will work. In theory we should be able to use that as a direct comparison for the workflow you're putting together.

jonthegeek commented 2 years ago

Check out https://github.com/rstudio/pins-r/issues/611 if you haven't seen that! I beat my head against that a bit today. We should presumably go to S3 preferably from Lambda, so maybe there's a better option than using {pins}, but it might make things complicated.

mdneuzerling commented 2 years ago

Oof, that may be an issue. A Lambda instance in a container can only write to the /tmp directory. Thanks for the heads up about this.

I think pins and vetiver are meant to work with one another, but I agree that someone deploying a model on Lambda is probably storing their model on S3 too.
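As a rough illustration of the /tmp restriction mentioned above, here is a minimal base-R sketch that picks the first writable directory from a list of candidates. The helper name `first_writable_dir` is hypothetical and is not part of pins, vetiver, or lambdr; it only shows how code running on Lambda might check writability before caching anything.

```r
# Hypothetical helper (not from any package): return the first candidate
# directory that exists and is writable by the current process.
first_writable_dir <- function(candidates = c(Sys.getenv("HOME"), tempdir())) {
  for (path in candidates) {
    # file.access(mode = 2) returns 0 when the path is writable, -1 otherwise
    if (nzchar(path) && dir.exists(path) && file.access(path, mode = 2) == 0) {
      return(path)
    }
  }
  stop("No writable directory found among: ", paste(candidates, collapse = ", "))
}

# tempdir() normally resolves to a session directory under the system temp
# location, so it is a reasonable last candidate.
cache_dir <- first_writable_dir()
```

On a typical Linux container `tempdir()` lives under /tmp unless `TMPDIR` is set to something else, which is why it is used as the fallback here.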

mdneuzerling commented 2 years ago

It's not quite there yet, but there should be enough material for you to get started, @jonthegeek: https://github.com/mdneuzerling/vetiver-r/tree/lambda

I'm using the below repo as a test for Vetiver. At the moment I'm struggling to build the Dockerfile because renv isn't picking up on a knitr dependency, but that will be solvable I'm sure. Consider the deploy.R file for defining the vetiver object, and creating the Lambda runtime.R and Dockerfile. https://github.com/mdneuzerling/simpsons-vetiver

I'm using the following build command. I've created an ECR repository by this point, so if you plan to follow along you'll need to do the same and use the URI below:

docker build --platform amd64 -t <aws_account_number>.dkr.ecr.<region>.amazonaws.com/<repository>:latest .

mdneuzerling commented 2 years ago

After a few more changes I can build the Dockerfile. I've been able to deploy the Lambda and I'm now hitting the pins error as above. I'll look into that next.

I noticed that the individual boards have package dependencies which may need to be captured when writing the renv lockfile. For lambda I just hardcoded a paws.storage dependency.

jonthegeek commented 2 years ago

@mdneuzerling i haven't had a chance to play since Friday, but I'd try a fork of pins without the onload to see what happens. If we can hard-code past that we'll know what needs to change.

juliasilge commented 2 years ago

For now in my demos, I am manually adding the required pins packages as well. I'll need to look into what is needed to make that work in the long term.

jonthegeek commented 2 years ago

Woot!

[Screenshot: success]

(I installed your dev version and updated things to hit the ::: versions of your functions, and I had to manually tell it to install knitr for some reason that I haven't fully diagnosed yet... but it worked with the demo function from https://vetiver.rstudio.com/get-started/deploy.html )

mdneuzerling commented 2 years ago

I’ll write up the docstrings and unit tests before submitting a PR. I’d also like to make it so that lambdr_predict wraps handler_predict. Otherwise we’re going to be doubling up on a lot of code.

juliasilge commented 2 years ago

I'm working to see if I can get this at least running myself and have a question that I hope one of you can help me with; I haven't worked with AWS Lambda much.

I'm running into problems getting Lambda to have access to a pin on S3.

Let's say I have pinned to an S3 bucket called "pins-testing-julia". Locally, I would use something like this to access it:

## `my-sso-profile` is in ~/.aws/config:
b <- board_s3(bucket = "pins-testing-julia", profile = "my-sso-profile", region = "us-east-2")

What do I do so that the pin is accessible to Lambda? I have a runtime.R that currently looks like this:

Sys.setenv(PINS_USE_CACHE = "true")

library(pins)
library(lambdr)

get_pin_contents <- function(name) {
  b <- board_s3(bucket = "pins-testing-julia", region = "us-east-2")
  pin_read(b, name)
}

start_lambda()

I made a Dockerfile with CMD ["get_pin_contents"], etc, and created a Lambda function. I gave it the s3:GetObject permission for arn:aws:s3:::pins-testing-julia/* (I also tried bumping up the S3 permission to, say, all read and write). If I do a call with "name": "name-of-my-pin-that-already-exists", then the call fails, without much that is helpful in the logs. (I do see the warning messages about In normalizePath("~") : that indicate it at least loaded the pins library.)
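When a handler fails without much in the logs, one option is to wrap it so that the underlying R error message is written to stderr before the error is re-raised. The sketch below is base R only; `with_error_logging` is a hypothetical helper, not a lambdr function, and it assumes that whatever the container writes to stderr ends up in the function's logs.

```r
# Hypothetical wrapper (not part of lambdr): log any error raised by `handler`
# to stderr, then re-signal the original condition unchanged.
with_error_logging <- function(handler) {
  function(...) {
    tryCatch(
      handler(...),
      error = function(e) {
        # message() writes to stderr
        message("Handler failed: ", conditionMessage(e))
        stop(e)
      }
    )
  }
}
```

With the runtime.R above, this could presumably be used by defining `get_pin_contents_logged <- with_error_logging(get_pin_contents)` before `start_lambda()` and pointing the Dockerfile CMD at the wrapped name.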

jonthegeek commented 2 years ago

@juliasilge Hmm. I assumed it was a permission thing, but you did that step. I'm technically doing mine via an IAM role that has the Lambda basics + a wide set of S3 read permissions, but I'm pretty sure you're doing something equivalent.

Can you share the full error message when you do a test (redacting anything that needs to be redacted but as far as I can remember there's nothing private there)? It might not be much but it'd be helpful to see what it DOES do.

mdneuzerling commented 2 years ago

@juliasilge I was able to get something almost identical working: https://github.com/mdneuzerling/test-get-pins

After building the image, pushing to ECR, and creating the Lambda function I made the following changes in the AWS web GUI:

Running the test event {"name":"penguins"}, either through the AWS console or by directly invoking the Lambda through the AWS CLI, returned the best data set ever.

As for the PR, I'm pretty happy with the code, but now I need to take care of the tests and documentation. That's a huge task.

juliasilge commented 2 years ago

AH OK I got this to work! 🎉

Before, I had been trying to add a second permission policy to the execution role but this did not work for me. Instead, I had to edit the existing permission policy, to add read and list for the S3 bucket to the default Lambda execution role.
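For anyone following along, statements along these lines are roughly what "read and list for the S3 bucket" could look like when added to the execution role's existing policy. This is a sketch rather than the exact policy used here: the bucket name is the one from the example above, and the choice of `s3:ListBucket` plus `s3:GetObject` is an assumption about the minimum needed.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::pins-testing-julia"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::pins-testing-julia/*"
    }
  ]
}
```

Note that the list permission applies to the bucket ARN itself, while the read permission applies to the objects under it (the `/*` form), so a policy that only grants `s3:GetObject` on `pins-testing-julia/*` does not allow listing.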

Thank you both so much! 🙌

@mdneuzerling If you are interested in opening a draft PR soon, I don't mind starting to test it and writing some documentation. No rush, but whenever you are ready!

mdneuzerling commented 1 year ago

Apologies folks, but I very rarely code in my free time these days and I feel guilty for leaving an issue open on such an important repository. Given that this has stagnated for over a year I think I will close this.

juliasilge commented 1 year ago

No worries at all @mdneuzerling! For my own context setting, do you plan to continue maintaining lambdr, for example if folks report bugs and all that? Or should we think about that as not actively maintained? Because I believe this vetiver support was pretty close to working and we could possibly finish it off, especially if other folks are interested in using it.

mdneuzerling commented 1 year ago

I hope to keep lambdr on CRAN (and I've just submitted a patch to satisfy their new documentation requirements).

I did make some progress, although the vetiver package has changed a lot since then: https://github.com/rstudio/vetiver-r/compare/main...mdneuzerling:vetiver-r:lambda