Closed: mdneuzerling closed this issue 1 year ago.
This would be fantastic @mdneuzerling and we would love to work together with you to get this implemented. I would want to use renv for package management instead of `install.packages()` to ensure the model has the right versions for predictions.
Absolutely. The Dockerfile will be very similar to the one for plumber, except it will have a different parent image. There are some additional complexities there, like installing R and using CentOS commands instead of Debian ones.
I'll put something together!
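To make the parent-image difference concrete, a sketch loosely based on the lambdr example Dockerfile is below. The R version, file paths, and the renv restore step are all assumptions, not the final design:

```dockerfile
# Sketch only, loosely based on the lambdr example Dockerfile.
# AWS Lambda base images are Amazon Linux, so packages come from yum, not apt.
FROM public.ecr.aws/lambda/provided:al2

# Install R from the Posit CentOS builds (version is an assumption)
ENV R_VERSION=4.2.1
RUN yum -y install wget tar \
    && wget https://cdn.rstudio.com/r/centos-7/pkgs/R-${R_VERSION}-1-1.x86_64.rpm \
    && yum -y install R-${R_VERSION}-1-1.x86_64.rpm \
    && rm R-${R_VERSION}-1-1.x86_64.rpm
ENV PATH="${PATH}:/opt/R/${R_VERSION}/bin/"

# Copy the generated runtime and restore packages with renv
COPY runtime.R renv.lock /lambda/
RUN Rscript -e "install.packages('renv'); renv::restore(lockfile = '/lambda/renv.lock')"

ENTRYPOINT ["Rscript", "/lambda/runtime.R"]
CMD ["handler"]
```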
Have you made progress on this @mdneuzerling? I'd very much like to help/test!
Sorry, I've been held up with personal stuff (lots of corgis).
I've made a start on the Dockerfile work (https://github.com/mdneuzerling/vetiver-r/tree/lambda) and I expect to finish this over the weekend. However, this feature will be 20% code and 80% testing. If you have a Vetiver model handy then I would truly appreciate you trying to deploy when I'm done with the `vetiver_write_lambda` function.
@mdneuzerling unfortunately my REAL use cases are going to need some other work before {vetiver} will be happy with them, I think... but right now I'm learning how things work, so I plan to prep a couple simple models that work with {vetiver} to use as part of my testing/learning process working toward my real models... which is all to say, ya, I think I can do that! My goal today is to deploy a model like the vetiver deploy example but without actual {vetiver}, using {lambdr}, so I can grok how the basics of predict on lambda will work. In theory we should be able to use that as a direct comparison for the workflow you're putting together.
Check out https://github.com/rstudio/pins-r/issues/611 if you haven't seen that! I beat my head against that a bit today. We should presumably go to S3 preferably from Lambda, so maybe there's a better option than using {pins}, but it might make things complicated.
Oof, that may be an issue. A Lambda instance in a container can only write to the `/tmp` directory. Thanks for the heads up about this.
I think pins and vetiver are meant to work with one another, but I agree that someone deploying a model on Lambda is probably storing their model on S3 too.
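One way to reconcile the `/tmp` restriction with pins might be to point the pins cache there explicitly (a sketch only; the bucket name is a placeholder, and this assumes `board_s3()`'s `cache` argument behaves as documented):

```r
library(pins)

# Lambda containers can only write to /tmp, so point the pins cache there
board <- board_s3(
  bucket = "my-model-bucket",   # placeholder bucket name
  region = "us-east-2",
  cache  = "/tmp/pins-cache"    # writable location inside a Lambda container
)
```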
It's not quite there yet, but there should be enough material for you to get started, @jonthegeek: https://github.com/mdneuzerling/vetiver-r/tree/lambda
I'm using the below repo as a test for Vetiver. At the moment I'm struggling to build the Dockerfile because renv isn't picking up on a knitr dependency, but that will be solvable I'm sure. Consider the `deploy.R` file for defining the vetiver object, and creating the Lambda `runtime.R` and Dockerfile.
https://github.com/mdneuzerling/simpsons-vetiver
I'm using the following build command. I've created an ECR repository by this point, so if you plan to follow along you'll need to do the same and use the URI below:

```shell
docker build --platform amd64 -t <aws_account_number>.dkr.ecr.<region>.amazonaws.com/<repository>:latest .
```
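For anyone following along, pushing the built image up to ECR would look roughly like this (placeholders throughout; the login command assumes AWS CLI v2):

```shell
# Authenticate Docker against your ECR registry (AWS CLI v2)
aws ecr get-login-password --region <region> \
  | docker login --username AWS --password-stdin <aws_account_number>.dkr.ecr.<region>.amazonaws.com

# Push the image tagged in the build step above
docker push <aws_account_number>.dkr.ecr.<region>.amazonaws.com/<repository>:latest
```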
After a few more changes I can build the Dockerfile. I've been able to deploy the Lambda and I'm now hitting the pins error as above. I'll look into that next.
I noticed that the individual boards have package dependencies which may need to be captured when writing the renv lockfile. For Lambda I just hardcoded a `paws.storage` dependency.
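For what it's worth, one way to do that hardcoding is to give renv's static analysis something to find before snapshotting (a sketch; the package names here are just the ones mentioned in this thread):

```r
# renv::dependencies() scans source files for library()/requireNamespace()
# calls, so naming the packages explicitly gets them into the lockfile.
requireNamespace("paws.storage")  # needed by pins::board_s3()
requireNamespace("knitr")         # the dependency renv kept missing

renv::snapshot()
```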
@mdneuzerling I haven't had a chance to play since Friday, but I'd try a fork of pins without the onload to see what happens. If we can hard-code past that we'll know what needs to change.
For now in my demos, I am manually adding the required pins packages as well. I'll need to look into what is needed to make that work in the long term.
Woot!
(I installed your dev version and updated things to hit the `:::` versions of your functions, and I had to manually tell it to install knitr for some reason that I haven't fully diagnosed yet... but it worked with the demo function from https://vetiver.rstudio.com/get-started/deploy.html)
I'll write up the docstrings and unit tests before submitting a PR. I'd also like to make it so that `lambdr_predict` wraps `handler_predict`. Otherwise we're going to be doubling up on a lot of code.
I'm working to see if I can get this at least running myself and have a question that I hope one of you can help me with; I haven't worked with AWS Lambda much.
I'm running into problems getting Lambda to have access to a pin on S3.
Let's say I have pinned to an S3 bucket called "pins-testing-julia". Locally, I would use something like this to access it:
```r
## `my-sso-profile` is in ~/.aws/config:
b <- board_s3(bucket = "pins-testing-julia", profile = "my-sso-profile", region = "us-east-2")
```
What do I do so that the pin is accessible to Lambda? I have a `runtime.R` that currently looks like this:
```r
Sys.setenv(PINS_USE_CACHE = "true")

library(pins)
library(lambdr)

get_pin_contents <- function(name) {
  b <- board_s3(bucket = "pins-testing-julia", region = "us-east-2")
  pin_read(b, name)
}

start_lambda()
```
I made a Dockerfile with `CMD ["get_pin_contents"]`, etc, and created a Lambda function. I gave it the `s3:GetObject` permission for `arn:aws:s3:::pins-testing-julia/*` (I also tried bumping up the S3 permission to, say, all read and write). If I do a call with `"name": "name-of-my-pin-that-already-exists"`, then the call fails, without much that's helpful in the logs. (I do see the warning messages about `In normalizePath("~") :` that indicate it at least loaded the pins library.)
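If it helps with debugging, invoking the function directly from the CLI captures the raw error payload (the function and pin names are placeholders; `--cli-binary-format` is needed on AWS CLI v2):

```shell
aws lambda invoke \
  --function-name get-pin-contents \
  --payload '{"name": "name-of-my-pin-that-already-exists"}' \
  --cli-binary-format raw-in-base64-out \
  response.json

cat response.json   # the error message, if any, lands here
```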
@juliasilge Hmm. I assumed it was a permission thing, but you did that step. I'm technically doing mine via an IAM role that has the Lambda basics + a wide set of S3 read permissions, but I'm pretty sure you're doing something equivalent.
Can you share the full error message when you do a test (redacting anything that needs to be redacted but as far as I can remember there's nothing private there)? It might not be much but it'd be helpful to see what it DOES do.
@juliasilge I was able to get something almost identical working: https://github.com/mdneuzerling/test-get-pins
After building the image, pushing to ECR, and creating the Lambda function I made the following changes in the AWS web GUI:
- I added an inline policy to the Lambda execution role granting read access to the `mdneuzerling` bucket. For some reason this had to be an inline policy, since attaching an existing policy didn't work (I have no idea why it needs to be inline. But could this be the issue?)

Running the test `{"name":"penguins"}` through either the AWS console or directly invoking the Lambda through the AWS CLI returned the best data set ever.
As for the PR, I'm pretty happy with the code, but now I need to take care of the tests and documentation. That's a huge task.
AH OK I got this to work! 🎉
Before, I had been trying to add a second permission policy to the execution role, but this did not work for me. Instead, I had to edit the existing permission policy on the default Lambda execution role to add read and list access for the S3 bucket.
Thank you both so much! 🙌
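For anyone hitting the same wall, the edited policy statement amounted to something like the following (a sketch, not the exact JSON; note that `s3:ListBucket` applies to the bucket ARN while `s3:GetObject` applies to the objects under it):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::pins-testing-julia"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::pins-testing-julia/*"
    }
  ]
}
```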
@mdneuzerling If you are interested in opening a draft PR soon, I don't mind starting to test it and writing some documentation. No rush, but whenever you are ready!
Apologies folks, but I very rarely code in my free time these days and I feel guilty for leaving an issue open on such an important repository. Given that this has stagnated for over a year I think I will close this.
No worries at all @mdneuzerling! For my own context setting, do you plan to continue maintaining lambdr, for example if folks report bugs and all that? Or should we think about that as not actively maintained? Because I believe this vetiver support was pretty close to working and we could possibly finish it off, especially if other folks are interested in using it.
I hope to keep lambdr on CRAN (and I've just submitted a patch to satisfy their new documentation requirements).
I did make some progress, although the vetiver package has changed a lot since then: https://github.com/rstudio/vetiver-r/compare/main...mdneuzerling:vetiver-r:lambda
I'm following up on a tweet from a long time ago about implementing this. All the rstudio::conf tweets make me want to build something!
The end goal of this feature would be to simplify and automate the creation of a Dockerfile, the image of which can be deployed as an AWS Lambda function. Users could then access the model through any of the supported AWS integrations (including a HTTP API Gateway).
Like `vetiver_write_docker`, the goal is not to do the actual deployment and configuration of the endpoint, but to produce a Dockerfile that can be built into an image which is deployed:

- `vetiver_write_plumber` / `vetiver_write_docker`
- `vetiver_write_lambda_runtime` / `vetiver_write_lambda`
If the maintainers are okay with this proposal I'm happy to implement this (with an unknown time frame!). The following changes would need to be made:

- Add `lambdr` as a "Suggests" dependency.
- A `vetiver_write_lambda_runtime` function with an API identical to that of `vetiver_write_plumber`. The generated runtime file would need to read the model pin, source packages, and then run some sort of `predict` function.
- A shared `vetiver_write_dockerfile` function that would be used by both `vetiver_write_docker` and `vetiver_write_lambda`. This is so we can have more flexible options for writing Dockerfiles, especially using different parent images. We're aiming for a Dockerfile like the one shown here.
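To make the second bullet concrete, the generated runtime file might look roughly like this (a sketch only; the board details and pin name are placeholders, and the handler shape would follow whatever `handler_predict` settles on):

```r
# Hypothetical sketch of a generated runtime.R
library(pins)
library(vetiver)
library(lambdr)

board <- board_s3(
  bucket = "my-model-bucket",  # placeholder
  region = "us-east-2",
  cache  = "/tmp/pins-cache"   # Lambda can only write to /tmp
)
v <- vetiver_pin_read(board, "my-model")  # placeholder pin name

# The Lambda handler: take new observations, return predictions
# (assumes the pinned object supports predict())
handler <- function(...) {
  new_data <- as.data.frame(list(...))
  predict(v, new_data)
}

lambdr::start_lambda()
```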