mdneuzerling / lambdr

Run R containers on AWS Lambda
https://lambdr.mdneuzerling.com

Support for AWS Cognito (Lambda triggers) #9

Open mrismailt opened 2 years ago

mrismailt commented 2 years ago

I use AWS Cognito for Shiny authentication. As you may know, Cognito provides a set of triggers to extend Cognito features, and these only work with Lambda functions. Can you add support for this?

mdneuzerling commented 2 years ago

Hi @mrismailt. I can look into providing some sort of simple deserialiser for handling Cognito input. Do you have an example JSON of the event body as it's passed to Lambda?

mrismailt commented 2 years ago

Here is an example for the pre-signup Cognito trigger:

https://docs.aws.amazon.com/cognito/latest/developerguide/user-pool-lambda-pre-sign-up.html

mrismailt commented 2 years ago

Do you have a vignette that walks through how to retrieve example JSON for various events? That would help people send through the kind of examples you're after.

mdneuzerling commented 2 years ago

A vignette is a great idea! I can make that. The rough idea is to set up a function (any function) and run logger::log_threshold(logger::DEBUG) before start_lambda. Then set up the trigger you want to test. The invocation will fail, but the raw event content will be logged, if logging is available for the Lambda.
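A minimal runtime.R along those lines might look like the sketch below. It assumes the logger and lambdr packages are installed in the container image and that the handler name is supplied through the container's CMD. echo_event is a hypothetical handler, and the check on the AWS_LAMBDA_RUNTIME_API environment variable (set by the Lambda runtime) is only there so the file can be sourced locally without starting the event loop.

```r
# runtime.R (sketch): a throwaway handler used only to capture raw events.
# `echo_event` is a hypothetical name; any function will do.
echo_event <- function(...) {
  # Return whatever arguments were extracted from the event
  list(...)
}

# AWS_LAMBDA_RUNTIME_API is set inside Lambda; locally this block is skipped
if (nzchar(Sys.getenv("AWS_LAMBDA_RUNTIME_API"))) {
  # DEBUG threshold so the raw event content shows up in the logs
  logger::log_threshold(logger::DEBUG)
  lambdr::start_lambda()
}
```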

I'll look into Cognito. I'm still working out exactly how much lambdr should help, but I think the key question is how consistent the event bodies are. If all Cognito events "look the same" then I can reliably deserialise the content. Otherwise, it may be up to the user to pass a custom deserialiser to the lambda_config to handle their own unique requirements.

** edit: changed setup_lambda to start_lambda

mrismailt commented 2 years ago

Do you mean before start_lambda? When I do that, the DEBUG logs show up but the content is empty for some reason:

[Screenshot: DEBUG log output in which the event content is empty]

mdneuzerling commented 2 years ago

That's bizarre! It didn't even log the request_id, which is in the same event header regardless of how the Lambda is invoked. And yet R must have received the request ID or else it would have given up on trying to handle the event. I wonder if, due to the sensitive nature of Cognito, there's some log censoring that's going on? It's not even showing the handler function on line 2.

I'm stumped. I don't know how easy it is to set up Cognito, but I'll see how far I can get.

mrismailt commented 2 years ago

The {cognitoR} package has an example app you can set up pretty quickly.

The way Cognito triggers work is that Amazon Cognito passes event information to your Lambda function, which returns the same event object back to Amazon Cognito with any changes in the response.
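As an illustration of that round trip, here is a sketch of a pre sign-up handler, assuming it receives the whole event as one R list (which is not lambdr's default, as the rest of this thread discusses). The auto-confirm rule and all example values are hypothetical; autoConfirmUser is one of the pre sign-up response fields described in the AWS documentation linked above.

```r
# Hypothetical pre sign-up handler that receives the whole Cognito event
# as one R list and returns it with the response section filled in
pre_signup_handler <- function(event) {
  email <- event$request$userAttributes$email
  # Auto-confirm users from one (hypothetical) trusted domain
  if (!is.null(email) && endsWith(email, "@example.com")) {
    event$response$autoConfirmUser <- TRUE
  }
  # Cognito expects the same event object back
  event
}

# Hand-built example event mirroring the documented structure (values invented)
event <- list(
  version = "1",
  triggerSource = "PreSignUp_SignUp",
  region = "us-east-1",
  userPoolId = "us-east-1_EXAMPLE",
  userName = "jane",
  callerContext = list(awsSdkVersion = "aws-sdk-unknown", clientId = "example-client-id"),
  request = list(userAttributes = list(email = "jane@example.com")),
  response = list()
)

result <- pre_signup_handler(event)
```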

Is there any reason why you don't simply have {lambdr} pass the complete JSON object (as an R list) to the R function being called in runtime.R, and let the user work with it as needed and construct and return the response as an R list? This way, the package would be set up for all possible Lambda use cases (unless I'm missing something).

mdneuzerling commented 2 years ago

The default behaviour is to assume that the body of the event is JSON and parse it into a list with jsonlite, as you say. I've implemented some different behaviour for API gateways and some other services, but other invocations will fall back to the default.

If the user wishes to handle the event content themselves, they can pass lambda_config(deserialiser = identity) to start_lambda. This way, the handler function will receive the raw JSON body, as a string. They would have to convert it into a list themselves.
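For the deserialiser = identity route, a sketch of the handler side might look like this. It assumes jsonlite is installed in the image and that the handler name comes from the container's CMD; raw_handler is a hypothetical name. Passing simplifyVector = FALSE is one choice among several: it keeps JSON objects and arrays as nested R lists, which tends to preserve the event's shape.

```r
# runtime.R (sketch): with deserialiser = identity the handler receives the
# raw JSON body as one string and must parse it itself
raw_handler <- function(body) {
  # simplifyVector = FALSE keeps JSON objects/arrays as nested R lists
  event <- jsonlite::fromJSON(body, simplifyVector = FALSE)
  # ... inspect or modify `event` here ...
  event
}

# Only start the runtime loop when running inside Lambda
if (nzchar(Sys.getenv("AWS_LAMBDA_RUNTIME_API"))) {
  lambdr::start_lambda(lambdr::lambda_config(deserialiser = identity))
}
```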

mrismailt commented 2 years ago

But you seem to be removing the first layer and parsing its contents. For example, the User Pool Lambda Trigger Event in Cognito passes the following object structure to Lambda:

{
    "version": "string",
    "triggerSource": "string",
    "region": AWSRegion,
    "userPoolId": "string",
    "userName": "string",
    "callerContext": 
        {
            "awsSdkVersion": "string",
            "clientId": "string"
        },
    "request":
        {
            "userAttributes": {
                "string": "string",
                ....
            }
        },
    "response": {}
}

Based on what you said above, I would expect {lambdr} to pass the following list as a single object to the runtime function:

list(
    version = "string",
    triggerSource = "string",
    region = AWSRegion,
    userPoolId = "string",
    userName = "string",
    callerContext = list(awsSdkVersion = "string", clientId = "string"),
    request = list(userAttributes = list(string = "string", ...)),
    response = list()
)

I would then be able to modify this single list object as needed and pass it back out. However, instead of a single list object, the runtime function receives 8 separate objects (version, triggerSource, region, etc.). So instead of f <- function(event) {}, I have to set the function up to receive the parsed objects, like so: f <- function(version, triggerSource, region, userPoolId, userName, callerContext, request, response) {}

mdneuzerling commented 2 years ago

You're right, I misspoke. I need to improve the documentation around how events are deserialised into R objects.

The default behaviour is indeed to convert the JSON to a list, and then extract the arguments from that list to pass to the handler:

result <- do.call(config$handler, args = event_arguments)
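A small base-R illustration of what that line does: do.call spreads the elements of a named list across the handler's formal arguments, which is why a Cognito event arrives as separate arguments. The list below is a cut-down, hypothetical stand-in for a parsed event (not lambdr's actual internals), and strict_handler and single_arg are hypothetical handlers.

```r
# Cut-down stand-in for a parsed Cognito event (hypothetical values)
event_arguments <- list(
  version = "1",
  triggerSource = "PreSignUp_SignUp",
  userName = "jane",
  response = list()
)

# A handler that names every top-level field explicitly works
strict_handler <- function(version, triggerSource, userName, response) {
  paste(triggerSource, userName)
}
strict_result <- do.call(strict_handler, args = event_arguments)

# A one-argument handler fails: do.call passes four named arguments,
# none of which matches `event`, so R raises an "unused arguments" error
single_arg <- function(event) event
single_result <- tryCatch(
  do.call(single_arg, args = event_arguments),
  error = function(e) "error"
)
```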

This is suitable where you have a function like parity(number), where it would be tedious to need an extra step to extract the actual argument from list(number). My goal was to allow "natural" R functions to be used as much as possible, especially for direct invocations. Where the trigger includes a lot of unneeded arguments, the handler can accept ... to absorb them.
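To make the trade-off concrete, here is a sketch with a hypothetical parity handler and a hypothetical Cognito-style handler that uses ..., both called the way the do.call line quoted earlier calls the handler. The payloads are invented for illustration.

```r
# Hypothetical "natural" R handler for direct invocations
parity <- function(number) {
  if (number %% 2 == 0) "even" else "odd"
}

# A direct invocation with the JSON body {"number": 7} parses to list(number = 7)
parity_result <- do.call(parity, args = list(number = 7))

# Hypothetical Cognito-style handler: names the two fields it uses and
# absorbs the remaining top-level fields with ...
describe_signup <- function(userName, triggerSource, ...) {
  paste(triggerSource, "for", userName)
}

signup_result <- do.call(describe_signup, args = list(
  version = "1",
  triggerSource = "PreSignUp_SignUp",
  userName = "jane",
  response = list()
))
```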

To get the behaviour you're talking about, where the handler receives the whole event as a single list rather than as separate arguments, I suppose you could pass a custom deserialiser: function(x) list(jsonlite::fromJSON(x)).
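A sketch of that custom deserialiser and why the extra list() matters: do.call matches an unnamed single-element list positionally, so the whole parsed event lands in the handler's first argument. The parsed list stands in for the output of jsonlite::fromJSON(); whole_event_handler and whole_event_deserialiser are hypothetical names, and simplifyVector = FALSE is an optional addition to the deserialiser suggested above.

```r
# Stand-in for the parsed event that jsonlite::fromJSON() would produce
parsed <- list(userName = "jane", request = list(), response = list())

# Hypothetical handler that wants the whole event as a single object
whole_event_handler <- function(event) {
  event$response$autoConfirmUser <- FALSE
  event
}

# Wrapping in list() gives do.call one unnamed element, which is matched
# positionally to `event` instead of being spread across named arguments
wrapped <- do.call(whole_event_handler, args = list(parsed))

# Hypothetical custom deserialiser; simplifyVector = FALSE keeps JSON
# objects and arrays as nested lists (requires jsonlite at call time)
whole_event_deserialiser <- function(x) {
  list(jsonlite::fromJSON(x, simplifyVector = FALSE))
}
```

In runtime.R, the deserialiser would then be supplied to start_lambda through lambda_config(deserialiser = whole_event_deserialiser).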

There's a lot of variety in how Lambda works, and I had to make some decisions. It's difficult to cover every use case. I've tried to aim for a sensible default, along with some special logic for specific cases (API Gateways, etc.). I think that it's frustrating when you have to design your Lambda handler function to deal with Lambda itself, instead of the actual use case you want to tackle.

Failing that, there's always the option of a custom deserialiser so the user can decide how the event body is turned into an R object, or even passing deserialiser = identity and letting the handler function deal with it all.