stojanovic / random


Lambda Destinations for synchronous invocations #1

Open stojanovic opened 4 years ago

stojanovic commented 4 years ago

Serverless has two killer features:

  1. It rewards good practices better than any architecture before it. For example, if you optimize and refactor your app to run faster, you'll be rewarded with cheaper infrastructure. That makes the refactoring decision easy, as it can be backed by numbers.
  2. It promises to help us shift our focus to our business logic. For example, users of my leave-tracking app don't really care if we store our data in MySQL or DynamoDB, or if our Stripe webhooks are delivered to a server or a Lambda function, as long as our system works reliably and doesn't compromise their data.

I think AWS is heading in that direction, but I also agree with Joe Emison (as almost always) when he says:

AWS really needs someone at the VP level (like @adrianco has done for open source) to influence (but not dictate) easier, more-opinionated ways of using the services for the 80% use case (e.g., CRUD apps).

And his thread in general.

Problems we are facing in production

I'll try to explain how Lambda destinations for synchronous invocations can help us move a step toward that promise of serverless, with a few practical examples:

  1. Webhooks
  2. Processing analytics and other metadata
  3. Event-driven apps and event sourcing

These cases are not the only cases where Lambda Destinations can simplify the way we build event-driven apps; they are the most obvious problems we are facing while building our startup using serverless technologies.

Note: This is just my opinion, and there might be a better way to solve the problems mentioned below.

Why Lambda Destinations can help

Lambda Destinations makes handling asynchronous invocations a lot easier, minimizing the amount of code required to manage our serverless apps.

Serverless applications (or "cloud-native" apps in general) are often fully event-driven, but certain things still need to stay synchronous. For example, when a user tries to sign up, you either need to wait for the process to finish and return the result, or set up a complex WebSocket or long-polling connection and notify the user when the process is done. Another typical example is webhooks: if a third party sends you a webhook, you need to respond as fast as possible and still be able to process the received payload.

Having Lambda Destinations for synchronous invocations would make some of these scenarios easier, reduce the amount of backend code, and help us decouple business logic from everything else without making the architecture overcomplicated.

Webhooks

In our app, Vacation Tracker, we have two types of webhooks: important ones that need to be processed 100% of the time, and less critical webhooks that will not affect our users even if we miss some of them.

Critical webhooks

For example, payment (Stripe) webhooks are way more critical than webhooks we receive from Slack when the user changes their profile image.

The critical webhook can look like the following diagram:

important-webhook-v1

From the figure above, our webhook has two problems:

  1. The cost of the app if you are receiving many webhook events. Not a huge issue, as API Gateway is not that expensive, but we don't use any of the API Gateway power features in this scenario.
  2. We are maintaining one more Lambda function per webhook. Why do we need multiple functions? Because Stripe, Slack and similar systems often require your answer within 3 seconds. When you include latency and communication with multiple services, you can easily miss that 3-second window, which will cause retries and potential issues in your system.
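
As a rough illustration of the first function in that setup, the receiver can do nothing except hand the raw payload to a queue and acknowledge it. This is a minimal sketch, not our production code: `createWebhookReceiver` and the injected `enqueue` function are hypothetical names, and `enqueue` stands in for whatever SQS client call you use. In a real deployment the returned handler would be exported from the module.

```javascript
// Hypothetical factory: `enqueue` is an injected async function that stands in
// for a real queue client call (for example, sending a message to SQS).
function createWebhookReceiver(enqueue) {
  return async function handler(event) {
    // Forward the raw request untouched; signature checks and business
    // logic belong to the function that consumes the queue.
    await enqueue(JSON.stringify({ headers: event.headers, body: event.body }));

    // Acknowledge right away so the sender does not time out and retry.
    return { statusCode: 204, body: '' };
  };
}
```

Injecting `enqueue` keeps the sketch testable without AWS credentials, but it does not remove the second function, which is exactly the maintenance cost described above.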

A better webhook solution would include VTL and would look like the following diagram:

important-webhook-v2

Instead of having multiple Lambda functions, we "should" use a VTL template to connect the API Gateway request directly to the SQS queue.

This scenario removes part of our code but makes our app a lot harder to test and maintain. Also, it makes our CloudFormation/SAM template less readable and larger.

The cost of our solution can be significantly reduced by migrating to the new HTTP APIs for Amazon API Gateway. However, HTTP APIs don't support VTL templates, and that takes us back to the initial problem.

Less important webhooks

Some webhooks are not as crucial as the webhooks described in the previous section. For example, Slack can send the info that the user changed their profile picture. Even if our system does not process that info, or if it processes it multiple times, our users will not be significantly impacted.

A common less important webhook can be similar to the following diagram:

less-important-webhook

It looks the same as the webhook from the previous section; the only significant differences are:

But, as this section title says, this type of webhook is less critical for our apps.

The Lambda Destinations Solution

If cost were the only issue (it isn't, because it's way lower than the cost of a traditional system), Application Load Balancer would be a solution. However, the Application Load Balancer setup is more complicated than the API Gateway setup.

But Lambda destinations would make our app significantly more straightforward. Consider the following diagram:

webhook-with-lambda-destinnations

This architecture looks similar to the previous diagrams, but there's one big difference.

The Lambda function in the second step does not contain any business logic or transformations. It simply returns status 204 and an empty body. The code of this function can be similar to the following snippet:

// Acknowledge the webhook immediately; no parsing or business logic here.
export async function handler() {
  return {
    statusCode: 204, // 204 No Content
    body: null
  }
}

Lambda destinations will automatically provide both request and response as an SQS message in the queue (see step 3). That queue will then trigger another Lambda function that will contain our business logic, and that will be fully covered with tests.
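
To make the second function more concrete, here is a sketch of how it could unpack those messages. Destinations for synchronous invocations do not exist, so the message shape is an assumption: I assume it would match the record Lambda already produces for asynchronous-invocation destinations, with `requestPayload` and `responsePayload` fields. The function name `extractWebhookEvents` is hypothetical.

```javascript
// Assumption: each SQS message body is a destination record with the same
// `requestPayload` / `responsePayload` fields that Lambda produces today for
// asynchronous-invocation destinations.
function extractWebhookEvents(sqsEvent) {
  return sqsEvent.Records.map((record) => {
    const destinationRecord = JSON.parse(record.body);

    // requestPayload would be the original API Gateway event.
    const request = destinationRecord.requestPayload;

    return {
      messageId: record.messageId,
      // What the empty function answered to the webhook sender.
      acknowledgedWith: destinationRecord.responsePayload.statusCode,
      // The webhook body as sent by Stripe, Slack, and so on.
      webhook: JSON.parse(request.body),
    };
  });
}
```

Because this is a pure function over the SQS event, it can be unit tested with plain JSON fixtures, which is the "fully covered with tests" part.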

Lambda destinations for synchronous invocations would solve the following issues:

Processing analytics and other metadata

Another problem we are facing, a common problem for web applications, is tracking analytics and metadata without coupling that code with the business logic.

For example, we use Segment to track specific metrics, then we pass that data to different tools that track conversion, send emails or give us data that helps us make decisions.

Let's say that we want to test a new feature for our leave tracking app. The feature allows managers to reward people with additional leave days, per our clients' request, but we want to track this feature and be sure that our users are using it.

This feature can look similar to the following diagram:

process-analytics-data

The problem with the diagram above is that our Lambda function grows over time, not because we added more business logic, but because we introduced additional metrics. Adding metrics makes our code harder to maintain and test. This is expected with multiple external dependencies, but it is not a price end users should pay. The system should still work fast for them. An increasing number of external dependencies should affect only us, as developers or owners of the application, by increasing our maintenance and infrastructure costs.

Also, if we call third-party apps directly from our Lambda function, it can time out if an external dependency has issues.

Finally, it makes our code harder to abstract. For example, if we wanted to publish a similar app to the AWS Serverless Application Repository (SAR), we would need to expose the queue and dead-letter queue to let users perform additional actions on success or failure.

The Lambda Destinations Solution

Lambda destinations would simplify this case. For example, consider the following diagram:

process-analytics-data-with-lambda-destinations

The architecture seems similar to the previous one, with one big difference: our business logic is completely isolated from the analytics code.

Our Lambda function saves new data to the database, and Lambda destinations publish to an SNS topic for processing analytics if the request succeeded.

It also allows us to handle errors. For example, if I need to send or get some info from Slack, and Slack is down, I can retry later and let the client know that the request will be processed in the background, without changing our business logic.
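
For illustration, the analytics side could then be a small, separate consumer. This is a sketch under several assumptions: the SNS message carries a destination record with `requestPayload` and `responsePayload` fields (the shape used today for asynchronous-invocation destinations), and the event name, the request fields (`managerId`, `employeeId`, `days`) and the function name `toTrackCalls` are all made up for this example.

```javascript
// Hypothetical mapping from destination records to Segment-style `track`
// payloads ({ userId, event, properties }). Field names are illustrative.
function toTrackCalls(snsEvent) {
  return snsEvent.Records
    // SNS delivers the published message as a string in Sns.Message.
    .map((record) => JSON.parse(record.Sns.Message))
    // Only successful requests should count as feature usage.
    .filter((destinationRecord) => destinationRecord.responsePayload.statusCode < 300)
    .map((destinationRecord) => {
      const request = JSON.parse(destinationRecord.requestPayload.body);
      return {
        userId: request.managerId,
        event: 'Bonus Leave Days Granted',
        properties: { employeeId: request.employeeId, days: request.days },
      };
    });
}
```

Adding another metric would change only this consumer; the function that writes to the database stays untouched.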

Lambda destinations for synchronous invocations would solve the following issues:

Event-driven apps and event sourcing

Serverless applications are event-driven by their nature. However, there are multiple types of event-driven applications. For example, Martin Fowler divides event-driven apps into the following four categories: event notification, event-carried state transfer, event-sourcing, and CQRS.

All four categories work well with serverless apps. However, if you want to pass the state of your event, you'll need to add custom logic to your code.

Let's say that I want to track all of our leave requests in a database that works like an append-only log. I want to track all the changes for each request. For example, I want to track when the user submitted the request, and when and by whom it was approved. Sometimes teams change their settings and enable multi-level approvals, which can affect our approval process, so I need to be able to recalculate the state whenever settings change.
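
A minimal sketch of that recalculation, assuming a simplified event log: the event types (`submitted`, `approved`, `denied`), the `requiredApprovals` setting and the function name `replayLeaveRequest` are hypothetical and only serve the example.

```javascript
// Rebuild the state of one leave request from its append-only event log.
// Event types and the `requiredApprovals` setting are hypothetical names.
function replayLeaveRequest(events, settings) {
  const state = { status: 'unknown', submittedAt: null, approvals: [] };

  for (const event of events) {
    if (event.type === 'submitted') {
      state.status = 'pending';
      state.submittedAt = event.at;
    } else if (event.type === 'approved') {
      state.approvals.push({ by: event.by, at: event.at });
      // With multi-level approvals, one approval may not be enough.
      if (state.approvals.length >= settings.requiredApprovals) {
        state.status = 'approved';
      }
    } else if (event.type === 'denied') {
      state.status = 'denied';
    }
  }

  return state;
}

const log = [
  { type: 'submitted', by: 'ana', at: '2020-03-01T09:00:00Z' },
  { type: 'approved', by: 'marko', at: '2020-03-01T10:00:00Z' },
];

// The same log yields a different state once the team's settings change.
console.log(replayLeaveRequest(log, { requiredApprovals: 1 }).status); // approved
console.log(replayLeaveRequest(log, { requiredApprovals: 2 }).status); // pending
```

Nothing in the log is rewritten when settings change; the state is derived again from the same events.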

If I am using a database that can stream changes directly to a Lambda function, my architecture can look similar to the one in the following diagram:

event-driven-app-v1

This diagram shows the following:

  1. A user sends a leave request.
  2. An API Gateway event triggers a Lambda function that stores the request in DynamoDB.
  3. A DynamoDB stream triggers another Lambda function that does all the background processing, such as:

To improve this architecture, we can decouple the logic for the various notification channels and do the background processing for analytics and snapshots separately.

However, this architecture becomes way more complicated when you are not using a database that can trigger a Lambda function. For example, if you are using some other database, this architecture can look similar to the following diagram:
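
One way to picture that decoupling inside the stream-triggered function is a list of independent processors. This is only a sketch: `createStreamHandler` and the `processors` list are hypothetical, and each processor stands in for one concern (a notification channel, analytics, snapshots) that could later move to its own function or queue.

```javascript
// Hypothetical sketch: `processors` is a list of async functions, one per
// concern (Slack notification, email, analytics, snapshots, ...).
function createStreamHandler(processors) {
  return async function handler(event) {
    // Only newly appended items are relevant; ignore MODIFY and REMOVE records.
    const newItems = event.Records
      .filter((record) => record.eventName === 'INSERT')
      // NewImage is in DynamoDB attribute-value format, e.g. { id: { S: 'r1' } }.
      .map((record) => record.dynamodb.NewImage);

    let failed = 0;
    for (const item of newItems) {
      // allSettled lets every processor run even if one of them rejects.
      const results = await Promise.allSettled(processors.map((run) => run(item)));
      failed += results.filter((result) => result.status === 'rejected').length;
    }

    return { processed: newItems.length, failed };
  };
}
```

The sketch only counts failures. Deciding what to retry, and where failed items should go, is the extra error handling that each additional step brings with it.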

event-driven-app-v2

This architecture will work well, as the logic is slightly decoupled, but each step requires additional error handling. Also, this example ignores the VPC setup and the current lack of an EventBridge debugging process.

Cleaner solution using the Lambda Destinations

The last architecture from the previous section is not that bad, but Lambda Destinations for synchronous invocations would make event-driven architecture natural for serverless apps.

Here's how a similar architecture can look with Lambda Destinations:

event-driven-app-with-event-destinations

Again, this architecture looks similar to the previous one, with one significant improvement: only the business logic stays synchronous, and everything else happens in the background without any complicated code modifications. The user waits only until the event is validated and recorded in the database; the event is then sent to EventBridge or a queue in the background.
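
Under that proposal, the synchronous function could shrink to validation plus one write. The sketch below is hypothetical: `createLeaveRequestHandler`, the injected `appendEvent` function (standing in for the database write) and the request fields (`userId`, `from`, `to`) are assumptions, and the forwarding to EventBridge or a queue is the feature being requested, not something Lambda does today for synchronous invocations.

```javascript
// Hypothetical synchronous handler: validate, append to the log, respond.
// `appendEvent` is an injected async function standing in for the database write.
function createLeaveRequestHandler(appendEvent) {
  return async function handler(event) {
    let request;
    try {
      request = JSON.parse(event.body);
    } catch (err) {
      return { statusCode: 400, body: JSON.stringify({ error: 'Invalid JSON' }) };
    }

    if (!request || !request.userId || !request.from || !request.to) {
      return { statusCode: 400, body: JSON.stringify({ error: 'Missing fields' }) };
    }

    const stored = { ...request, type: 'submitted' };
    await appendEvent(stored);

    // No publishing code here: a destination would forward this request
    // and response to EventBridge or a queue after the function returns.
    return { statusCode: 201, body: JSON.stringify(stored) };
  };
}
```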

Lambda destinations for synchronous invocations would solve the following issues:

jeremydaly commented 4 years ago

Great stuff, @stojanovic!

+1,000,000,000 for synchronous Lambda Destinations!

alexdebrie commented 4 years ago

Nice writeup, @stojanovic. Some great recommendations here.

My only nit: in the first use case, it still seems weird to even have to use an empty Lambda. I wish APIGW (or the new HTTP API) had easier, non-VTL ways to send events to other systems like SNS, SQS, or Kinesis. For the first two, just let me publish a message that is the exact format I'd get in my Lambda function!

stojanovic commented 4 years ago

@alexdebrie agree, but that probably requires more work for AWS. Ideally, Lambda Destinations should be just Destinations and work with Lambda, API Gateway and a few other products.

omichowdhury commented 4 years ago

Another use case is for lambdas that are triggered from an SQS queue: Lambda invokes those functions synchronously. I understand why (the batching and retry are super nice), but this limitation means there's an annoying disconnect when building event-driven systems.

Would be awesome to have a built-in EventBridge -> SQS -> Lambda -> EventBridge "async processing unit" that can be used for each step in a pipeline.