upstash / ratelimit-js

Rate limiting library for serverless runtimes
https://ratelimit-with-vercel-kv.vercel.app
MIT License
1.65k stars 33 forks source link

Enabling analytics leads to timeouts in response-streaming AWS Lambda #91

Closed unstubbable closed 4 months ago

unstubbable commented 6 months ago

Follow-up from #89. Turns out, the analytics are the cause for the task timeouts, not the pending promise.

Additional info: The affected Lambda functions do use response streaming. ~I haven't tested classic Lambdas with analytics yet.~ (Update: Only response-streaming Lambdas are affected.) The response stream is ended properly, but the task does not complete and runs into a timeout.

A related PR is #91. This interfered with my observations because I wasn't even aware that I was switching from enabled to disabled analytics in one of my changes.

ogzhanolguncu commented 6 months ago

It looks like you've found a temporary fix for the issue. If an immediate fix for analytics isn't necessary, I'll postpone addressing this issue for now since we're currently short on resources. Additionally, it would be helpful if you could share a sample code to make it easier to fix in the future.

unstubbable commented 6 months ago

Here's a repro: https://github.com/unstubbable/upstash-ratelimit-analytics-lambda-timeout

ogzhanolguncu commented 6 months ago

Hey, @unstubbable, I haven't found the underlying issue, but I'm still trying to figure out.

CahidArda commented 5 months ago

Hi @unstubbable,

I tried your lambda and saw that the requests timeout like you said. Long story short, I found a solution but didn't figure out why this occurs in the first place.

I will make sure to add an example for streaming lambda or update our docs.

Solution

But we still get timeout. I tried to find if there were other people who experienced similar issues and found some posts [1, 2, 3, 4].

What I kept seeing was the callbackWaitsForEmptyEventLoop parameter in the Lambda nodejs context. I set the parameter to false in the handler you provided and the timeouts dissappeared.

import {Ratelimit} from '@upstash/ratelimit';
import {Redis} from '@upstash/redis';
import type {APIGatewayProxyEventV2} from 'aws-lambda';

export const handler = awslambda.streamifyResponse<APIGatewayProxyEventV2>(
  async (event, responseStream, context) => {
    console.log(JSON.stringify({event}));

    context.callbackWaitsForEmptyEventLoop = false
    // ...

I looked into why we need to set this parameter to false in our case but I couldn't figure that out yet. After this change, there is no more timeouts.

Checking Latency

After verify the solution, I checked the time it takes for the requests to complete. The request time reduced as we expect after the fix.

Pre Fix ``` --- Successful Requests Statistics --- Mean Response Time: 2.65 seconds Minimum Response Time: 2.469 seconds 1st Quantile Response Time: 2.571 seconds Median Response Time: 2.631 seconds 3rd Quantile Response Time: 2.666 seconds Maximum Response Time: 3.271 seconds --- Rate-Limited Requests Statistics --- Mean Response Time: 2.63 seconds Minimum Response Time: 2.508 seconds 1st Quantile Response Time: 2.563 seconds Median Response Time: 2.638 seconds 3rd Quantile Response Time: 2.694 seconds Maximum Response Time: 2.766 seconds --- Summary --- Total Successful Requests: 41 Total Rate-Limited Requests: 19 ```
Post Fix ``` --- Successful Requests Statistics --- Mean Response Time: 0.456 seconds Minimum Response Time: 0.306 seconds 1st Quantile Response Time: 0.309 seconds Median Response Time: 0.316 seconds 3rd Quantile Response Time: 0.425 seconds Maximum Response Time: 1.353 seconds --- Rate-Limited Requests Statistics --- Mean Response Time: 0.31 seconds Minimum Response Time: 0.289 seconds 1st Quantile Response Time: 0.299 seconds Median Response Time: 0.301 seconds 3rd Quantile Response Time: 0.308 seconds Maximum Response Time: 0.604 seconds --- Summary --- Total Successful Requests: 10 Total Rate-Limited Requests: 50 ```
ogzhanolguncu commented 5 months ago

Btw, I don't know for sure, there might be a floating promise in our core analytics packages. Thats why event loop waits until the end to execute the floating promises and sometimes not even executing them. That might be worth checking.

CahidArda commented 5 months ago

Hi again @unstubbable,

We identified the reason why we had to set the callbackWaitsForEmptyEventLoop parameter to false.

In the core-analytics library, there is a cache which calls setInterval but clearInterval is never called. We are making changes to disable the cache in rate limit library and only enable it in the analytics dashboard in Upstash console which is the original purpose of the cache.

After this change, lambda timeouts stopped without having to set callbackWaitsForEmptyEventLoop to false.

Thanks for bringing this issue to our attention!

CahidArda commented 4 months ago

Closing this issue since the changes were merged and version was bumped. Thanks again @unstubbable.