grpc / grpc-node

gRPC for Node.js
https://grpc.io
Apache License 2.0

gRPC Keepalive Behavior in Serverless Environments #2664

Open pratik151192 opened 7 months ago

pratik151192 commented 7 months ago

(This ticket is mainly for documentation and educational purposes; we aren't requesting a bug fix or a new feature per se.)

Background:

When deploying gRPC services on serverless platforms such as AWS Lambda (and potentially Google/Azure Cloud Functions), we have observed a specific behavior related to gRPC keepalives that can lead to unexpected timeouts. Serverless platforms often reuse a container/instance for multiple invocations of the same function, and invocations within a given container are guaranteed to be sequential. The pause between these invocations varies with the incoming traffic to that container.

Issue:

We have noticed that gRPC keepalive pings can time out in these serverless environments. Specifically, the issue shows up in the logs as follows:

keepalive | (4) 54.xxx.xxx.xx:443 Ping timeout passed without response

This behavior appears to be linked to how serverless platforms operate: the idle time between function invocations does not align with the expected keepalive intervals. The primary goal of this ticket is to get the relevant documentation updated so that it gives clear guidance to developers deploying gRPC services in serverless environments.
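For readers who want something concrete, here is a minimal sketch of one possible mitigation: only enable client keepalives when the process is not running inside a Lambda execution environment. This is an assumption on our side, not an official grpc-node recommendation. `buildKeepaliveOptions` is a hypothetical helper (not part of `@grpc/grpc-js`), the interval values are placeholders, and the Lambda check relies on the `AWS_LAMBDA_FUNCTION_NAME` environment variable that Lambda sets. The three `grpc.keepalive_*` keys are the keepalive channel options accepted by `@grpc/grpc-js`.

```typescript
// Subset of the @grpc/grpc-js channel options that control client keepalives.
type KeepaliveOptions = {
  "grpc.keepalive_time_ms"?: number;
  "grpc.keepalive_timeout_ms"?: number;
  "grpc.keepalive_permit_without_calls"?: number;
};

// Hypothetical helper (not part of grpc-js): decides which keepalive
// options to pass to the client based on the runtime environment.
// The environment is passed in explicitly so the function is easy to test.
function buildKeepaliveOptions(
  env: Record<string, string | undefined>
): KeepaliveOptions {
  // Assumption: AWS Lambda sets AWS_LAMBDA_FUNCTION_NAME in the function's environment.
  const inLambda = env["AWS_LAMBDA_FUNCTION_NAME"] !== undefined;

  if (inLambda) {
    // Pass no keepalive options, so no keepalive pings are configured here
    // and none can time out across a pause between invocations.
    return {};
  }

  // Placeholder values for a long-lived (non-serverless) process.
  return {
    "grpc.keepalive_time_ms": 5000, // send a ping after 5 s without activity
    "grpc.keepalive_timeout_ms": 1000, // wait 1 s for the ping response
    "grpc.keepalive_permit_without_calls": 1, // ping even with no active calls
  };
}
```

The returned object would then be passed as the options argument when constructing the client, for example `new grpc.Client(address, credentials, buildKeepaliveOptions(process.env))`. Whether turning keepalives off is acceptable depends on how else the application detects dead connections, so treat this as a starting point rather than a fix.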

pratik151192 commented 5 months ago

We also published a blog post about this: https://www.gomomento.com/blog/a-tale-of-grpc-keepalives-in-the-lambda-execution-context