jordaneremieff / mangum

AWS Lambda support for ASGI applications
https://mangum.io/
MIT License

Question: How does one keep state between AWS Lambda cold starts? #295

Closed: pkucmus closed this issue 10 months ago

pkucmus commented 1 year ago

Hi, I hope this is the right place for questions.

To work from an example: I have an app that needs to fetch a JWKS from a remote service. Where would I store it so I don't have to fetch the JWKS on each request? Is this a job for a global initializer in the lifespan startup? Is there another "trick", or am I bound to DynamoDB or something like ElastiCache, i.e. anything that runs as a separate process from my Mangum Lambdas, to keep state?

A related question: where would one create a Postgres connection pool? (I understand Postgres is likely a bad fit here, since Lambdas would quickly exhaust the available connections.) If I initialize the connection pool the way encode/databases encourages, would the connections be persisted somehow, maybe within the scope of one cold start? Or am I completely off?

tahayk commented 1 year ago

Hey @pkucmus, I'm new to Mangum, but I'll try to answer your question from my point of view. AWS Lambdas are ephemeral and exist only when the application is invoked. This has two consequences:

  1. The storage is temporary and cannot be shared between Lambdas, so you'll have to use another service such as DynamoDB.
  2. You probably don't need a connection pool for Postgres here. If you do need one, you can create it manually in the application's script, but keep the pool as small as possible, otherwise you'll hit Postgres's limit on open connections (max_connections).

If your JWKS doesn't change often, you can set it as an environment variable on the AWS Lambda function, or store it in Parameter Store. Hope this helps; as I said, this is just my point of view, and someone else probably has a better idea for this case.
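As a minimal sketch of the environment-variable option, assuming the JWKS JSON document is stored verbatim in a variable whose name (JWKS_JSON here) is hypothetical, the parsing can happen once at module scope. Lambda caps the total size of a function's environment variables at 4 KB, so this only suits small key sets.

```python
import json
import os
from typing import Optional

# Hypothetical variable name; use whatever name you configure on the function.
JWKS_ENV_VAR = "JWKS_JSON"


def load_jwks_from_env(var_name: str = JWKS_ENV_VAR) -> Optional[dict]:
    """Parse a JWKS document stored as JSON in an environment variable.

    Returns None when the variable is unset, so the caller can fall back
    to fetching the JWKS from the remote issuer instead.
    """
    raw = os.environ.get(var_name)
    if raw is None:
        return None
    jwks = json.loads(raw)
    # A JWKS is a JSON object with a "keys" array (RFC 7517).
    if not isinstance(jwks, dict) or not isinstance(jwks.get("keys"), list):
        raise ValueError(f"{var_name} does not contain a JWKS document")
    return jwks


# Module scope: parsed once per cold start, reused by warm invocations.
JWKS = load_jwks_from_env()
```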

pkit commented 10 months ago

@pkucmus All initialization should be done in your main.py (or whatever you call it). On "warm" starts, data initialized once in main stays there, which makes it a good place for a cache, a connection pool, or any other ephemeral state. On "cold" starts, main.py runs again and all of that state is reinitialized. If you want to keep "permanent" state, the best option is DynamoDB: it's fast and persistent, and it doesn't cost exorbitant amounts of money the way AWS-hosted Redis does. Secrets can be stored in the SSM Parameter Store, which is also a pretty fast fetch.
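Because module-level objects survive warm invocations, they can hold a cache. The sketch below uses a hypothetical TTLValue helper (not part of Mangum or the AWS SDK) to keep a fetched value, such as a JWKS, at module scope and refetch it after a fixed age, which helps when the issuer rotates keys. fetch_jwks is a placeholder for your own HTTP call.

```python
import threading
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLValue(Generic[T]):
    """Hold one fetched value and refetch it once it is ttl seconds old.

    Created at module scope, an instance lives as long as the Lambda
    execution environment: warm invocations reuse it, a cold start
    builds a fresh (empty) one.
    """

    def __init__(self, fetch: Callable[[], T], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock  # injectable so the expiry logic can be tested
        self._value: Optional[T] = None
        self._fetched_at: Optional[float] = None
        # Lambda sends one event at a time to an environment; the lock only
        # guards against background threads your own code might start.
        self._lock = threading.Lock()

    def get(self) -> T:
        with self._lock:
            now = self._clock()
            expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
            if expired:
                # If the fetch raises, nothing is stored and the next call retries.
                self._value = self._fetch()
                self._fetched_at = now
            return self._value  # type: ignore[return-value]


def fetch_jwks() -> dict:
    """Hypothetical placeholder: replace with an HTTP GET of the issuer's JWKS URL."""
    return {"keys": []}


# Module scope: survives warm invocations, rebuilt on a cold start.
JWKS_CACHE = TTLValue(fetch_jwks, ttl=3600.0)


def handler(event, context):
    jwks = JWKS_CACHE.get()  # fetches at most once per hour per environment
    return len(jwks["keys"])
```

A TTL is a trade-off: a shorter one picks up rotated keys sooner, a longer one makes fewer outbound calls per environment.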

pkucmus commented 10 months ago

@pkit Thanks! So, in conclusion, if I have code like this:


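```python
# Module scope: runs once, when Lambda loads this module (cold start)
something_to_keep = fetch_the_something()

def handler(event, context):
    # Runs on every invocation, cold or warm
    use_the_something(something_to_keep)
```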

If handler is my Lambda handler, only the code in handler will be invoked on a warm start? And anything done outside of the main handler will remain in its state between cold starts, right?

Sounds like global could be useful here in some cases? Like:


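```python
something_to_keep = None  # module scope: reset on every cold start

def handler(event, context):
    global something_to_keep
    # Lazy init: only the first invocation in an environment fetches
    if something_to_keep is None:
        something_to_keep = fetch_the_something()

    use_the_something(something_to_keep)
```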

Not that the example is good, but the above should be possible and should keep the state of something_to_keep between cold starts?
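Instead of a global plus a None check, the standard library's functools.lru_cache (or functools.cache on Python 3.9+) gives the same lazy, once-per-environment initialization. In this sketch fetch_the_something and use_the_something are hypothetical stand-ins for the names in the example, and FETCH_CALLS exists only to count fetches.

```python
import functools

FETCH_CALLS = []  # only here to show how often the fetch actually runs


def fetch_the_something() -> dict:
    """Hypothetical stand-in for the expensive fetch from the thread."""
    FETCH_CALLS.append(1)
    return {"keys": []}


@functools.lru_cache(maxsize=None)
def get_something() -> dict:
    # First call runs the fetch; later calls in the same execution
    # environment return the cached result. A cold start begins with an
    # empty cache because the module is loaded again.
    return fetch_the_something()


def use_the_something(something: dict) -> int:
    """Hypothetical consumer, as in the thread's example."""
    return len(something["keys"])


def handler(event, context):
    return use_the_something(get_something())
```

One difference from a plain module-level assignment: the fetch happens during the first invocation rather than while the module is loaded, and if it raises, nothing is cached, so the next invocation retries.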

pkit commented 10 months ago

@pkucmus

> If handler is my Lambda handler, only the code in handler will be invoked on a warm start? And anything done outside of the main handler will remain in its state between cold starts, right?

Yes and yes
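A toy way to see both "yes" answers: treat a cold start as executing the module body in a fresh namespace, and a warm start as calling only handler on an existing one. This is a simulation, not how Lambda is implemented; MAIN_PY mirrors the thread's example with a hypothetical fetch_the_something that records its calls.

```python
import textwrap

# Toy stand-in for the user's main.py from the thread.
MAIN_PY = textwrap.dedent("""
    FETCH_CALLS = []

    def fetch_the_something():
        FETCH_CALLS.append(1)
        return {"keys": []}

    something_to_keep = fetch_the_something()  # module scope: cold start only

    def handler(event, context):
        # runs on every invocation, reusing the module-level value
        return len(something_to_keep["keys"])
""")


def cold_start() -> dict:
    """Simulate a new execution environment: run the module body from scratch."""
    namespace: dict = {}
    exec(MAIN_PY, namespace)
    return namespace


def warm_invoke(env: dict, event=None) -> int:
    """Simulate a warm invocation: only the handler runs."""
    return env["handler"](event, None)


env = cold_start()
for _ in range(3):
    warm_invoke(env)  # FETCH_CALLS stays at one entry
```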

> Sounds like global could be useful here in some cases?

Yup, you can use global. You can also use imports, like:

some.py

```python
something = fetch_the_something()
```

main.py (or anywhere else)

```python
from some import something
```

Python executes a module's body only once per process, on its first import, and then caches the module in sys.modules. So the module-level code is guaranteed to run once, even if you import it from multiple places.
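The caching can be checked directly. This sketch writes a throwaway module (demo_some, a hypothetical stand-in for some.py) to a temporary directory, imports it twice, and confirms both imports return the same object, because the second import is served from sys.modules. Note that importlib.reload, or loading the same file under two different module names, does execute the body again.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Body of the throwaway module. object() stands in for fetch_the_something():
# every execution of the module body would create a new, distinct object.
SOME_PY = "something = object()\n"


def demo_import_caching(module_name: str = "demo_some") -> bool:
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, f"{module_name}.py").write_text(SOME_PY)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # recommended after creating modules at runtime
        try:
            first = importlib.import_module(module_name).something
            # A second import (from main.py, a helper, etc.) hits sys.modules.
            second = importlib.import_module(module_name).something
            cached = module_name in sys.modules
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)  # clean up the demo module
    return cached and first is second
```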

pkucmus commented 10 months ago

This explains it, thank you @pkit!