pynamodb / PynamoDB

A pythonic interface to Amazon's DynamoDB
http://pynamodb.readthedocs.io
MIT License

Question: avoiding Lambda cold starts #489

Open htowens opened 6 years ago

htowens commented 6 years ago

As you're likely aware, Lambda reuses containers: a container is "frozen" between executions, so database connections can be reused across invocations instead of paying a long startup time (and the latency that comes with it) on every call.

In order for a connection to be reused, the connection should be established outside the scope of the handler function in the Lambda code. In order to do this, I’ve simply used a low-level API call outside the handler function:

     from pynamodb.connection import Connection
     dynamoDBConnection = Connection()

This should allow my connection to be reused across different executions, with only 1 cold start when the connection is first set up (a slight oversimplification, but works for the purposes of this question).

My question is: inside my handler code, how do I use the connection I have established outside the handler? The docs show examples for creating tables, etc. using the established connection, but not for typical operations like creating a record. For example, do I need to do:

     user = UserModel("John", "Denver")
     dynamoDBConnection.user.save()

My intention is clearly to ensure that pynamodb doesn’t attempt to open a new connection when I do user.save() without explicitly referencing the existing connection.

Am I missing something? If I have already declared the connection explicitly, does pynamodb somehow know to use this connection without me specifically referring to it when saving the record?

mikecee commented 6 years ago

You've declared a Connection; the pynamodb Model objects don't know how to use that particular Connection without you manually injecting it into each Model.

Under the hood, pynamodb Models use their own Connection, which (eventually) uses the wonderful Requests module. You get pooling and transparent connection management, attached to your model operations, for free.

If the real question here is "How can I warm up the connection manager to avoid some latency?" then one approach is to check that your Table exists e.g.

     # Trigger a connection
     if not UserModel.exists():
         # do something, e.g. log a message, create the table, etc.
         # This is probably a good "safety" mechanism for most use cases
         pass

If you do this outside of your handler function, then the same pool should get used.

Note there's no guarantee that any pool connections to DynamoDB will remain established (e.g. AWS may drop the connection on you without warning). The best case is that there's a session ready and waiting when your handler is called, so there won't be any noticeable delay. This is no different to the thinking that AWS suggests regarding Lambda handler lifetime, that is, to assume single invocations -- everything else is a bonus.

Practically speaking, this is pretty easy to test for your particular application and Lambda memory size. Set up a single Lambda, include lots of logging (perhaps even enable X-Ray to look at AWS service latency), and then write a driver loop to invoke, sleep for up to 20 minutes, rinse and repeat.
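A rough sketch of such a driver loop, using only the standard library. The function name `measure_invocations` and its parameters are hypothetical; `invoke` is whatever callable triggers your function (for example a thin wrapper around the boto3 Lambda client's `invoke` call), and the sleep and clock functions are injectable so the loop can be dry-run without waiting.

```python
import time


def measure_invocations(invoke, idle_seconds, sleep=time.sleep,
                        clock=time.perf_counter):
    """Invoke after each idle period and record how long the call took.

    Returns a list of (idle_seconds, call_duration_seconds) pairs.
    """
    results = []
    for idle in idle_seconds:
        sleep(idle)              # let the container sit idle
        start = clock()
        invoke()                 # one invocation of the function under test
        results.append((idle, clock() - start))
    return results


if __name__ == "__main__":
    # Dry run with a stand-in for the real invocation and no real sleeping.
    timings = measure_invocations(lambda: None, [0, 60, 1200],
                                  sleep=lambda seconds: None)
    for idle, duration in timings:
        print(f"idle {idle:>5}s -> call took {duration:.6f}s")
```

Comparing the durations against the idle times should show roughly where the container (or its pooled connections) stops being reused for your setup.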

ricky-sb commented 4 years ago

Looks like a new connection is generated for every table.

So you would have to initialize all your tables' connections in a global context.
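If each model really does keep its own connection, as suggested above, one way to do this is a small helper called once at module scope with every model the handler uses. This is only a sketch: `warm_up_models` is a hypothetical name, and it relies on nothing more than each class offering an `exists()` classmethod, which PynamoDB models do.

```python
import logging

logger = logging.getLogger(__name__)


def warm_up_models(models):
    """Call exists() on every model class so each one opens its connection.

    Returns the names of models whose table is missing or whose call failed.
    """
    not_ready = []
    for model in models:
        try:
            if not model.exists():
                not_ready.append(model.__name__)
        except Exception:
            # Keep going: one failing table should not block the others.
            logger.exception("Warm-up failed for %s", model.__name__)
            not_ready.append(model.__name__)
    return not_ready
```

At module scope this would be called as something like `warm_up_models([UserModel, OrderModel])`, where `OrderModel` stands in for whatever other models you have.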