pynamodb / PynamoDB

A pythonic interface to Amazon's DynamoDB
http://pynamodb.readthedocs.io
MIT License

Suggestion from AWS Veteran and ORM Framework Author #1238

Open MacHu-GWU opened 4 months ago

I have noticed a significant design flaw in PynamoDB. Most ORM frameworks, such as SQLAlchemy and MongoEngine, let users define a connection object and use a context manager to choose which connection a block of code uses. For example, in SQLAlchemy:

import sqlalchemy as sa
engine1 = sa.create_engine(...)
engine2 = sa.create_engine(...)
# Explicitly choose which engine this block talks to
with engine1.connect() as conn:
    conn.execute(...)

Additionally, these frameworks usually provide a feature to set a global connection object, eliminating the need to explicitly specify the connection.

In PynamoDB, there are three API levels: Connection, TableConnection, and Model. The Model class (not an instance) has a private attribute called _connection. It is populated lazily the first time a method that needs an AWS API call is used, and it is then reused indefinitely.

This design causes a problem when users switch the default AWS profile at runtime (for example, when testing against different AWS accounts, or when running against a mock before switching to a real AWS account). Users may believe they have switched the AWS profile, but PynamoDB keeps using the cached connection, so the switch has no effect. Below is an example:

from pynamodb.models import Model
from pynamodb.attributes import UnicodeAttribute, NumberAttribute
from pynamodb.connection import Connection
from pynamodb.constants import PAY_PER_REQUEST_BILLING_MODE
from boto_session_manager import BotoSesManager
from rich import print as rprint

class Item(Model):
    class Meta:
        table_name = "pynamodb_connection_example_key_value_items"
        region = "us-east-1"
        billing_mode = PAY_PER_REQUEST_BILLING_MODE

    key = UnicodeAttribute(hash_key=True)
    value = NumberAttribute()

bsm1 = BotoSesManager(profile_name="my_profile", region_name="us-east-1")
bsm2 = BotoSesManager(profile_name="my_profile", region_name="us-east-2")

# ------------------------------------------------------------------------------
# This won't work
# ------------------------------------------------------------------------------
with bsm1.awscli():
    conn = Connection()
    Item.Meta.region = bsm1.aws_region
    Item.create_table(wait=True) # expect to create table in us-east-1

with bsm2.awscli():
    conn = Connection()
    Item.Meta.region = bsm2.aws_region
    Item.create_table(wait=True)  # expect to create table in us-east-2

# ------------------------------------------------------------------------------
# This would work, but rely on the private attribute _connection
# ------------------------------------------------------------------------------
with bsm1.awscli():
    Item._connection = None
    Item.Meta.region = bsm1.aws_region
    conn = Connection()
    Item.create_table(wait=True)  # expect to create table in us-east-1

print("--- bsm2 ---")
with bsm2.awscli():
    Item._connection = None
    Item.Meta.region = bsm2.aws_region
    conn = Connection()
    Item.create_table(wait=True)  # expect to create table in us-east-2
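
To show the caching behavior without needing AWS credentials, here is a minimal sketch in plain Python. FakeModel and FakeConnection are hypothetical stand-ins that I made up for illustration. They only model the "create on first use, then reuse" pattern described above and are not PynamoDB's actual implementation.

```python
class FakeConnection:
    """Hypothetical stand-in for a connection; remembers its region."""

    def __init__(self, region):
        self.region = region


class FakeModel:
    """Hypothetical model that caches its connection on the class."""

    region = "us-east-1"
    _connection = None  # class-level cache, filled on first use

    @classmethod
    def _get_connection(cls):
        # Build the connection only once; later calls reuse the cached one.
        if cls._connection is None:
            cls._connection = FakeConnection(cls.region)
        return cls._connection

    @classmethod
    def create_table(cls):
        # Pretend to call AWS; return the region that would be used.
        return cls._get_connection().region


first = FakeModel.create_table()   # builds and caches the connection
FakeModel.region = "us-east-2"     # the user "switches" region
second = FakeModel.create_table()  # still served by the cached connection
FakeModel._connection = None       # the private-attribute workaround
third = FakeModel.create_table()   # a new connection picks up the change

print(first, second, third)
```

In this sketch the second call still reports the original region, and only resetting the private cache makes the new setting take effect.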

Since a DynamoDB connection is a virtual concept, unlike an RDBMS connection, and is essentially a REST API client rather than a long-lived connection, I understand why PynamoDB is designed this way. However, every method that makes an AWS API call should accept an optional parameter, such as a pynamodb.connection.Connection or boto3.session.Session object. If it is not provided, the existing logic should determine the connection; otherwise, the specified connection should be used explicitly.
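
A rough sketch of what I mean by an optional parameter, again in plain Python. The connection keyword, FakeModel, and FakeConnection are all hypothetical; PynamoDB does not currently expose this parameter, and the real signature would be up to the maintainers.

```python
from typing import Optional


class FakeConnection:
    """Hypothetical stand-in for a connection bound to one region."""

    def __init__(self, region: str):
        self.region = region
        self.calls = []

    def put_item(self, table_name: str, item: dict) -> str:
        # Record the request and report which region handled it.
        self.calls.append((table_name, item))
        return self.region


class FakeModel:
    """Hypothetical model whose save() accepts an explicit connection."""

    table_name = "example"
    _default_connection = FakeConnection("us-east-1")

    def __init__(self, key: str, value: int):
        self.key = key
        self.value = value

    def save(self, connection: Optional[FakeConnection] = None) -> str:
        # Fall back to the implicit connection only when none is given.
        conn = connection if connection is not None else self._default_connection
        return conn.put_item(self.table_name, {"key": self.key, "value": self.value})


item = FakeModel("a", 1)
implicit_region = item.save()                     # existing behavior
explicit = FakeConnection("us-east-2")
explicit_region = item.save(connection=explicit)  # caller chooses

print(implicit_region, explicit_region)
```

Existing callers would be unaffected because the parameter defaults to None.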

Furthermore, PynamoDB should provide a context manager that allows users to use pynamodb.connection.Connection or boto3.session.Session to override the current implicit connection. This aligns with the Python philosophy of "explicit is better than implicit." These two features are available in most ORM frameworks but are missing in PynamoDB.
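
The context manager could look roughly like the sketch below. use_connection is a hypothetical helper name, and FakeModel and FakeConnection are stand-ins; the sketch simply swaps a class-level attribute and restores it on exit, which is an assumption about how this might be built rather than a proposal for the exact implementation.

```python
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a connection bound to one region."""

    def __init__(self, region):
        self.region = region


class FakeModel:
    """Hypothetical model with a class-level default connection."""

    _connection = FakeConnection("us-east-1")

    @classmethod
    def create_table(cls):
        # Pretend to call AWS; return the region that would be used.
        return cls._connection.region


@contextmanager
def use_connection(model_cls, connection):
    """Temporarily override the connection used by model_cls."""
    previous = model_cls._connection
    model_cls._connection = connection
    try:
        yield connection
    finally:
        # Always restore the previous connection, even on errors.
        model_cls._connection = previous


before = FakeModel.create_table()
with use_connection(FakeModel, FakeConnection("us-east-2")):
    inside = FakeModel.create_table()
after = FakeModel.create_table()

print(before, inside, after)
```

Mutating a class attribute like this is not thread-safe. A real implementation would probably need to store the override somewhere context-local, for example in a contextvars.ContextVar.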

Another issue I have found is that the connection object wraps the botocore client. I understand that the wrapper aims to provide additional functionality, such as sending telemetry. However, this design limits fine-grained control over the underlying boto3.session.Session or botocore.Client. For example, users cannot use auto-refreshable sessions. In my opinion, while the connection API is nice, users should not lose the ability to supply the exact boto3.session.Session or botocore.Client they want. I acknowledge that implementing this correctly may require a more complex design, so I present it as a personal suggestion open for discussion.

Thank you for the great library.