No support for adding global secondary indices after initial table creation

chriskck commented 6 years ago

This is a feature request.

Currently, PynamoDb does not support creating a Global Secondary Index for an existing table after the initial table creation. Instead, I had to descend to the boto libraries to accomplish this.

I suggest prioritizing this feature up the list as live service operations demands features expansion which, in turn, requires expanding the tables.

shadycuz commented 6 years ago

Yeah I just ran into this as well.

ikonst commented 5 years ago

@chriskck It's a reasonable suggestion, but there's complexity involved. I'll explain.

For an existing table, creating a GSI can trigger a long backfill during which the GSI is naturally unavailable. I'm assuming your pattern is to deploy a new version of the service and, if any of its declared indices do not exist, to auto-create them. A naive user of this pattern might attempt to immediately take advantage of the index, but it might not be available. Therefore, the use of this pattern requires careful two-stage deploy, where you monitor the AWS status of the GSI before deploying the 2nd, functional part.

Furthermore, in a distributed system, it seems a little strange where all nodes are equally responsible for the creation of a shared resource (a table, a GSI).

Finally, would it imply we'd also delete existing GSIs that are no longer present in the model definition? This might be taking too many liberties with data, but if we don't do this, our API becomes somewhat asymmetrical.

At Lyft we've been using pynamodb's table creation only in dev environment (with dynamodb-local) where the tables are typically created from scratch. In production, we're using orchestration tools like Terraform, though I can see how, in a smaller operation, you'd just do production changes with aws dynamodb update-table.

chriskck commented 5 years ago

I see your point.

I'm not suggesting that the indices should be automatically created as part of the table creation. instead, I'm suggesting that the Index classes add similar create and destroy functions as the Models?

I feel this increases API consistency across all the table/model related classes (the Index is, after all, just another Model/Table albeit one where their records are 'slaves' to the parent table). This also preserves the operation symmetry and relies on the developers to call the function as required.

The one concern raised - that the Index creation can take awhile and is therefore unavailable for query - still remains. I don't think there's a good way to prevent that. In my opinion, the PynamoDb library is a helper library. Developers are still expected to understand the DynamoDb fundamentals.

ikonst commented 5 years ago

I feel this increases API consistency across all the table/model related classes (the Index is, after all, just another Model/Table albeit one where their records are 'slaves' to the parent table). This also preserves the operation symmetry and relies on the developers to call the function as required.

If I understand you correctly, MyModel.create_table would still create the table and all its indices, but also could call MyModel.my_index.create_index to add a specific index to the table? This sounds reasonable to me.

chriskck commented 5 years ago

Yes, though I was suggesting not changing the existing MyModel.create_table function. I feel it's acceptable if MyModel.create_table only creates the model table and not the indices as the indices are separate tables and the function implies creating just one table.

ikonst commented 5 years ago

create_table currently creates the indices. Not having it do that would be a breaking change.

chriskck commented 5 years ago

I'm sorry. You're correct. The issue is that there's no way to create an index after the initial table creation. The existing plan is perfect. Thanks!

tedivm commented 4 years ago

Bump- is there any advice for people who want to continue using pynamodb but may occasionally want to add new indexes? What's the best path forward here?

ikonst commented 4 years ago

@tedivm At this point I'd recommend provisioning the indexes through either AWS Console, awscli, Cloudformation, Terraform etc.

theFong commented 4 years ago

Here is a script I use to create global secondary indexes after tables are created that leverage pynamo definitions. Note that my tables are PAY_PER_REQUEST/On-demand so I do not specify ProvisionedThroughput in the GlobalSecondaryIndexUpdates object. This has only been tested with indices having the projection AllProjection()/ALL.

Plenty to improve! Would love to see how others do this. Boto3 update table docs

import typing
import boto3
import pynamodb.indexes
import pynamodb.models

# pass reference to the pynamodb table and global secondary index you wish to create
def create_gsi(
    pynamo_table: typing.Type[pynamodb.models.Model],
    pynamo_gsi: typing.Type[pynamodb.indexes.GlobalSecondaryIndex],
):
    table_meta = pynamo_table.Meta
    gsi_meta = pynamo_gsi.Meta

    client = boto3.client("dynamodb", region_name=table_meta.region)

    schema = pynamo_gsi._get_schema()
    key_schema = schema["key_schema"]
    attributue_def = [
        attribute_to_camel_case(a) for a in schema["attribute_definitions"]
    ]

    client.update_table(
        TableName=table_meta.table_name,
        AttributeDefinitions=attributue_def,
        GlobalSecondaryIndexUpdates=[
            {
                "Create": {
                    "IndexName": gsi_meta.index_name,
                    "KeySchema": key_schema,
                    "Projection": {
                        "ProjectionType": gsi_meta.projection.projection_type,
                        **get_nonkey_attributes(gsi_meta.projection),
                    },
                },
            }
        ],
    )

def attribute_to_camel_case(obj):
    return {
        "AttributeName": obj["attribute_name"],
        "AttributeType": obj["attribute_type"],
    }

def get_nonkey_attributes(projection: pynamodb.indexes.Projection):
    return (
        {"NonKeyAttributes": projection.non_key_attributes}
        if projection.non_key_attributes is not None
        else {}
    )

ikonst commented 4 years ago

Thanks @theFong. Generally speaking I agree it would be nice to add Model.update_table as this seems to be a common ask.

cyphx commented 2 years ago

@theFong , This was really helpful. There is just one caveat. The operation

client.update_table(
        TableName=table_meta.table_name,
        AttributeDefinitions=attributue_def,
        GlobalSecondaryIndexUpdates=[
            {
                "Create": {
                    "IndexName": gsi_meta.index_name,
                    "KeySchema": key_schema,
                    "Projection": {
                        "ProjectionType": gsi_meta.projection.projection_type,
                        **get_nonkey_attributes(gsi_meta.projection),
                    },
                },
            }
        ],
    )

gives "InternalError" when executed. I tried this on official dynamodb local docker image. The payload is missing a mandatory key, ProvisionedThroughput. But, I think this style will be really helpful to manage migrations. The updated method call looks like this

`client.update_table(
        TableName=table_meta.table_name,
        AttributeDefinitions=attributue_def,
        GlobalSecondaryIndexUpdates=[
            {
                "Create": {
                    "IndexName": gsi_meta.index_name,
                    "KeySchema": key_schema,
                    "Projection": {
                        "ProjectionType": gsi_meta.projection.projection_type,
                        **get_nonkey_attributes(gsi_meta.projection),
                    },
                    "ProvisionedThroughput": {
                        "ReadCapacityUnits": gsi_meta.read_capacity_units,
                        "WriteCapacityUnits": gsi_meta.write_capacity_units
                        }
                    },
                },
            }
        ],
    )`

pynamodb / PynamoDB

No support for adding global secondary indices after initial table creation #466