Open chriskck opened 6 years ago
Yeah I just ran into this as well.
@chriskck It's a reasonable suggestion, but there's complexity involved. I'll explain.
For an existing table, creating a GSI can trigger a long backfill during which the GSI is naturally unavailable. I'm assuming your pattern is to deploy a new version of the service and, if any of its declared indices do not exist, to auto-create them. A naive user of this pattern might attempt to immediately take advantage of the index, but it might not be available. Therefore, the use of this pattern requires careful two-stage deploy, where you monitor the AWS status of the GSI before deploying the 2nd, functional part.
Furthermore, in a distributed system, it seems a little strange where all nodes are equally responsible for the creation of a shared resource (a table, a GSI).
Finally, would it imply we'd also delete existing GSIs that are no longer present in the model definition? This might be taking too many liberties with data, but if we don't do this, our API becomes somewhat asymmetrical.
At Lyft we've been using pynamodb's table creation only in dev environment (with dynamodb-local) where the tables are typically created from scratch. In production, we're using orchestration tools like Terraform, though I can see how, in a smaller operation, you'd just do production changes with aws dynamodb update-table
.
I see your point.
I'm not suggesting that the indices should be automatically created as part of the table creation. instead, I'm suggesting that the Index classes add similar create and destroy functions as the Models?
I feel this increases API consistency across all the table/model related classes (the Index is, after all, just another Model/Table albeit one where their records are 'slaves' to the parent table). This also preserves the operation symmetry and relies on the developers to call the function as required.
The one concern raised - that the Index creation can take awhile and is therefore unavailable for query - still remains. I don't think there's a good way to prevent that. In my opinion, the PynamoDb library is a helper library. Developers are still expected to understand the DynamoDb fundamentals.
I feel this increases API consistency across all the table/model related classes (the Index is, after all, just another Model/Table albeit one where their records are 'slaves' to the parent table). This also preserves the operation symmetry and relies on the developers to call the function as required.
If I understand you correctly, MyModel.create_table
would still create the table and all its indices, but also could call MyModel.my_index.create_index
to add a specific index to the table? This sounds reasonable to me.
Yes, though I was suggesting not changing the existing MyModel.create_table
function. I feel it's acceptable if MyModel.create_table
only creates the model table and not the indices as the indices are separate tables and the function implies creating just one table.
create_table currently creates the indices. Not having it do that would be a breaking change.
I'm sorry. You're correct. The issue is that there's no way to create an index after the initial table creation. The existing plan is perfect. Thanks!
Bump- is there any advice for people who want to continue using pynamodb but may occasionally want to add new indexes? What's the best path forward here?
@tedivm At this point I'd recommend provisioning the indexes through either AWS Console, awscli, Cloudformation, Terraform etc.
Here is a script I use to create global secondary indexes after tables are created that leverage pynamo definitions. Note that my tables are PAY_PER_REQUEST
/On-demand so I do not specify ProvisionedThroughput
in the GlobalSecondaryIndexUpdates
object. This has only been tested with indices having the projection AllProjection()
/ALL
.
Plenty to improve! Would love to see how others do this. Boto3 update table docs
import typing
import boto3
import pynamodb.indexes
import pynamodb.models
# pass reference to the pynamodb table and global secondary index you wish to create
def create_gsi(
pynamo_table: typing.Type[pynamodb.models.Model],
pynamo_gsi: typing.Type[pynamodb.indexes.GlobalSecondaryIndex],
):
table_meta = pynamo_table.Meta
gsi_meta = pynamo_gsi.Meta
client = boto3.client("dynamodb", region_name=table_meta.region)
schema = pynamo_gsi._get_schema()
key_schema = schema["key_schema"]
attributue_def = [
attribute_to_camel_case(a) for a in schema["attribute_definitions"]
]
client.update_table(
TableName=table_meta.table_name,
AttributeDefinitions=attributue_def,
GlobalSecondaryIndexUpdates=[
{
"Create": {
"IndexName": gsi_meta.index_name,
"KeySchema": key_schema,
"Projection": {
"ProjectionType": gsi_meta.projection.projection_type,
**get_nonkey_attributes(gsi_meta.projection),
},
},
}
],
)
def attribute_to_camel_case(obj):
return {
"AttributeName": obj["attribute_name"],
"AttributeType": obj["attribute_type"],
}
def get_nonkey_attributes(projection: pynamodb.indexes.Projection):
return (
{"NonKeyAttributes": projection.non_key_attributes}
if projection.non_key_attributes is not None
else {}
)
Thanks @theFong. Generally speaking I agree it would be nice to add Model.update_table
as this seems to be a common ask.
@theFong , This was really helpful. There is just one caveat. The operation
client.update_table(
TableName=table_meta.table_name,
AttributeDefinitions=attributue_def,
GlobalSecondaryIndexUpdates=[
{
"Create": {
"IndexName": gsi_meta.index_name,
"KeySchema": key_schema,
"Projection": {
"ProjectionType": gsi_meta.projection.projection_type,
**get_nonkey_attributes(gsi_meta.projection),
},
},
}
],
)
gives "InternalError" when executed. I tried this on official dynamodb local docker image. The payload is missing a mandatory key, ProvisionedThroughput. But, I think this style will be really helpful to manage migrations. The updated method call looks like this
`client.update_table(
TableName=table_meta.table_name,
AttributeDefinitions=attributue_def,
GlobalSecondaryIndexUpdates=[
{
"Create": {
"IndexName": gsi_meta.index_name,
"KeySchema": key_schema,
"Projection": {
"ProjectionType": gsi_meta.projection.projection_type,
**get_nonkey_attributes(gsi_meta.projection),
},
"ProvisionedThroughput": {
"ReadCapacityUnits": gsi_meta.read_capacity_units,
"WriteCapacityUnits": gsi_meta.write_capacity_units
}
},
},
}
],
)`
This is a feature request.
Currently, PynamoDb does not support creating a Global Secondary Index for an existing table after the initial table creation. Instead, I had to descend to the boto libraries to accomplish this.
I suggest prioritizing this feature up the list as live service operations demands features expansion which, in turn, requires expanding the tables.