Closed by ryan-mars 3 years ago.
Thoughts on key sharding for an event store in DynamoDB. Access patterns are sorted by frequency.
Given a service "Operations" with an aggregate named "Flight" with an id of "PA576" and an event of type "FlightDeparted"...
{
id: "1s6DrF0P9U6z1cQplUINwLlyQG3", // KSUID
time: "2021-05-05T03:12:22.000Z",
source: "Flight",
source_id: "PA576",
bounded_context: "Operations",
event_type: "FlightDeparted",
payload: {
from: "SFO",
departed_at: "2021-05-04T12:30:00-07:00"
}
}
All events for a single aggregate ID, in order: sk BEGINS_WITH "EVENT"
pk: Flight#PA576 // source#source_id
sk: EVENT#1s6DrF0P9U6z1cQplUINwLlyQG3 // EVENT#id
Latest snapshot: sk EQ "SNAPSHOT"
pk: Flight#PA576 // source#source_id
sk: SNAPSHOT
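A minimal Python sketch of the two main-table patterns above, assuming boto3's low-level client (attribute values wrapped as {"S": ...}, as in the query below). The helper names are hypothetical. Query returns items in ascending sort-key order by default, so events come back in KSUID order; note that a KSUID's timestamp has one-second resolution, so events within the same second are ordered by their random payload rather than by when they actually happened.

```python
from typing import Any, Dict


def aggregate_pk(source: str, source_id: str) -> str:
    """Partition key shared by an aggregate's events and snapshot, e.g. 'Flight#PA576'."""
    return f"{source}#{source_id}"


def event_sk(event_id: str) -> str:
    """Sort key for one event, e.g. 'EVENT#1s6DrF0P9U6z1cQplUINwLlyQG3'."""
    return f"EVENT#{event_id}"


def events_query(source: str, source_id: str) -> Dict[str, Any]:
    """Query kwargs for all events of one aggregate (ascending sk, i.e. KSUID order)."""
    return {
        "KeyConditionExpression": "#p = :p AND begins_with(#s, :prefix)",
        "ExpressionAttributeNames": {"#p": "pk", "#s": "sk"},
        "ExpressionAttributeValues": {
            ":p": {"S": aggregate_pk(source, source_id)},
            ":prefix": {"S": "EVENT#"},
        },
    }


def snapshot_key(source: str, source_id: str) -> Dict[str, Any]:
    """GetItem Key for the latest snapshot (one SNAPSHOT item per aggregate)."""
    return {"pk": {"S": aggregate_pk(source, source_id)}, "sk": {"S": "SNAPSHOT"}}
```

Usage would look like client.query(TableName="events", **events_query("Flight", "PA576")) and client.get_item(TableName="events", Key=snapshot_key("Flight", "PA576")), where "events" is a hypothetical table name. Because the snapshot always uses the fixed sort key SNAPSHOT, writing a new snapshot overwrites the previous one, which is what makes "latest snapshot" a single-item lookup.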
All events for an aggregate type within specified period
gsi1pk: Flight // source
gsi1sk: 1s6DrF0P9U6z1cQplUINwLlyQG3 // id KSUID from "2021-05-05T03:12:22.000Z"
IndexName="gsi1", # hypothetical index name; querying a GSI requires IndexName
KeyConditionExpression="#p = :p AND #s BETWEEN :start AND :end",
ExpressionAttributeNames={
"#p": "gsi1pk",
"#s": "gsi1sk"
},
ExpressionAttributeValues={
":p": { "S": "Flight" },
":start": { "S": "1s6DoW0E203SU12dTtKVGOiZtgD" }, # KSUID from "2021-05-05T03:12:00.000Z"
":end": { "S": "1s6Dw1V4QI7MR5CuC2SDOzw6d7g" } # KSUID from "2021-05-05T03:13:00.000Z"
}
What's a snapshot?
All events for an aggregate type within specified period
gsi1pk: Flight
gsi1sk: 1s6DrF0P9U6z1cQplUINwLlyQG3
Does this create a hot partition in DynamoDB? All Flight events will land in one partition.
Yes 🤦🏻‍♂️ I wasn't thinking. Maybe it could be randomly key-sharded depending on volume, for instance Flight#01 through Flight#20 (up to the BatchGetItem maximum of 100).
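A sketch of the random write-sharding idea, assuming 20 shards (SHARD_COUNT and the helper names are hypothetical). Writers pick a random suffix for gsi1pk; readers query every suffix and merge the results. BatchGetItem only fetches items by full primary key and cannot query an index or a key range, so the read side here assumes one Query per shard, each with the same gsi1sk BETWEEN condition, followed by a k-way merge on gsi1sk. Items are assumed to carry plain string values, as returned by boto3's Table resource.

```python
import heapq
import random
from typing import Any, Dict, Iterable, List, Optional

SHARD_COUNT = 20  # hypothetical; size this to write volume


def write_shard_key(source: str, rng: Optional[random.Random] = None) -> str:
    """Pick a random gsi1pk shard such as 'Flight#07' for a new event."""
    r = rng if rng is not None else random
    return f"{source}#{r.randint(1, SHARD_COUNT):02d}"


def all_shard_keys(source: str) -> List[str]:
    """Every gsi1pk value a reader must query for one aggregate type."""
    return [f"{source}#{i:02d}" for i in range(1, SHARD_COUNT + 1)]


def merge_shard_pages(pages: Iterable[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Merge per-shard Query results (each already sorted by gsi1sk) into one ordered list."""
    return list(heapq.merge(*pages, key=lambda item: item["gsi1sk"]))
```

The per-shard queries are independent, so they can run in parallel (for example with a thread pool); the merge step only needs each shard's results to be sorted, which Query already guarantees for ascending sort keys.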
What's a snapshot?
A snapshot is the saved state of an aggregate. It should only be used when old events must be deleted for data-privacy reasons, or when an aggregate has so many events that replaying them is hurting write (command) performance.
Snapshots are taken "as of" a specific event. For our purposes, the aggregate reducer would take the latest snapshot (if one exists) as its initialValue and process all subsequent events.
Snapshots are not necessary for the demo milestone.
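A minimal sketch of that reducer in Python, where the initializer argument of functools.reduce plays the role of initialValue. The snapshot shape ({"as_of": <event id>, "state": {...}}) and the Flight reducer are hypothetical; only FlightDeparted is modeled, using the payload fields from the example event.

```python
from functools import reduce
from typing import Any, Dict, Iterable, Optional

State = Dict[str, Any]
Event = Dict[str, Any]


def apply_event(state: State, event: Event) -> State:
    """Hypothetical Flight reducer: fold one event into a new state dict."""
    if event["event_type"] == "FlightDeparted":
        return {
            **state,
            "status": "departed",
            "from": event["payload"]["from"],
            "departed_at": event["payload"]["departed_at"],
        }
    return state  # event types this sketch doesn't model leave state unchanged


def load_aggregate(events: Iterable[Event], snapshot: Optional[Dict[str, Any]] = None) -> State:
    """Rebuild state: start from the latest snapshot (if any), then apply later events."""
    initial: State = dict(snapshot["state"]) if snapshot else {}
    # Hypothetical field: ID of the last event already folded into the snapshot.
    as_of = snapshot["as_of"] if snapshot else ""
    # KSUIDs are fixed-width and sort by time, so string comparison selects later events.
    later = sorted((e for e in events if e["id"] > as_of), key=lambda e: e["id"])
    return reduce(apply_event, later, initial)
```

With no snapshot the reducer starts from an empty state and replays everything; with a snapshot it skips every event whose ID is not greater than as_of.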
Event replay should be the least frequent of all access patterns, so it may be better served from S3 or EFS, where it's easy to name files with a sortable event ID. See the "Going Plaid in S3" section of this article.
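A sketch of a sortable S3 object naming scheme for replay, assuming a hypothetical <bounded_context>/<source>/<KSUID>.json layout. S3 general purpose buckets return ListObjectsV2 results in ascending UTF-8 binary key order, so listing a prefix such as Operations/Flight/ yields that aggregate type's events in KSUID order without any extra index.

```python
import json
from typing import Any, Dict, Tuple


def replay_object(event: Dict[str, Any]) -> Tuple[str, bytes]:
    """Return a (key, body) pair for archiving one event (hypothetical key layout)."""
    # The KSUID is the last path segment, so keys under one prefix sort by time.
    key = f"{event['bounded_context']}/{event['source']}/{event['id']}.json"
    body = json.dumps(event, separators=(",", ":")).encode("utf-8")
    return key, body
```

Writing would be s3.put_object(Bucket="event-archive", Key=key, Body=body) and replay a paginated s3.list_objects_v2(Bucket="event-archive", Prefix="Operations/Flight/"), where "event-archive" is a hypothetical bucket name.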