AWS: Serverless runtime & CDK infrastructure

ryan-mars commented 3 years ago

[x] AWS CDK for infra
[x] AWS Lambda runtime implementation

ryan-mars commented 3 years ago

Thoughts on key sharding for the event store in DynamoDB. Access patterns are sorted by frequency.

Given a service "Operations" with an aggregate named "Flight" with an id of "PA576" and an event of type "FlightDeparted"...

{
  id: "1s6DrF0P9U6z1cQplUINwLlyQG3", // KSUID
  time: "2021-05-05T03:12:22.000Z",
  source: "Flight",
  source_id: "PA576",
  bounded_context: "Operations",
  event_type: "FlightDeparted",
  payload: {
    from: "SFO", 
    departed_at: "2021-05-04T12:30:00-07:00"
  }
}

All events for a single aggregate id, in order: sk BEGINS_WITH "EVENT"

pk: Flight#PA576                       // source#source_id
sk: EVENT#1s6DrF0P9U6z1cQplUINwLlyQG3  // EVENT#id

Latest snapshot: sk EQ "SNAPSHOT"

pk: Flight#PA576                     // source#source_id
sk: SNAPSHOT
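
A minimal sketch of how the two key layouts above might be built in code. The `StoredEvent` interface, the helper names, and the table name are assumptions for illustration, not part of the project; the query input uses the low-level attribute-value form and is only constructed here, not sent.

```typescript
// Hypothetical shape of a stored event, mirroring the example in this thread.
interface StoredEvent {
  id: string; // KSUID, sortable by creation time
  time: string;
  source: string; // aggregate name, e.g. "Flight"
  source_id: string; // aggregate id, e.g. "PA576"
  bounded_context: string;
  event_type: string;
  payload: Record<string, unknown>;
}

// Table keys for an event item: pk = source#source_id, sk = EVENT#id.
function eventKey(e: StoredEvent) {
  return { pk: `${e.source}#${e.source_id}`, sk: `EVENT#${e.id}` };
}

// Table keys for the latest snapshot of an aggregate: same pk, fixed sk.
function snapshotKey(source: string, sourceId: string) {
  return { pk: `${source}#${sourceId}`, sk: "SNAPSHOT" };
}

// Query input for "all events for a single aggregate id, in order".
// Sort keys starting with EVENT# come back in KSUID (time) order.
function eventsForAggregateQuery(tableName: string, source: string, sourceId: string) {
  return {
    TableName: tableName, // assumed table name, supplied by the caller
    KeyConditionExpression: "#pk = :pk AND begins_with(#sk, :prefix)",
    ExpressionAttributeNames: { "#pk": "pk", "#sk": "sk" },
    ExpressionAttributeValues: {
      ":pk": { S: `${source}#${sourceId}` },
      ":prefix": { S: "EVENT#" },
    },
    ScanIndexForward: true, // ascending sort key order
  };
}
```

Because the snapshot shares the aggregate's pk but its sk does not start with `EVENT#`, the event query never returns it.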

All events for an aggregate type within specified period

gsi1pk: Flight                       // source 
gsi1sk: 1s6DrF0P9U6z1cQplUINwLlyQG3  // id  KSUID from "2021-05-05T03:12:22.000Z"

KeyConditionExpression="#p = :p AND #s BETWEEN :start AND :end"
ExpressionAttributeNames={
    "#p": "gsi1pk",
    "#s": "gsi1sk" 
},
ExpressionAttributeValues={
    ":p": { "S": "Flight" },
    ":start": { "S": "1s6DoW0E203SU12dTtKVGOiZtgD" }, // KSUID from "2021-05-05T03:12:00.000Z"
    ":end": { "S": "1s6Dw1V4QI7MR5CuC2SDOzw6d7g" }    // KSUID from "2021-05-05T03:13:00.000Z"               
}
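
A sketch of how the `:start` and `:end` bounds could be derived from timestamps rather than from real event ids. It assumes the standard KSUID layout (a 4-byte big-endian count of seconds since 1400000000, followed by 16 payload bytes, base62-encoded to 27 characters) and produces the smallest KSUID for a given second by zeroing the payload. The function names and the index name `gsi1` are hypothetical.

```typescript
const KSUID_EPOCH_SECONDS = 1400000000; // KSUID custom epoch (assumed standard layout)
const BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Smallest KSUID whose timestamp equals the given time (payload bytes all zero).
function ksuidLowerBound(time: Date): string {
  const seconds = Math.floor(time.getTime() / 1000) - KSUID_EPOCH_SECONDS;
  if (seconds < 0 || seconds > 0xffffffff) {
    throw new Error("time is outside the KSUID timestamp range");
  }
  // 20 bytes: 4 timestamp bytes (big-endian) followed by 16 zero payload bytes.
  let num: number[] = [];
  for (let i = 0; i < 20; i++) num.push(0);
  num[0] = (seconds >>> 24) & 0xff;
  num[1] = (seconds >>> 16) & 0xff;
  num[2] = (seconds >>> 8) & 0xff;
  num[3] = seconds & 0xff;

  // Base62-encode by repeated long division of the byte array by 62.
  const digits: string[] = [];
  while (num.some((b) => b !== 0)) {
    let remainder = 0;
    const quotient: number[] = [];
    for (const b of num) {
      const acc = remainder * 256 + b;
      quotient.push(Math.floor(acc / 62));
      remainder = acc % 62;
    }
    digits.push(BASE62.charAt(remainder));
    num = quotient;
  }
  while (digits.length < 27) digits.push("0"); // left-pad to 27 characters
  return digits.reverse().join("");
}

// Query input for "all events for an aggregate type within a period".
// BETWEEN is inclusive, so :end is the first possible id of the end second.
function eventsInPeriodQuery(tableName: string, source: string, start: Date, end: Date) {
  return {
    TableName: tableName,
    IndexName: "gsi1", // hypothetical index name
    KeyConditionExpression: "#p = :p AND #s BETWEEN :start AND :end",
    ExpressionAttributeNames: { "#p": "gsi1pk", "#s": "gsi1sk" },
    ExpressionAttributeValues: {
      ":p": { S: source },
      ":start": { S: ksuidLowerBound(start) },
      ":end": { S: ksuidLowerBound(end) },
    },
  };
}
```

The base62 alphabet is in ascending ASCII order, so string comparison of two KSUIDs agrees with comparison of their timestamps, which is what makes the BETWEEN condition work on a string sort key.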

🤔

sam-goodwin commented 3 years ago

What's a snapshot?

sam-goodwin commented 3 years ago

All events for an aggregate type within specified period

gsi1pk: Flight                      
gsi1sk: 1s6DrF0P9U6z1cQplUINwLlyQG3

Does this create a hot partition in dynamo? All flight events will be in one dynamo partition.

ryan-mars commented 3 years ago

gsi1pk: Flight                      
gsi1sk: 1s6DrF0P9U6z1cQplUINwLlyQG3
Does this create a hot partition in dynamo? All flight events will be in one dynamo partition.

Yes 🤦🏻‍♂️ I wasn't thinking. Maybe it could be randomly key-sharded depending on volume, for instance Flight#01 through Flight#20 (up to the BatchGetItem max of 100).
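
A rough sketch of the random sharding idea: writes pick one of N suffixed gsi1pk values at random, and a reader enumerates all N values and merges the results. The function names are hypothetical, and it assumes one index query per shard key on the read side.

```typescript
// Format a 1-based shard number as two digits, e.g. 1 -> "01".
function shardSuffix(n: number): string {
  return (n < 10 ? "0" : "") + String(n);
}

// Write side: choose a shard at random so events spread across partitions.
// `random` is injectable (defaults to Math.random) to keep the function testable.
function shardedGsi1Pk(
  source: string,
  shardCount: number,
  random: () => number = Math.random
): string {
  const shard = Math.floor(random() * shardCount) + 1; // 1..shardCount
  return `${source}#${shardSuffix(shard)}`;
}

// Read side: every shard key that must be queried to see all events of a type.
function allShardKeys(source: string, shardCount: number): string[] {
  const keys: string[] = [];
  for (let shard = 1; shard <= shardCount; shard++) {
    keys.push(`${source}#${shardSuffix(shard)}`);
  }
  return keys;
}
```

The trade-off is that a time-range read now costs `shardCount` queries, and the per-shard results (each already sorted by KSUID) have to be merged to restore a single global order.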

ryan-mars commented 3 years ago

What's a snapshot?

A snapshot is saved state of the aggregate. It should only be used when old events must be deleted for data privacy reasons or when an aggregate has so many events that it is affecting write (command) performance.

Snapshots are added "as of" an event #. For our purposes the aggregate reducer would take the latest snapshot (if one exists) as initialValue and process all subsequent events.
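
A minimal sketch of that rehydration step, assuming a snapshot records the id of the last event folded into it (the `as_of` field and all names here are hypothetical) and that event ids sort in creation order, as KSUIDs do.

```typescript
// Hypothetical snapshot shape: saved state plus the id of the last event it covers.
interface Snapshot<S> {
  state: S;
  as_of: string;
}

// Rebuild aggregate state. With a snapshot, start from its state and apply only
// the events after `as_of`; without one, fold every event from initialState.
function rehydrate<S, E extends { id: string }>(
  reducer: (state: S, event: E) => S,
  initialState: S,
  events: E[],
  snapshot?: Snapshot<S>
): S {
  if (!snapshot) {
    return events.reduce(reducer, initialState);
  }
  const asOf = snapshot.as_of;
  return events.filter((e) => e.id > asOf).reduce(reducer, snapshot.state);
}
```

In practice the events after the snapshot could be fetched with a sort key condition instead of being filtered in memory; the filter here just keeps the sketch self-contained.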

Snapshots are not necessary for the demo milestone.

ryan-mars commented 3 years ago

Event replay should be the least frequent of all access patterns. Perhaps it would be better off done from S3 or EFS where it's easy to name files with a sortable event ID. See the "Going Plaid in S3" section of this article.

ryan-mars / stochastic

AWS: Serverless runtime & CDK infrastructure #12