RFC: Add support for namespacing partitions

sam3d commented 1 year ago

The problem

It's generally not recommended to put anything more into a partition than is needed for a single access pattern.

Let's imagine we have the following entities and access patterns:

Entity	Primary access pattern	Resulting partition key
Organization	PK: `[organizationId]`	`$servicename#organizationid_23123`
User	PK: `[organizationId]`, SK: `[userId]`	`$servicename#organizationid_23123`
Session	PK: `[userId]`, SK: `[sessionId]`	`$servicename#userid_59283`

If there is a requirement to query both the data for an organization and the users in a single query, this works great natively in ElectroDB. However, if this isn't a requirement, then a single partition has multiple entity types when it didn't need them, putting unnecessary pressure on the partition.
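To make the collision concrete, here is a minimal sketch that imitates the key format shown in the table above. `buildPartitionKey` is a hypothetical helper written for illustration, not ElectroDB's actual key builder, and it ignores details such as casing options and sort keys.

```typescript
// Simplified model of the partition key format from the table above.
// Assumption: "$" + service, then "attributename_value" segments joined by "#".
// buildPartitionKey is a hypothetical helper, not part of ElectroDB.
function buildPartitionKey(
  service: string,
  composite: Record<string, string>,
): string {
  const parts = Object.entries(composite).map(
    ([name, value]) => `${name.toLowerCase()}_${value}`,
  );
  return ["$" + service, ...parts].join("#");
}

// Organization and User share the same PK composite, so they share a partition.
const organizationPk = buildPartitionKey("servicename", { organizationId: "23123" });
const userPk = buildPartitionKey("servicename", { organizationId: "23123" });

// Session uses a different composite, so it lands in its own partition.
const sessionPk = buildPartitionKey("servicename", { userId: "59283" });
```

In this model `organizationPk` and `userPk` are the same string, which is exactly the situation described above: two entity types in one partition whether or not they are ever queried together.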

ElectroDB currently offers three workarounds for dealing with this:

1. Use the template parameter - You can simply provide a custom template for each entity, such as `$servicename#organization#organizationid_${organizationId}` and `$servicename#user#organizationid_${organizationId}`. The downside to this is that it's verbose, prone to typos, requires you to rewrite the service name to keep it consistent with the entity, and just kind of "isn't the ElectroDB way of doing things".
2. Change the entity service - Because the entity has a service parameter that namespaces the partition key, this can effectively be repurposed for this. The downside of this is that it's scoped to every index, not just a single one. That means it's impossible to use this to keep data separate in one index, but then join it in another.
3. Add an extra attribute and use it in the index - Create an extra attribute in the entity called namespace, give it a default value, make it readonly, and then use it in the index: `$servicename#namespace_user#organizationid_23123`. This does allow you to scope partitions on individual indexes, however it has the following issues:

- It must be provided during any request that uses that composite key component on that index, even if it can only have one value: `users.query.byOrg({ namespace: "user", organizationId: "23123" }).go()`,
- a separate attribute must be created for each namespace on each index required,
- it doesn't follow the same convention as the collection property which groups entities within a single partition, and
- it results in redundant data (present both in partition key AND as an attribute) where it isn't necessary in a sort key (a sort key collection doesn't need an attribute to capture it, but a PK "collection" does?).
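The first issue with the extra-attribute workaround can be sketched as follows. This is a simplified model of the behavior described above, not ElectroDB's code: `buildKeyFromComposite` is a hypothetical helper that treats `namespace` like any other composite attribute, so the caller has to pass it every time.

```typescript
// Hypothetical helper: builds a key from an ordered list of composite
// attribute names. Every listed attribute must be supplied by the caller.
type CompositeValues = Record<string, string | undefined>;

function buildKeyFromComposite(
  service: string,
  composite: string[],
  values: CompositeValues,
): string {
  const parts = composite.map((name) => {
    const value = values[name];
    if (value === undefined) {
      throw new Error(`Missing composite attribute: ${name}`);
    }
    return `${name.toLowerCase()}_${value}`;
  });
  return ["$" + service, ...parts].join("#");
}

// With the workaround, "namespace" is just another composite attribute.
const byOrgComposite = ["namespace", "organizationId"];

// The caller must repeat the constant value on every request.
const workaroundPk = buildKeyFromComposite("servicename", byOrgComposite, {
  namespace: "user",
  organizationId: "23123",
});
```

Calling the helper with only `organizationId` throws in this model, even though `namespace` can only ever be `"user"` for this entity.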

The solution

To resolve this, I propose an extra optional parameter for an index. It should function exactly like collection except for partitions (at least as far as the entity def goes, it wouldn't need to add batch querying patterns to a service). Potential names for this field could be scope, namespace, partition, group, etc. I shall be referring to it as namespace for the purpose of this RFC.

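import { Entity } from "electrodb";

// createId is assumed to be an ID generator defined or imported elsewhere.
export const organizations = new Entity({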
  model: {
    entity: "organization",
    service: "platform",
    version: "1",
  },
  attributes: {
    organizationId: { type: "string", default: createId },
    name: { type: "string", required: true },
  },
  indexes: {
    byId: {
      namespace: "organization",
      // PK: $platform#organization#organizationid_293819
      pk: { field: "PK", composite: ["organizationId"] },
      sk: { field: "SK", composite: [] },
    },
    all: {
      index: "GSI1",
      namespace: ["organizations", "all"],
      // PK: $platform#organizations#all
      pk: { field: "GSI1PK", composite: [] },
      sk: { field: "GSI1SK", composite: ["name"] },
    },
  },
});

This is simply a transparent prefix on the partition key. It is static, and no caller needs to know about it (similar to collection and service). It brings partition key isolation into parity with the sort key, and natively supports the more advanced entity isolation patterns that are common and recommended.
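As a rough illustration of the proposed behavior, the sketch below prepends a static namespace to the partition key and reproduces the two keys shown in the comments of the entity definition above. `IndexKeyOptions` and `buildNamespacedPartitionKey` are hypothetical names for this sketch only; how the library would implement the option internally is not specified here.

```typescript
// Hypothetical shape of the pieces of an index definition that affect the PK.
interface IndexKeyOptions {
  service: string;
  namespace?: string | string[]; // proposed optional, static prefix
  composite: string[];
}

// Sketch: "$service", then the namespace segments, then the composite
// segments, all joined by "#". Not ElectroDB's implementation.
function buildNamespacedPartitionKey(
  options: IndexKeyOptions,
  values: Record<string, string>,
): string {
  const namespace =
    options.namespace === undefined
      ? []
      : Array.isArray(options.namespace)
        ? options.namespace
        : [options.namespace];
  const parts = options.composite.map(
    (name) => `${name.toLowerCase()}_${values[name]}`,
  );
  return ["$" + options.service, ...namespace, ...parts].join("#");
}

// byId index: a single namespace plus one composite attribute.
const byIdPk = buildNamespacedPartitionKey(
  { service: "platform", namespace: "organization", composite: ["organizationId"] },
  { organizationId: "293819" },
);

// all index: an array namespace and an empty composite.
const allPk = buildNamespacedPartitionKey(
  { service: "platform", namespace: ["organizations", "all"], composite: [] },
  {},
);
```

The caller only ever supplies the composite values; the namespace comes from the index definition, which is what makes it transparent in the same way as collection and service.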

tywalch commented 1 year ago

Hey @sam3d!

This is definitely the most thorough proposal I've gotten, thank you for the time you took to put this together!

This feature makes a lot of sense and would be a great add to the library. In the near term, this might be a heavier lift than I can manage right at the moment. There are a few areas that this would impact that would add more complexity, namely typing, that I believe will be a large effort.

I don't know exactly when, but I will keep this in mind when I next have more bandwidth. I have another feature (adding auto-sharding) that will likely hit similar places of complexity and I'll likely try to do both.

Let me know if I can help further, and definitely don't let this dissuade you from creating more tickets; my bandwidth is low at the moment, but I'm still actively maintaining it 👍

sam3d commented 1 year ago

Heya! 👋

That's absolutely no worries, I'm super grateful for the consideration nonetheless 😊

I hope you don't mind me asking – my understanding of the typing amendments required would be a fairly trivial update to the props on an entity index to include the namespace?: string | string[] | undefined in a standard way (i.e. not a type parameter). Simply treated as a static prefix that's prepended to the partition key.

It sounds like you're talking about going down a similar route to collections, where a partition namespace can actually refine the typescript typing of which entities a query may produce. Which would be very very cool! But a lot more work than what I was thinking of in my proposal, I imagine.

Thanks again for the awesome library! (p.s. very excited for auto-sharding)

tywalch commented 1 year ago

Good news, I slept on this (I read somewhere sleep helps...) and realized the typing won't be an issue here.

From a collections perspective, namespace is a property that can/must be verified at Service instantiation to match across all members. Since it impacts the pk key there is no reason to allow a collection with mixed namespaces.
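A check of that kind could look roughly like the sketch below. It is an assumption about the shape of the validation, not the code from the eventual PR: `CollectionMember` and `assertMatchingNamespaces` are hypothetical names, and the real Service constructor works with full entity models rather than this reduced shape.

```typescript
// Hypothetical, reduced view of an entity's participation in a collection.
interface CollectionMember {
  entity: string;
  namespace?: string | string[];
}

// Normalize a namespace to the string it would contribute to the PK,
// so "a" and ["a"] compare as equal and a missing namespace is "".
function namespaceKey(namespace?: string | string[]): string {
  if (namespace === undefined) return "";
  return (Array.isArray(namespace) ? namespace : [namespace]).join("#");
}

// Sketch of a check run once when the service is constructed:
// all members of a collection must resolve to the same namespace.
function assertMatchingNamespaces(
  collection: string,
  members: CollectionMember[],
): void {
  const seen = new Set(members.map((member) => namespaceKey(member.namespace)));
  if (seen.size > 1) {
    throw new Error(
      `Collection "${collection}" has members with different namespaces`,
    );
  }
}
```

Because the namespace is part of the partition key, members with different namespaces could never be returned by one collection query, so rejecting the mismatch up front is the simpler option.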

So as far as my mental model is concerned I think this goes back to an easy add 🎉

sam3d commented 1 year ago

Yay! That makes sense, and is also pretty much what I had in mind too. Please let me know if there's anything I can do to support 🛟

tywalch commented 1 year ago

Worked on this this afternoon, made a PR. Let me know if you prefer any changes to your accreditation, this RFC was great and I wish I had been able to complete the work sooner.

sam3d commented 1 year ago

Whoa that's so exciting! Thank you so much for finding the time to work on this! 🚀

(p.s. really appreciate the accreditation, even though you did literally all of the heavy lifting)

thorhj commented 1 year ago

Thanks a lot @sam3d and @tywalch, this looks promising 🚀🥇

tywalch / electrodb

RFC: Add support for namespacing partitions #290
