Index condition does only identify impacted pk on upsert and not on patch or update

tywalch / electrodb

A DynamoDB library to ease the use of modeling complex hierarchical relationships and implementing a Single Table Design while keeping your query code readable.

MIT License

956 stars 58 forks source link

Index condition does only identify impacted pk on upsert and not on patch or update #366

Closed nonken closed 2 months ago

nonken commented 3 months ago

Describe the bug I have the following index:

entriesByEffectiveDate: {
  index: 'gsi3pk-gsi3sk',
  condition: (attr) => attr.effectiveAt !== 'n/a',
  pk: {
    field: 'gsi3pk',
    composite: ['organizationId']
  },
  sk: {
    field: 'gsi3sk',
    composite: ['portfolioId', 'effectiveAt']
  }
}

When updating an item using .update the resulting DynamoDB query only has gsi3sk set and not gsi3pk. When making the same call using .upsert it works, but I need to provide all available fields which requires me to fetch the item first.

const { data, error } = await Entity.upsert({
  id,
  organizationId
}).set(obj)

I dug in a bit deeper and it looks like that _getIndexImpact will not return the pk in my case because organizationId is not provided in the attributes. The sk does get set resulting in a record without a pk.

It can be that I am not understanding the intend correctly, but my expectation was that this would work for update or patch calls.

ElectroDB Version 2.13.1

ElectroDB Playground Link Playground

Entity/Service Definitions Include your entity model (or a model that sufficiently recreates your issue) to help troubleshoot.

{
  model: {
    service: 'test',
    entity: 'test',
    version: '1'
  },
  attributes: {
    id: {
      type: 'string',
      required: true,
      readOnly: true
    },
    organizationId: {
      type: 'string',
      required: true,
      readOnly: true,
    },
    accountId: {
      type: 'string',
    },
    createdAt: {
      type: 'string',
      readOnly: true,
      required: true,
      default: () => new Date().toJSON(),
      set: () => new Date().toJSON()
    },
    updatedAt: {
      type: 'string',
      watch: '*',
      required: true,
      default: () => new Date().toJSON(),
      set: () => new Date().toJSON()
    },
    settledAt: {
      type: 'string',
      default: 'n/a'
    },
    effectiveAt: {
      type: 'string',
      default: 'n/a'
    },
    type: {
      type: 'string',
      required: true
    },
    category: {
      type: 'string',
      required: true
    },
    amount: {
      type: 'string',
      required: true
    },
    description: {
      type: 'string'
    }
  },
  indexes: {
    entries: {
      pk: {
        field: 'pk',
        composite: ['organizationId']
      },
      sk: {
        field: 'sk',
        composite: ['id']
      }
    },
    entriesByAccount: {
      index: 'gsi1pk-gsi1sk-index',
      pk: {
        field: 'gsi1pk',
        composite: ['organizationId']
      },
      sk: {
        field: 'gsi1sk',
        composite: ['accountId', 'id']
      }
    },
    entriesBySettledDate: {
      index: 'gsi2pk-gsi2sk-index',
      condition: (attr) => attr.settledAt !== 'n/a',
      pk: {
        field: 'gsi2pk',
        composite: ['organizationId']
      },
      sk: {
        field: 'gsi2sk',
        composite: ['accountId', 'settledAt']
      }
    },
    entriesByEffectiveDate: {
      index: 'gsi3pk-gsi3sk-index',
      condition: (attr) => attr.effectiveAt !== 'n/a',
      pk: {
        field: 'gsi3pk',
        composite: ['organizationId']
      },
      sk: {
        field: 'gsi3sk',
        composite: ['accountId', 'effectiveAt']
      }
    }
  }
}

Expected behavior I am unsure what the correct way of using condition here is.

sam3d commented 3 months ago

I've also noticed I'm having the same issue, I feel like this isn't the expected behaviour. The condition implies that it applies to the whole index, and the composite helper could be used to fill in the missing gaps for the other attribute

tywalch commented 3 months ago

Hi @nonken 👋

You and @sam3d are correct, this isn't the best behavior in this case. As mentioned in @sam3d's post you can use the composite method as a stop gap [example], but this is something that should be addressed. I will review this area of code and see what can be done to change this 👍

tywalch commented 2 months ago

@nonken

I dug into this today, and I have a fix incoming, but I wanted to bring up a logical issue with your example. It's possible this could be a shorted sided API consideration with the condition function.

The example update you provided looks like this:

entry
  .patch({ id: '123', organizationId: '123' })
  .set({ effectiveAt: 'n/a', accountId: "123", settledAt: "today" })
  .go();

The expectation above is that the patch() operation will write a pk and sk to the entriesBySettledDate index but not the entriesByEffectiveDate index. The user's goal is to keep In these indexes "sparse" unless the value equals "n/a".

In my fix this will be true, however the implementation of this strategy is not sound with the model provided. In its current form, the condition callback only determines whether or not an index will be set. What it won't do, currently, is "drop" an index if the condition fails.

For example (using your entity), if you create an item with effectiveAt: "today" then that index will be added. Later, if you perform an update set on that item with the value effectiveAt: "n/a", then ElectroDB will not write to that index. The problem, however, is that that index will still exist and it contain a stale value (effectiveAt: "today"). You would then get effectiveAt: "n/a" values back from that index 👎

@sam3d My gut is saying if an index condition returns false on update(), patch(), or upsert() ElectroDB should actually "drop" that index. Would you agree? I believe this is a pretty critical oversight on my part.

sam3d commented 2 months ago

@tywalch I think you're completely right. Similar to how a composite key is re-computed if any of its attributes change, the condition property behaves in a similar way.

What's interesting is that it's more akin to the watch property on attributes, where it technically only wants to re-compute if dependent attributes are modified. The composite keys can do this because their dependent attributes are known, but condition doesn't have the same benefit. If there's an index condition it would need to be checked on every operation against that entity.

There's an issue with this – without knowing dependent attributes or the prior state of the entity, during an update operation we don't know when the condition state transitions between false and true.

true -> false – This is actually OK (and what you're describing above). There is a bit of overhead in that we'll always be dropping the index's attributes, even if they were already not set. But that's probably more something for the documentation than it is an actual issue with the implementation.
false -> true – This one is harder. We don't know the previous state of the index, so we have to also set this index's attributes every time condition evaluates to true, even if we didn't touch any of those attributes.

Not 100% sure on a good solution here. A few ideas off the top of my head:

Add a list of dependent attributes to condition (like with watch on attributes) so we know when to re-compute the presence of the index. This would retain the current flexibility and fix index behaviour on mutations, with some added complexity for the entity index definition
Only allow condition to process the attributes defined on the index so if we're re-computing the index we know to also check for its condition. The downside is that this is scoped to the whole index, and not to the individual keys, so we couldn't update solely GSI1PK without also having to update GSI1SK if condition evaluates to true
Allow condition to return more than just a boolean? No clue on what the API for this would look like, but perhaps condition could look at the attributes its passed in the mutation and make the decision to either false (delete the index), true (create the index), or null (don't try to drop or create the index):

condition: (attr) => {
  // effectiveAt not changed, don't create OR drop the index
  if (attr.effectiveAt === undefined) return null;

  // Drop the index if needed
  if (attr.effectiveAt === "n/a") return false;

  // Otherwise re-create the index
  return true;
},

Either way, if we then undergo a state transition because a dependent attribute changed, then the usual .composite() helper can take over to fill out the index as needed (and error if not provided). And we know we did need to do that because the dependent attributes changed.

sam3d commented 2 months ago

Actually wait I've been thinking about this some more, this might be more complex than I thought.

Going from true/false -> false is always okay, because if condition is false then the indexes can always just be dropped and aren't dependent on the previous state of those attributes.

In the case of the condition going from false -> true, then your example makes sense. However, I feel like without some way to model that state transition, then we can't differentiate between false -> true and true -> true. Which is really annoying. We only want to set those index attributes when the condition flips (or if the attributes change), but without some way to model that then all of the index attributes must be set every single time.

Is it maybe possible to do some kind of check where if the condition evaluates to true then it applies an extra condition expression to ensure that the entity has the attributes for that index? And then it'll fail and force the use of the composite helper in order to perform the update?

tywalch commented 2 months ago

I mulled it around a bit today, but I think I got it; let me know what you think:

The condition callback will be invoked only when a composite attribute associated with an index is set via an update, patch, or upsert. [existing behavior]
The condition callback is provided the attributes being set on that particular operation, including the item's identifying composite attributes. [existing behavior]
If the condition callback returns true, ElectroDB will attempt to create the index and all of its associated keys. If an index cannot be created because an update operation only has enough context for a partial key, ElectroDB will throw. [the original issue here, fixed]
If the condition callback returns false, the index and all of its associated keys will be removed from the item. [new behavior]

Item #1 above is the key to solving the issue you bring up in your first comment, and it's actually what we do currently. This means that condition would only be called when an index must be recalculated. furthermore, as described in #3, ElectroDB will actually throw if your update operation (set and remove) lacks a full composite context and would result in a "partial" key. This would mean that all * -> true transitions are already validated to have all the composite parts necessary to recreate the complete index already.

Let me know if this doesn't make sense or I am missing anything. I am writing some tests now, but likely won't be making a PR until tomorrow. I'm definitely not looking to rush a change like this.

sam3d commented 2 months ago

I think this makes sense! With the two caveats worth documenting:

A condition cannot be solely dependent on attributes outside of that composite index. This means that patterns like the following wouldn't work:

{
  condition: (attr) => !attr.deletedAt,
  pk: { field: "GSI1PK", composite: [] },
  sk: { field: "GSI1SK", composite: ["createdAt"] },
}

An index with a condition cannot be updated unless all composite attributes are provided, not just the ones from the field. For example:

{
  pk: { field: "GSI1PK", composite: ["storeId"] },
  sk: { field: "GSI1SK", composite: ["status", "createdAt"] },
}

// Updates GSI1PK, no problem
entity.patch({ id: "29381" }).set({ storeId: "SJR" }).go();

But in order to perform the same operation with a condition:

{
  condition: (attr) => attr.status !== "deleted",
  pk: { field: "GSI1PK", composite: ["storeId"] },
  sk: { field: "GSI1SK", composite: ["status", "createdAt"] },
};

// Condition returns true, so we need to set GSI1PK and GSI1SK in case the index
// doesn't already exist. We now MUST provide status and createdAt
entity.patch({ id: "29381" }).set({ storeId: "SJR" }).go();

// So this must be done instead every time
entity
  .patch({ id: "29381" })
  .set({ storeId: "SJR" })
  .composite({ status: prevEntity.status, createdAt: prevEntity.createdAt })
  .go();

sam3d commented 2 months ago

DynamoDB doesn't propagate to a composite index unless both attributes are present, right? So to remove an entity from a sparse index you technically don't have to drop both, just one of them. Maybe the result of condition only gets applied to the composite key field that's being mutated?

tywalch commented 2 months ago

A condition cannot be solely dependent on attributes outside of that composite index. This means that patterns like the following wouldn't work.

My gut is that the user would determine if they wish to check attributes outside the composite; i.e. that Electro would continue to supply the condition callback with all values being "set". I do agree it is necessary to reiterate in the docs that the attributes being passed represent the values provided in the operation and not the item as it is stored.

An index with a condition cannot be updated unless all composite attributes are provided, not just the ones from the field. For example: // Condition returns true, so we need to set GSI1PK and GSI1SK in case the index // doesn't already exist. We now MUST provide status and createdAt

This is a very, very good point. You're saying that because this could be the first time the condition callback has returned true, but the user only provided enough data on update to create one out of the two keys. In that case, it is also possible that the stored item actually has all the values necessary to form the other key. In this case the user has an expectation that the index will be written, but in reality, they now have an orphaned index record that is not consistent with the underlying attributes.

Great catch! This is sort of the opposite of the issue I brought up above, I had not considered this 🤯

DynamoDB doesn't propagate to a composite index unless both attributes are present, right? So to remove an entity from a sparse index you technically don't have to drop both, just one of them. Maybe the result of condition only gets applied to the composite key field that's being mutated?

How might this work?

An index with a condition cannot be updated unless all composite attributes are provided, not just the ones from the field. For example:

Do you think this problem becomes "solved" by adding this requirement? It sort of seems like it? That said, this will be difficult for newbies to understand, not to mention this would become a request time validation which makes it doubly hard for users to anticipate.

While it does feel like we're close to solving this, I don't want to be too precious about the current API for handling this. I haven't squared the circle in my mind, but this topic does feel like it could have a meaningful overlap with Ross Gerbasi's request for the ability to build and drop specific indexes. His need revolves around backfilling new created indexes on existing items.

I sat down recently and thought about what that API would look like. Creating an API for dropping an index was very straight forward, but I found an API for building a specific index had a lot complexities due to potential attribute dependencies. The simplest approach would be to simply accept composite values "as-is" (i.e. without any setter processing) and add an automatic eq condition on each attribute provided. Additionally, this functionality wouldn't play nice with condition; it would likely have to skip the check to work as anticipated.

Thinking through that feature had me thinking there must be some kind of third way. Does this spark anything in you?

aphex commented 2 months ago

Hey @tywalch so I must admit I haven't spent much time thinking about a broader API and requirements of the library. I have no doubt this is much more complex. We would be good with a very specific Entity reindex method. Expectations would be I provide all attributes to write all indexes or provide some parameters to just target a few or one index.

The use case we are looking to handle is one where attributes already exist in an Entity, so all the data is there, we just need to have better access to it. So we will be scanning the entity and running this process against all items. Maybe electro can even provide help here with a utility to reindex all of an Entity quickly. A combination of concurrent PUTs and queuing for tables with millions of items. The other tool we've used for this is Dynobase but its import is slow compared to a custom script.

Not sure how much this connects with the condition conversation here. However there are likely all sorts of other use cases for index scenarios. Another obvious one would be attributes that do not exist are being added and indexed.

We would also need to keep our createdAt and updatedAt dates. Here's an example of how we configure these in our entities is below. Not sure what the best way forward to keep these values unchanged while we add/remove indexes

createdAt: {
      type: 'string',
      readOnly: true,
      required: true,
      default: getNowISO,
      set: getNowISO,
    },
    updatedAt: {
      type: 'string',
      watch: '*',
      required: true,
      default: getNowISO,
      set: getNowISO,
    },

goosefraba commented 2 months ago

I ran into this issue today as well as I'm new to ElectroDB but using sparse index pretty heavy.

My use-case is similar to what have been said by others already.

state: 'ACTIVE' | 'INACTIVE'

and I just want to have ACTIVE ones in the GSI1 index and get removed once I update the state.

Any idea if there will a fix soon? Or is there any workaround?

tywalch commented 2 months ago

Small update: this is still a WIP, I made significant progress this afternoon, still ironing out tests for edge cases 👍

tywalch commented 2 months ago

GH Closed this issue because I merged a fix, but I'm reopening it to keep discussion open in case there are any issues.

@nonken You can see the impact by using the original playground link you provided 👍

nonken commented 2 months ago

@tywalch this is so good, thank you a million times for diving into this. Can we somehow donate, as in, does the sponsoring work for you?

tywalch commented 2 months ago

@tywalch this is so good, thank you a million times for diving into this. Can we somehow donate, as in, does the sponsoring work for you?

No problem, glad it worked and I'm happy you raised this because it helps everyone!

Yup, I'm setup for sponsoring via GitHub's native sponsoring option (if that works) -- thank you so much for considering me for sponsorship!

tywalch commented 2 months ago

Closing, but feel free to reopen!

severi commented 1 month ago

This update unfortunately started causing issues for us. One example use case would be something like this

store chat messages for a given chat session
chat sessions can be either "unauthenticated" or "attached to logged in users"

previously we had a sparse index for accessing chats by user id (see example entity definition below), and it worked well with v2.13.1. For instance, if a user logs in while having an active chatsession, the unauthenticated chat session is correctly added to the "byUser"-index when the "loggedInUserId" attribute is set (for unauthenticated users it is undefined).

It seems this is no longer supported, or is there some way to still achieve this? The upsert operation we used to create/update the message entries started failing, which was unexpected for a minor version upgrade.

I believe this is a valid use case for sparse indices, and it's unfortunate if the library no longer supports it.

{
  model: {
    entity: 'chatConversation',
    version: '1.0',
    service: 'chatbot',
  },
  attributes: {
    conversationId: {
      type: 'string',
      required: true,
    },
    loggedInUserId: {
      type: 'string',
      required: false,
    },
    messages: {
      type: 'list',
      items: {
        type: 'any',
      },
      required: true,
    },
  },
  indexes: {
    byConversationId: {
      pk: {
        field: 'pk',
        composite: ['conversationId'],
        casing: 'none',
      },
      sk: {
        field: 'sk',
        composite: [],
        casing: 'none',
      },
    },
    byUser: {
      index: 'gsi1pk-gsi1sk-index',
      type: 'clustered',
      condition: (attr: any) => {
        return !!attr.loggedInUserId;
      },
      pk: {
        field: 'gsi1pk',
        composite: ['loggedInUserId'],
        casing: 'none',
      },
      sk: {
        field: 'gsi1sk',
        composite: ['conversationId'],
        casing: 'none',
      },
    },
  },
};

tywalch commented 1 month ago

Hi @severi

Firstly, I want to say that I am sorry your existing code broke do to this fix. I try very hard to make sure this does not happen. Ultimately in choosing between a major and minor update, I chose a minor update because the original implementation had such a significant implementation flaw it was not viable as it was.

As for how you might achieve what you had, I have two potential approaches for you:

Use get and set to apply an empty string when no loggedInUserId is provided (example). This approach ideally won't require your app code to change.
You could simply pass an empty string for loggedInUserId instead of omitting the value (example). This might require some change in your app code to account for an empty string vs undefined

Again, please accept my apology for the unexpected errors in your app, and let me know if any of the above will work for you; If the above won't work in your case, I can work to maybe create an escape hatch (via an execution option or some type of configuration) for this requirement.