[NEW] MULTI Idempotency Keys for safely retrying non-idempotent commands

byroot commented 1 month ago

For context, I'm the maintainer of https://rubygems.org/gems/redis and https://rubygems.org/gems/redis-client, and I was chatting with @stockholmux a few days ago about what I think is a recurrent pain point for users.

The problem/use-case that the feature addresses

When a client perform a command, if it times out, or it hits a network error, it's pretty much impossible to know if the command was performed on the server side or if it never reached it. Because of this, it is only safe to retry idempotent commands, for any other command, retrying is unsafe which makes it hard to implement clients that are resilient to various network issues.

This isn't specific to valkey nor the RESP protocol, the same sort of issue is common with SQL clients and most datastores.

But this is particularly common in the Ruby community because the historical redis/valkey client, initially developed by Antirez, always defaulted to blindly retry anything in case of network error, and it's now very hard to change because users, especially in cloud environment, then notice a lot more network errors that were previously swept under the rug. I'm trying to course correct that in the lower level redis-client gem by making retry opt-in and requiring users to be more intentional on whether retry should happen for a given command or transaction, but it only goes so far.

So I believe it would be very valuable if there was a way to safely retry non-idempotent commands.

Description of the feature

I think a way this could be achieved, would be for the MULTI command to accept an "idempotency key", somewhat inspired from what Stripe and Paypal APIs are doing.

e.g.

MULTI KEY <random-id> TTL 30
LPUSH mylist 1 2 3
EXEC

The semantic would be that if that transaction succeed and another transaction is performed with the same random-id withing ttl seconds, the repeated transaction isn't actually executed and instead return the same results than the previous transaction, without any side effect.

One potential issue I can see with this is that if used with command that return large output, it has the potential to require saving a whole lot more data on the server, but my assumption is that generally speaking, non-idempotent queries for which this is useful tend to return smaller responses. But this is in part counter acted by the TTL, allowing the server not to need to hold on the result for too long. It might also be possible to specify that only one such result set can be held onto per connection.

Another possibility is to not return the original response on repeated transaction, and just have MULTI return some error instead of QUEUED, but it is much less convenient for the client, as sometimes you are not just interested in the side effect of the transaction. Typically if you need to retry an LPOP, the poped item is lost. But perhaps that too can be an option.

Alternatives you've considered

Another possibility could be to bake this capability in the protocol itself, so it's not limited to transactions, but I don't think it's desirable to change the protocol.

PingXie commented 1 month ago

Thanks @byroot!

This is definitely a pain point, and I think your proposal is a good solution: assigning a unique ID to each request (via the form of a MULTI transaction) and caching the result for a short period. The alternative of modifying the RESP protocol that you mentioned would indeed introduce implementation-level complexity. I agree that it's not ideal for two reasons: first, it brings compatibility risks, and second, I would guess that a large portion of the workload—especially commands like SET/GET—is already idempotent. Therefore, making this mitigation opt-in via this MULTI command extension is a smart approach to avoid unnecessary overhead.

+@valkey-io/core-team to chime in.

zuiderkwast commented 1 month ago

Cool idea.

You would need to wrap all non-idempotent commands in MULTI then?

It would be great to solve this problem. I think we should try to come up with more ways to solve it.

If a protocol level feature could solve this for all commands, then I would even consider it as RESP4 (client opt-in via HELLO).

As a start, we could mark all non-idempotent write commands as such in the COMMAND response.

byroot commented 1 month ago

You would need to wrap all non-idempotent commands in MULTI then?

It would be purely optional, but if you wanted to be able to retry in all circumstances yes. But it would be easy, even for a "dumb client" like redis-client to do so transparently, or to offer a helper to make it very easy. It could even blindly do it for all commands, and let the server not do anything special if the command is idempotent. But I see how it could cause some overhead, and isn't very "clean".

I think we should try to come up with more ways to solve it.

Of course. I wanted to come up with a concrete proposal, so that it sparks a constructive discussion, but I'm not married to a specific solution.

The reason I suggested this one in particular is that it's backward compatible.

If a protocol level feature could solve this for all commands, then I would even consider it as RESP4 (client opt-in via HELLO).

Right, baking this in the protocol would indeed be way more elegant. The reason I'm not very enthusiastic about it is that my client have been among the first ones to require RESP3 and it took years for the ecosystem of cloud providers and proxies to catch up. Even today the fairly popular Envoy proxy still isn't RESP3 compatible:

So why not a RESP4, but I'd suggest to wait for more changes you'd like to make to the protocol before going with it.

zuiderkwast commented 1 month ago

OK, good point with the proxies. I wasn't aware that proxies need to understand the RESP protocol. Are those primarily for clusters? I guess a proxy still can't make a cluster appear like a standalone node, because cross-node commands can't be supported in any sane way.

So, maybe you're right that extending MULTI with this new syntax would be the simplest way.

A problem I can think of, regardless how it's done, is that the server can end up using a lot of memory for these cached replies in the worst case. Clients can be evicted, but these replies are not tied to a client id because when a client reconnects with a new id, it may want to retry the command and get the cached reply. I can't see that we can evict them either. If it turns out to be a problem, then it would be good if a client can ACK the reply in some way so the server can delete the reply ASAP.

A concrete idea for how to encode this in RESP3+ would be to allow the client to set RESP3 attributes on a command. A command is an array of strings, AKA multibulk, but the client could add an attribute to it if the server supports it. Example:

|2<CR><LF>
    +idem-key<CR><LF>
    +98usd98s<CR><LF>
    +idem-ttl<CR><LF>
    :5000<CR><LF>
*5<CR><LF>
    $5<CR><LF>
    LPUSH<CR><LF>
    $6<CR><LF>
    mylist<CR><LF>
    $1<CR><LF>
    1<CR><LF>
    $1<CR><LF>
    2<CR><LF>
    $1<CR><LF>
    3<CR><LF>

Another feature for a future RESP version could be multiplexing, i.e. the possibility to use multiple commands-reply streams, possibly with separate clients IDs, over the same connection. It was mentioned in #54 and one of the linked redis issues.

byroot commented 1 month ago

Are those primarily for clusters?

Not as far as I know, they are used for a whole lot of things.

the server can end up using a lot of memory for these cached replies in the worst case.

Absolutely, hence the TTL, and also why I think the semantic could be that only at most one such reply is saved.

I also think it would be fine to clear the saved response on any subsequent command from the same client (whether it's a MULTI or anything else).

In addition, we could make saving the response purely opt-in, e.g. <random-id> TTL 10 RECORD, if you don't specify the RECORD, on a retry you'd get some specific error that tells you the command was already performed.

Recording the response is useful when otherwise information would be lost (e.g. LPOP), but in many cases such as INCR, it's not that useful, you can just read the key again.

A concrete idea for how to encode this in RESP3+ would be to allow the client to set RESP3 attributes on a command.

That too I'm afraid would likely break some proxies.

I think you could achieve the same thing by introducing a command to hold these info:

RETRY <random-id> TTL 30 RECORD
LPOP mylist

It is maybe a little bit unusual because it's a stateful command. But it's essentially the same as the MULTI extension, but a bit more lightweight and more composable.

hwware commented 1 month ago

2 questions for your idea:

When client send and retry non-idempotent commands, how to set the random-id? If this is really random, how to guarantee the ids are not duplicated in short time?
How we can decide if the command output is not a huge output?

byroot commented 1 month ago

how to set the random-id? If this is really random, how to guarantee the ids are not duplicated in short time?

It's the responsibility of the client to use a sufficiently random source. Typically a UUID would be suggested (but I don't think it need to be enforced).

How we can decide if the command output is not a huge output?

That is indeed something that could cause problems if the feature is abused. My reasoning in that in the large majority of cases, non-idempotent commands don't generate large output or if they do it's output that have been removed from an existing data structure, so the memory usage should be mostly neutral (e.g. LPOP).

But I can't really think of a way that would guarantee this, especially given extension module can do whatever they want, and that in the case of a transactions with multiple commands, it may also include read commands that do return a lot of data.

If really saving the response is deemed too problematic, as mentioned before it's also possible to just return a specific error and let the client know it has to reconcile, it's more work for the client and doesn't cover some use cases, but would already be a huge improvement.

zuiderkwast commented 1 month ago

Regarding LPOP specifically, it appears that LMOVE was designed to solve exactly this problem of unreliable connections. It puts the responsibility on the user to use this command though, rather than the client lib. From the man page:

NAME
       lmove - Returns an element after popping it from one list and pushing it to another.  Deletes the
       list if the last element was moved.

SYNOPSIS
       LMOVE source destination <LEFT | RIGHT> <LEFT | RIGHT>

DESCRIPTION
       Atomically returns and removes the first/last element (head/tail depending on the wherefrom argu‐
       ment)  of  the list stored at source, and pushes the element at the first/last element (head/tail
       depending on the whereto argument) of the list stored at destination.

...

   Pattern: Reliable queue
       Valkey  is  often  used as a messaging server to implement processing of background jobs or other
       kinds of messaging tasks.  A simple form of queue is often obtained pushing values into a list in
       the producer side, and waiting for this values in the consumer side using RPOP  (using  polling),
       or BRPOP if the client is better served by a blocking operation.

       However  in  this context the obtained queue is not reliable as messages can be lost, for example
       in the case there is a network problem or if the consumer crashes just after the message  is  re‐
       ceived but it is still to process.

       LMOVE  (or  BLMOVE  for  the  blocking  variant) offers a way to avoid this problem: the consumer
       fetches the message and at the same time pushes it into a processing list.  It will use the  LREM
       command  in  order  to  remove  the  message  from  the processing list once the message has been
       processed.

       An additional client may monitor the processing list for items that remain  there  for  too  much
       time, and will push those timed out items into the queue again if needed.

byroot commented 1 month ago

Yes absolutely. I'm using LPOP purely as an example here, not as the sole reason I want this. And this wouldn't even replace LMOVE because LMOVE gives even more guarantees (e.g. maybe the client literally crash an will never retry, so other clients should discover the abandoned item that has been moved).

If you want a real world example that happened to us not long ago, and prompted me to think about this feature again, it was with LPUSH: https://github.com/Shopify/ci-queue/pull/287

zuiderkwast commented 1 month ago

Yeah, I know the problem is real. I'm just trying to think of related topics to get the bigger picture. It's interesting that this problem was mentioned in LMOVE's docs. It means this is not the first time this problem is discussed.

madolson commented 1 month ago

Idempotency is a long standing issue with Redis and Valkey, and is a hard problem generally within computer science. It's why streams have the consumer groups so that you can do acknowledged reads/writes. I think it's definitely worth trying to make it better, but some of my initial thoughts:

How to handle the idempotency during slot migrations in cluster mode. If you commit something, and then the data is migrated to another node, the other node won't have the idempotency information.
Replicating the idempotency tokens through replication seems fairly straightforward if we are just erroring on a conflict. If we are also caching the error response, we would also need to replicate the response to replicas, which is less ideal.
We might also bake in the ability for clients to acknowledge that they got the response. It increases the network chatter, but it reduces the amount of memory that the server has to keep track of.
We have https://valkey.io/topics/command-tips/, we should probably document the idempotent vs non-idempotent commands.
As a distributed systems engineer, I feel a lot more comfortable with just erroring on idempotency failures in the database layer. In the ElastiCache service, we handle idempotency failures all over the place by checking the state after a failure, not having the database keep track of it. Redis was always known for being easy to use, but often ignored correctness to achieve that.
This feels intuitively very similar to client side tracking in terms of structure. The client indicates that it wants some state to happen on the next command. We could implement CLIENT IDEMPOTENT <token> <ttl> and CLIENT CLEAR-IDEMPOTENT <token>. Everything gets attached to the client then. Then it's decoupled from multi.

@barshaul @avifenesh Any thoughts about this from the client perspective?

byroot commented 1 month ago

I feel a lot more comfortable with just erroring on idempotency failures in the database layer.

To be clear, I totally see how saving the response can be challenging, so if it's deemed too complicated / risky I'll understand. Just having a specific error that allow to know the command was already perform would already be a huge leap forward.

The big benefit of recording the response would be to allow "dumb clients" like mine to retry transparently.

zuiderkwast commented 1 month ago

Redis was always known for being easy to use, but often ignored correctness to achieve that.

This was probably a key to success.

To keep it simple, we could continue this tradition. We could recommend folks to work around it:

use streams (with ACK) instead of lists.
wrap INCR (etc.) in MULTI-EXEC where you also write to some client-specific key that you can check later to find out if the transaction succeeded or not.

madolson commented 1 month ago

wrap INCR (etc.) in MULTI-EXEC where you also write to some client-specific key that you can check later to find out if the transaction succeeded or not.

I've done this a lot. I did a talk at Kubecon where we applied changes from a change stream and used a lua script + idempotency key to check if something was applied. We were using the non-idempotent commands like ZINCRBY. It works "ok" for CME workloads.

avifenesh commented 1 month ago

Talking from clients side (Glide) and client side only: It does sound a nice feature that would be helpful and as everything else, if it will be introduced in ValKey we will support it, BUT - It feels a bit like a half-baked feature that will create a lot of work to clients, potentially more than help. If it will be delivered, the client will need to create the command block, create wrappers for users to have a pleasant UX, implement the retries, handle configurations, generate ID and track it and many more tasks. Plus, there's what I call configuration hell, where a client has so much optional configuration so to have the right client setup you need to have a degree in it. And I see it every day helping users (Elasticache users), manage to create the right cocktail of configuration so they stop having 10 minutes downtime. This feature introduces another extra optional configuration as how many retries (as it different from usual command), what is the TTL and probably some more. If the feature ends up as offering a block of multi-exec with TTL clients actually could implement a non-efficient version of it long time ago, for user who willing to scarify a bit of memory and performance for some commands they want to be safe about.

If it ends up implemented, I would actually hope that it will be a kind of transparent API, where you call it SAFER_LPUSH or similar, and accept TTL and ID as extra parameter and clients will just take care of retries.

PingXie commented 1 month ago

Redis was always known for being easy to use, but often ignored correctness to achieve that.

While ignoring correctness may have been acceptable in the past, I don't think we can continue operating that way if we want Valkey to be the default choice for application developers.

Idempotency handling is a difficult problem, and it has come up in my experience multiple times too. If we find ourselves discussing workarounds at conferences for handling it client-side, I think it’s time to seriously consider a server-side solution. Offloading idempotency handling to clients shifts complexity from the server to the clients, and there are far more clients out there. That said, while this is a real problem worth addressing, I don't believe it’s an "80% problem" either. Therefore, making significant changes to the RESP protocol feels excessive. Extending MULTI would be a much more natural and backward-compatible solution.

How to handle the idempotency during slot migrations in cluster mode.

That’s a great point. I think we’d need to track the new states at a per-slot level. It might make sense to sequence this after the RDB-based Atomic Slot Migration work so that we can easily pass along additional metadata.

As for the other comments on the issue, I believe they’re mostly tactical, concerns about memory usage efficiency, caching error responses, or the exact command structure. These are important, but I don't think they would invalidate the overall problem statement.

It feels a bit like a half-baked feature that will create a lot of work to clients, potentially more than help. If it will be delivered, the client will need to create the command block, create wrappers for users to have a pleasant UX, implement the retries, handle configurations, generate ID and track it and many more tasks.

@avifenesh, I agree that this would increase client-side complexity, but do you think it’s still preferable to having application developers implement their own idempotency error handling? There are orders of magnitude more applications than clients, and far more clients than the server, which is just one.

avifenesh commented 1 month ago

@PingXie You right, i opened by saying that it's a good idea.

My point is that it is not whole. As a client dev id like to have fully baked and ready to eat features, which do not require me to create workarounds and special configurations.

If it was a fix to some bug, it is what it is, you deliver fast. But if it's adding a new feature, why not take a bit more time and make the UX awesome.

madolson commented 1 month ago

As for the other comments on the issue, I believe they’re mostly tactical, concerns about memory usage efficiency, caching error responses, or the exact command structure. These are important, but I don't think they would invalidate the overall problem statement.

I think 5 is the most important topic to discuss, which is are we going to cache the response or are we just going to error. The remaining points are mostly derivative of this issue. We should probably discuss this in a core meeting.

@byroot Let me know if you want to join, it's every monday at 14:00 UTC. It seems like you're in France, which makes it a pretty nice time for you :)

byroot commented 1 month ago

it's every monday at 14:00 UTC. It seems like you're in France, which makes it a pretty nice time for you :)

That should work for me.

soloestoy commented 3 weeks ago

Good idea, we've previously discussed this issue internally and tried to solve it using named transactions. This idea aligns with ours :)

madolson commented 3 weeks ago

Notes from core meeting:

All of this functionality could be implemented in a LUA script. Making clients execute a LUA script puts a lot more complexity into the client.
Alternative: Implement an EXECSTORE <keyname> or some similar way to store the result. The logic is still implemented on the server side. Client has to keep track of the key name. We can also modify the MULTI command to also take in a keyname, and if the key exists it will return the response of the key instead of executing the exec. Maybe it should be a MULTISTORE <keyname> command, so that the first command can response with the cached response. The problem with this is that we have to specially handle the skipped "OK" and "QUEUED". (This should be OK).
Conceptually this is similar to check and set functionality proposed in https://github.com/valkey-io/valkey/issues/1215.
We should always require a TTL, to make sure end users don't accidentally shoot themselves in the foot.
Let's look at it the limits space holistically.

"I would rather have a chainsaw than a butterknife" - A wise man.

hpatro commented 3 weeks ago

"I would rather have a chainsaw than a butterknife" - A wise man.

A typical Valkey user 😄

Alternative: Implement an EXECSTORE or some similar way to store the result. The logic is still implemented on the server side. Client has to keep track of the key name. We can also modify the MULTI command to also take in a keyname, and if the key exists it will return the response of the key instead of executing the exec. Maybe it should be a MULTISTORE command, so that the first command can response with the cached response. The problem with this is that we have to specially handle the skipped "OK" and "QUEUED". (This should be OK).

I was thinking a bit more on this topic, we need to think about the API semantics more so that we're not restricting a user to perform just a null check, rather we should introduce operator like (EQ, IS_NULL, IS_NOT_NULL, etc) to allow different use cases and then perform transaction if the conditional statement matches.

MULTI [OP key IS_NULL|IS_NOT_NULL|EQ [value|hash]]
<set of commands>
EXEC

For idempotency, it would just deal with IS_NULL unary operator.

zuiderkwast commented 3 weeks ago

I would like to name this feature "Idempotent transactions". It sounds good for the release news blog posts. :)

Regarding MULTISTORE, I was thinking more like

MULTISTORE key EX 10  ==>  +OK | +STORED
SET foo bar           ==>  +QUEUED
LPOP baz              ==>  +QUEUED
EXEC                  ==>  1) +OK
                           2) "quux"

EXEC returns the result of the transaction, regardless of the reply of MULTISTORE. The reply of MULTISTORE can be ignored, or if you want to avoid sending large values over the wire, you can if it returned +STORED and if it's stored, you don't need to queue any commands and can go directly to EXEC. EXEC would return the same result anyway, since the queued commands would just be ignored when the result is already stored.

Thinking about this now though, EXEC is where the logic would need to be handled. Therefore, I think EXECSTORE is more clear and intuitive. Or we can call it IDEMPOTENT-EXEC or something (because it doesn't only store the result, but returns it if already stored). If the users really want to check if it's already stored, they can just use EXISTS key.

As mentioned in the meeting, we need to store the transaction result as some RESP-encoded blob that we can copy from the reply buffer. We may store it as an opaque string type key, just so we don't need to introduce a new key type (to avoid an RDB bump and other nasty things implied by a new type). If users do GET on it, they get some undefined blob representation of it. The normal way to retrieve the value should be by retrying the transaction.

@hpatro

MULTI [OP key IS_NULL|IS_NOT_NULL|EQ [value|hash]]
<set of commands>
EXEC
For idempotency, it would just deal with IS_NULL unary operator.

How does this tell us to store the result of EXEC in key?

If clients want to conditionally populate keys, I believe check-and-set is a better feature for this. If we provide check-and-set, users don't need to abuse idempotent transactions for this.

valkey-io / valkey

[NEW] MULTI Idempotency Keys for safely retrying non-idempotent commands #1087