Open hpatro opened 1 month ago
@zuiderkwast had already written up how the api might look like https://github.com/redis/redis/issues/12485#issuecomment-1719679074
Want to hear the reasoning @valkey-io/core-team if we would like to build it in Valkey or not. I personally feel we should build the API instead of pointing users to use LUA for basic use cases.
From discussion https://github.com/redis/redis/pull/8361, it looks like many people are waiting for this, I have no reason to against it.
But I have one concern: if in production environment, if client connects the server by connection pool, this command may not work well. Is there a way to apply to connection pooling scenario?
The proposal is really about data versioning and it makes sense on a high level, but stopping at SET/GET and numeric operations feels somewhat arbitrary and could lead to a fragmented or inconsistent experience. I'd like to explore why we're limiting versioning to just these cases.
We should think about what it would mean to support other data types like hashes, sets, and streams. Even if we decide to start with just strings and numeric operations, it’s important to consciously consider how this might extend to other data types in the future to avoid potential gaps in functionality.
It would be helpful to make a deliberate decision about the scope now, even if that means narrowing it for the initial implementation.
Do you mind starting an RFC to capture a holistic proposal?
Check-and-Set? (We're not really swapping two values like the compare-and-swap CPU instruction.)
I see three ways to do this:
SET foo baz IF-EQ bar
. Simplest, but only useful for small string values.I don't think we want to store a version tag on each key. That would increase the memory usage and have other impacts, like RDB, AOF, etc. (On the other hand, such version tag or timestamp could be useful for other features like CRDTs with last-write-wins, active-active.)
We already have CAS-like functionality in the SET XX and NX flags, and HSETNX for hash fields. Those are very simple and don't require anything special from the internals. That's why I only suggested simplistic variants for SET.
Often, the CAS problem can be solved with a separate "cas" key which is used to guard updates on other keys. The "cas" key only needs to store some small user-defined tag. Therefore, I think SET with IF-EQ can already be very powerful.
stopping at SET/GET and numeric operations feels somewhat arbitrary and could lead to a fragmented or inconsistent experience
For arbitrary key types and commands, I think we could easily provide a DIGEST
command that works with any key, and some CHECK
command that returns an error if a digest doesn't match. This can be used inside a transaction to fail the transaction if the key has changed.
Assume we already got DIGEST foo
=> adca87a87
. To update foo, we can check that it didn't change within a transaction:
MULTI
CHECK foo adca87a87
HSET foo field value
EXEC
If foo has changed, the transaction will fail.
Check-and-Set? (We're not really swapping two values like the compare-and-swap CPU instruction.)
I was thinking why do people say it's "compare and swap" and not "compare and set". "Check and set" seems right. Updated the title. thanks.
The proposal is really about data versioning and it makes sense on a high level, but stopping at SET/GET and numeric operations feels somewhat arbitrary and could lead to a fragmented or inconsistent experience. I'd like to explore why we're limiting versioning to just these cases.
We should think about what it would mean to support other data types like hashes, sets, and streams. Even if we decide to start with just strings and numeric operations, it’s important to consciously consider how this might extend to other data types in the future to avoid potential gaps in functionality.
It would be helpful to make a deliberate decision about the scope now, even if that means narrowing it for the initial implementation.
Do you mind starting an RFC to capture a holistic proposal?
My initial thought was we can compare the string/numeric value directly without maintaining any additional property. Hence, suggested for those two data types. I think if we are aligned we want to build this functionality in some form or other, we can go down the RFC route. If I'm reading the room right, seems like we would like to build it.
I see three ways to do this:
- Store a version tag or timestamp on each key.
- Exact string match, e.g.
SET foo baz IF-EQ bar
. Simplest, but only useful for small string values.- Digest. It can work with any type (except may not always be implemented for module types?).
BTW @zuiderkwast ETCD provides 1/2 and from my experience, It's quite powerful. They do maintain multiple version of each key (point 1). And the API is structured as IF-THEN-ELSE
which is point 2 with a fallback and avoids a round trip to make a decision on failure.
From discussion redis/redis#8361, it looks like many people are waiting for this, I have no reason to against it.
But I have one concern: if in production environment, if client connects the server by connection pool, this command may not work well. Is there a way to apply to connection pooling scenario?
@hwware If I understand correctly, the client will retry if the condition doesn't match.
For arbitrary key types and commands, I think we could easily provide a
DIGEST
command that works with any key, and someCHECK
command that returns an error if a digest doesn't match. This can be used inside a transaction to fail the transaction if the key has changed.
Yeah I think this approach achieves an effective balance between storage efficiency and flexibility, as it could extend universally to all key types and commands. Standardizing on the MULTI/EXEC
idiom as you demonstrated above would further enhance consistency across operations. Given that CAS is not a frequent scenario, computing digests is an acceptable trade-off and it balances design simplicity with minimal performance impact, IMO.
Compare and set is a very common requirement, and I'm very happy to accept this proposal.
Regarding the implementation, my intuition is:
In the vast majority of scenarios, the value to compare will not be large, and using SET foo baz IF-EQ bar
can meet 99% of the needs.
For the remaining 1%, using SET foo baz IF-SHA 37eaa6435955b815
can avoid situations with extremely large values (I don't really think anyone would perform a CAS operation on a very large value).
Versioning is also a good idea, but this would require introducing a new data type. I don't recommend modifying the existing string directly. For example, module tairstring is a type with versioning.
After the discussion on #1087, I think we would like to build CAS functionality along with supporting caching of response of MULTI/EXEC
operation in a key. However, we need to be careful with the API design that users don't get confused/don't understand how we support CAS.
Yeah I think this approach achieves an effective balance between storage efficiency and flexibility, as it could extend universally to all key types and commands. Standardizing on the
MULTI/EXEC
idiom as you demonstrated above would further enhance consistency across operations. Given that CAS is not a frequent scenario, computing digests is an acceptable trade-off and it balances design simplicity with minimal performance impact, IMO.
I disagree with one point, I think CAS is a pretty common scenario, we don't get enough request to build maybe because of LUA
scripts.
- Versioning is also a good idea, but this would require introducing a new data type. I don't recommend modifying the existing string directly. For example, module tairstring is a type with versioning.
Versioning also enables MVCC (maybe we can build multi threaded command execution :D). But, it brings in lot more complexity and challenges with it like tree like storage of different values for each version, compaction (trimming of older version value(s), new data persistence format, etc. So, we need to be thoughtful if we want to build it.
I think supporting CAS is definitely needed. I agree with @soloestoy that in most cases (not sure 99%) the values are small and SET foo bar IF-EQ bar
will probably provide the needs. we can always extend and add a DIGEST option in the future if we see the need.
For arbitrary key types and commands, I think we could easily provide a DIGEST command that works with any key, and some CHECK command that returns an error if a digest doesn't match. This can be used inside a transaction to fail the transaction if the key has changed.
I am not sure if that was the proposal, but I think that having this functionality ONLY as part of transactions would be very limiting. Transactions cannot be nested and run from scripts. I think having this as part of the SET/HSET is probably a very good idea.
I am wondering though how exactly will the API look. There are several cases we MIGHT want to "compare" against:
For example given
SET a b IF-EQ c
Not sure if that does not complicates things by too much. We did such things in the sort function and maybe that is why noone uses it :) But was wondering if we feel that is required?
I think that having this functionality ONLY as part of transactions would be very limiting. Transactions cannot be nested and run from scripts.
@ranshid I just listed some ideas. They are not mutually exclusive.
The DIGEST and CHECK idea is a way to extend this to all key types and not be tied to any specific command. I don't think it's very limiting that transactions can't be used from scripts. Scripts can just do the comparison themselves, using server.call("GET", key)
and compare the result before calling SET. No nested transactions... why is that limiting?
I am wondering though how exactly will the API look. There are several cases we MIGHT want to "compare" against: For example given
SET a b IF-EQ c
- we might want to set the value 'b' to the key 'a' if the value in key 'a' matches the value 'c'
Number 1 is my suggestion. The old value of the same key is all that matters in 99.93% of cases(???). We don't need to change the key specs for the SET command. That would be more complex.
We should compare this conditional execution with other arguments that do conditional execution. They are not too complex and not too different:
NX
and XX
arguments of SET
SETNX
and HSETNX
EXPIRE key seconds [NX | XX | GT | LT]
we might want to set the value 'b' to the key 'a' if the value in key 'a' matches the value 'c'
This is the one that is most likely going to be used for optimistic locking, which is the core use case for CAS (AFAIK). The client will fetch the value, keep it in its memory, and then conditionally apply it back to the server.
From the thread, doing SET <key> <value> IFEQ <previous value>
seems pretty uncontroversial. (I changed IF-EQ to IFEQ since there are very few examples of tokens including a dash in the command spec today). I also think that seems "sane" to add.
I see @PingXie wants to see more consistency, but I think that slapping on digest for all datatypes feels like a mistake. If you want to do a conditional write on a stream, which can be megabytes in size, you don't want to constantly do a digest across the whole object. We also generally don't want to be doing hashing on the server side, since it's a computationally expensive operation, and I think a goal of Valkey should be consistent high performance and single slow commands cause latency spikes. I also don't see how that is fundamentally very different from using WATCH. You execute watch on a key, and if it's modified it invalidates the MULTI state. (It's worth mentioning that WATCH
has issues in cluster mode that have not really been addressed)
I think that HSET <key> <field> <value> IFEQ <previous value>
seems like the better solution instead of a digest, it gives users the maximum flexibility for deciding how to hash or version the data. The HSET command is constrained though, and we can't easily add new fields to the command, so we might need to do something else. It also allows partial matching instead of relying on the entire digest. A native versioning inside the key would be even better, but that seems more invasive than is available today.
I think we should just start with SET + IFEQ. I want us to be mindful that every time we launch a feature we are committing to maintaining it. A third option might be something like a module that implements the functionality, and we can let people opt-in to using that if they want.
Before we just begin to work on this, I still have 3 concern:
As @soloestoy mentioned, according to his experience, 99% value is small, thus SET foo baz IF-EQ
If we choose format SET foo baz IF-EQ
If the calculation happens in server side, we should limit the size for the value of the key?
I think we should just start with SET + IFEQ. I want us to be mindful that every time we launch a feature we are committing to maintaining it. A third option might be something like a module that implements the functionality, and we can let people opt-in to using that if they way.
I support this as well, but we should also introduce the HSET + IFEQ IMO, many cases AFAIK used the same logic on hashes as the one used on general STRINGS.
BTW - what about MSET?
eg MSET k1 v1 k2 v2 ... kn vn IFEQ vv1 vv2 ... vvn
will we decide to add support later?
A third option might be something like a module that implements the functionality, and we can let people opt-in to using that if they way.
There are already modules providing this functionality: TairString and TairHash.
BTW - what about MSET? eg
MSET k1 v1 k2 v2 ... kn vn IFEQ vv1 vv2 ... vvn
It looks like "IFEQ" is key number n + 1. It is not possible to extend this command in this way. MSET doesn't have NX and XX, so I think it doesn't need IFEQ either.
HSET + IFEQ
HSET has the same problem as HSET: It's not possible to extend it because it is already variadic.
We have HSETNX and we could add HSETIFEQ in the similar style, or a new HSET-extended which can allow args like NX, XX and IFEQ. @enjoy-binbin already suggested this in a Redis PR https://github.com/redis/redis/pull/9058 with the syntax HSETEX key [NX|XX] [CH] FIELDS field value [field value ...]
. (Later, they used the command name HSETEX for hash field expiration. If we do this, we should probably use another name too, like HSET2 or HSETIF or ...)
There are already modules providing this functionality: TairString and TairHash.
I suppose we could encourage more people to use these then.
MSET k1 v1 k2 v2 ... kn vn IFEQ vv1 vv2 ... vvn
You can always just do a multi-exec of a bunch of normal SETs.
(Later, they used the command name HSETEX for hash field expiration. If we do this, we should probably use another name too, like HSET2 or HSETIF or ...)
HSETIFEQ
would be my vote of these options. HSETIFEQ <key> <field> <comparison value> <field> <value> [<field> <value> ...]
or similar.
Do we need to support all the data types in the first place? I feel we can start off with regular strings/numbers and go from there.
Trying to drive consensus then @valkey-io/core-team please vote 👍 or 👎 on: Extending set to have the following additional syntax.
SET <key> <value> IFEQ <comparison-value>
Where the command is executed, IFF the comparison-value equals the current value in the key the rest of the code will be executed. If the values aren't the same, it returns a new error -NOTEQUAL
. Alternative: Return a (nil)
instead. We'll decide between these two cases in the PR.
SET with NX or XX returns (nil)
when it fails. I think we should follow this pattern for IFEQ too.
Regarding the version numbering idea, I'd like to draw some references to memcached CAS
command here - it does require each item to store a 8-byte version number. However, this feature is optional and can be disabled with a static configuration, so users who don't have use-cases such as optimistic locking / consistency can disable this feature and not pay the memory overhead for the item version number. It looks like such
cas <key> <flags> <ttl> <size> <cas_key>
where cas_key
is the version number that the user stores/remembers when it fetched the value the last time. If the version check succeeds, then it means nobody else updated the value before you and it succeeds, otherwise it would fail.
IMO this is more efficient compared to comparing the value on every update operation (where the value can be small/large, or one of the items in a larger collection, or custom module data types that is harder to specify the value of in a command, etc). Another benefit of this approach is that it's much easier in the future to support additional data types like hash/list/set/sorted set/HLL/Modules etc. i.e.
set <key> <value> [cas <cas_key>]
hset <key> <field> <value> [cas <cas_key>]
lpush <list> <value> [cas <cas_key>]
zadd <key> <score> <member> [cas <cas_key>]
However, this feature is optional and can be disabled with a static configuration, so users who don't have use-cases such as optimistic locking / consistency can disable this feature and not pay the memory overhead for the item version number.
Does that mean they need to be aware of usage of this feature while building Valkey binary?
Does that mean they need to be aware of usage of this feature while building Valkey binary?
Yeah, I don't like this. I feel like one of the things that made memcached hard to use was all the static configuration. It didn't support dynamic configuration initially.
The problem/use-case that the feature addresses
Allow Valkey user to perform conditional update to a key based on certain condition like matches old value/checksum. This should ideally be supported for string data type via SET command.
API changes in SET command:
Behavior change:
Where the command is executed, IFF the comparison-value equals the current value in the key the rest of the code will be executed. If the values aren't the same, it returns a new error -NOTEQUAL. Alternative: Return a (nil) instead. We'll decide between these two cases in the PR. https://github.com/valkey-io/valkey/issues/1215#issuecomment-2458100191
Memcached CAS:
https://docs.memcached.org/protocols/basic/#cas
Past discussions:
https://groups.google.com/g/redis-db/c/a4zK2k1Lefo https://github.com/redis/redis/pull/8361 https://github.com/redis/redis/issues/12485
Alternatives considered
Use LUA scripting