valkey-io / valkey

A flexible distributed key-value datastore that is optimized for caching and other realtime workloads.
https://valkey.io
Other
17.36k stars 656 forks source link

[NEW] Support Check and Set functionality #1215

Open hpatro opened 1 month ago

hpatro commented 1 month ago

The problem/use-case that the feature addresses

Allow Valkey user to perform conditional update to a key based on certain condition like matches old value/checksum. This should ideally be supported for string data type via SET command.

API changes in SET command:

SET <key> <value> IFEQ <comparison-value>

Behavior change:

Where the command is executed, IFF the comparison-value equals the current value in the key the rest of the code will be executed. If the values aren't the same, it returns a new error -NOTEQUAL. Alternative: Return a (nil) instead. We'll decide between these two cases in the PR. https://github.com/valkey-io/valkey/issues/1215#issuecomment-2458100191

Memcached CAS:

https://docs.memcached.org/protocols/basic/#cas

Past discussions:

https://groups.google.com/g/redis-db/c/a4zK2k1Lefo https://github.com/redis/redis/pull/8361 https://github.com/redis/redis/issues/12485

Alternatives considered

Use LUA scripting

hpatro commented 1 month ago

@zuiderkwast had already written up how the api might look like https://github.com/redis/redis/issues/12485#issuecomment-1719679074

Want to hear the reasoning @valkey-io/core-team if we would like to build it in Valkey or not. I personally feel we should build the API instead of pointing users to use LUA for basic use cases.

hwware commented 1 month ago

From discussion https://github.com/redis/redis/pull/8361, it looks like many people are waiting for this, I have no reason to against it.

But I have one concern: if in production environment, if client connects the server by connection pool, this command may not work well. Is there a way to apply to connection pooling scenario?

PingXie commented 4 weeks ago

The proposal is really about data versioning and it makes sense on a high level, but stopping at SET/GET and numeric operations feels somewhat arbitrary and could lead to a fragmented or inconsistent experience. I'd like to explore why we're limiting versioning to just these cases.

We should think about what it would mean to support other data types like hashes, sets, and streams. Even if we decide to start with just strings and numeric operations, it’s important to consciously consider how this might extend to other data types in the future to avoid potential gaps in functionality.

It would be helpful to make a deliberate decision about the scope now, even if that means narrowing it for the initial implementation.

Do you mind starting an RFC to capture a holistic proposal?

zuiderkwast commented 4 weeks ago

Check-and-Set? (We're not really swapping two values like the compare-and-swap CPU instruction.)

I see three ways to do this:

  1. Store a version tag or timestamp on each key.
  2. Exact string match, e.g. SET foo baz IF-EQ bar. Simplest, but only useful for small string values.
  3. Digest. It can work with any type (except may not always be implemented for module types?).

I don't think we want to store a version tag on each key. That would increase the memory usage and have other impacts, like RDB, AOF, etc. (On the other hand, such version tag or timestamp could be useful for other features like CRDTs with last-write-wins, active-active.)

We already have CAS-like functionality in the SET XX and NX flags, and HSETNX for hash fields. Those are very simple and don't require anything special from the internals. That's why I only suggested simplistic variants for SET.

Often, the CAS problem can be solved with a separate "cas" key which is used to guard updates on other keys. The "cas" key only needs to store some small user-defined tag. Therefore, I think SET with IF-EQ can already be very powerful.

stopping at SET/GET and numeric operations feels somewhat arbitrary and could lead to a fragmented or inconsistent experience

For arbitrary key types and commands, I think we could easily provide a DIGEST command that works with any key, and some CHECK command that returns an error if a digest doesn't match. This can be used inside a transaction to fail the transaction if the key has changed.

Assume we already got DIGEST foo => adca87a87. To update foo, we can check that it didn't change within a transaction:

MULTI
CHECK foo adca87a87
HSET foo field value
EXEC

If foo has changed, the transaction will fail.

hpatro commented 4 weeks ago

Check-and-Set? (We're not really swapping two values like the compare-and-swap CPU instruction.)

I was thinking why do people say it's "compare and swap" and not "compare and set". "Check and set" seems right. Updated the title. thanks.

hpatro commented 4 weeks ago

The proposal is really about data versioning and it makes sense on a high level, but stopping at SET/GET and numeric operations feels somewhat arbitrary and could lead to a fragmented or inconsistent experience. I'd like to explore why we're limiting versioning to just these cases.

We should think about what it would mean to support other data types like hashes, sets, and streams. Even if we decide to start with just strings and numeric operations, it’s important to consciously consider how this might extend to other data types in the future to avoid potential gaps in functionality.

It would be helpful to make a deliberate decision about the scope now, even if that means narrowing it for the initial implementation.

Do you mind starting an RFC to capture a holistic proposal?

My initial thought was we can compare the string/numeric value directly without maintaining any additional property. Hence, suggested for those two data types. I think if we are aligned we want to build this functionality in some form or other, we can go down the RFC route. If I'm reading the room right, seems like we would like to build it.

hpatro commented 4 weeks ago

I see three ways to do this:

  1. Store a version tag or timestamp on each key.
  2. Exact string match, e.g. SET foo baz IF-EQ bar. Simplest, but only useful for small string values.
  3. Digest. It can work with any type (except may not always be implemented for module types?).

BTW @zuiderkwast ETCD provides 1/2 and from my experience, It's quite powerful. They do maintain multiple version of each key (point 1). And the API is structured as IF-THEN-ELSE which is point 2 with a fallback and avoids a round trip to make a decision on failure.

hpatro commented 4 weeks ago

From discussion redis/redis#8361, it looks like many people are waiting for this, I have no reason to against it.

But I have one concern: if in production environment, if client connects the server by connection pool, this command may not work well. Is there a way to apply to connection pooling scenario?

@hwware If I understand correctly, the client will retry if the condition doesn't match.

PingXie commented 3 weeks ago

For arbitrary key types and commands, I think we could easily provide a DIGEST command that works with any key, and some CHECK command that returns an error if a digest doesn't match. This can be used inside a transaction to fail the transaction if the key has changed.

Yeah I think this approach achieves an effective balance between storage efficiency and flexibility, as it could extend universally to all key types and commands. Standardizing on the MULTI/EXEC idiom as you demonstrated above would further enhance consistency across operations. Given that CAS is not a frequent scenario, computing digests is an acceptable trade-off and it balances design simplicity with minimal performance impact, IMO.

soloestoy commented 3 weeks ago

Compare and set is a very common requirement, and I'm very happy to accept this proposal.

Regarding the implementation, my intuition is:

  1. In the vast majority of scenarios, the value to compare will not be large, and using SET foo baz IF-EQ bar can meet 99% of the needs.

  2. For the remaining 1%, using SET foo baz IF-SHA 37eaa6435955b815 can avoid situations with extremely large values (I don't really think anyone would perform a CAS operation on a very large value).

  3. Versioning is also a good idea, but this would require introducing a new data type. I don't recommend modifying the existing string directly. For example, module tairstring is a type with versioning.

hpatro commented 3 weeks ago

After the discussion on #1087, I think we would like to build CAS functionality along with supporting caching of response of MULTI/EXEC operation in a key. However, we need to be careful with the API design that users don't get confused/don't understand how we support CAS.

hpatro commented 3 weeks ago

Yeah I think this approach achieves an effective balance between storage efficiency and flexibility, as it could extend universally to all key types and commands. Standardizing on the MULTI/EXEC idiom as you demonstrated above would further enhance consistency across operations. Given that CAS is not a frequent scenario, computing digests is an acceptable trade-off and it balances design simplicity with minimal performance impact, IMO.

I disagree with one point, I think CAS is a pretty common scenario, we don't get enough request to build maybe because of LUA scripts.

hpatro commented 3 weeks ago
  1. Versioning is also a good idea, but this would require introducing a new data type. I don't recommend modifying the existing string directly. For example, module tairstring is a type with versioning.

Versioning also enables MVCC (maybe we can build multi threaded command execution :D). But, it brings in lot more complexity and challenges with it like tree like storage of different values for each version, compaction (trimming of older version value(s), new data persistence format, etc. So, we need to be thoughtful if we want to build it.

ranshid commented 3 weeks ago

I think supporting CAS is definitely needed. I agree with @soloestoy that in most cases (not sure 99%) the values are small and SET foo bar IF-EQ bar will probably provide the needs. we can always extend and add a DIGEST option in the future if we see the need.

For arbitrary key types and commands, I think we could easily provide a DIGEST command that works with any key, and some CHECK command that returns an error if a digest doesn't match. This can be used inside a transaction to fail the transaction if the key has changed.

I am not sure if that was the proposal, but I think that having this functionality ONLY as part of transactions would be very limiting. Transactions cannot be nested and run from scripts. I think having this as part of the SET/HSET is probably a very good idea.

I am wondering though how exactly will the API look. There are several cases we MIGHT want to "compare" against: For example given SET a b IF-EQ c

  1. we might want to set the value 'b' to the key 'a' if the value in key 'a' matches the value 'c'
  2. we might want to set the value 'b' to the key 'a' if the key 'a' matches the value STORED in key 'c'
  3. we might want to store the value of key 'b' to key 'a' if the value in key 'a' matches the value 'c'
  4. we might want to store the value of key 'b' to key 'a' if the key 'a' matches the value STORED in key 'c'

Not sure if that does not complicates things by too much. We did such things in the sort function and maybe that is why noone uses it :) But was wondering if we feel that is required?

zuiderkwast commented 3 weeks ago

I think that having this functionality ONLY as part of transactions would be very limiting. Transactions cannot be nested and run from scripts.

@ranshid I just listed some ideas. They are not mutually exclusive.

The DIGEST and CHECK idea is a way to extend this to all key types and not be tied to any specific command. I don't think it's very limiting that transactions can't be used from scripts. Scripts can just do the comparison themselves, using server.call("GET", key) and compare the result before calling SET. No nested transactions... why is that limiting?

I am wondering though how exactly will the API look. There are several cases we MIGHT want to "compare" against: For example given SET a b IF-EQ c

  1. we might want to set the value 'b' to the key 'a' if the value in key 'a' matches the value 'c'

Number 1 is my suggestion. The old value of the same key is all that matters in 99.93% of cases(???). We don't need to change the key specs for the SET command. That would be more complex.

We should compare this conditional execution with other arguments that do conditional execution. They are not too complex and not too different:

madolson commented 3 weeks ago

we might want to set the value 'b' to the key 'a' if the value in key 'a' matches the value 'c'

This is the one that is most likely going to be used for optimistic locking, which is the core use case for CAS (AFAIK). The client will fetch the value, keep it in its memory, and then conditionally apply it back to the server.

From the thread, doing SET <key> <value> IFEQ <previous value> seems pretty uncontroversial. (I changed IF-EQ to IFEQ since there are very few examples of tokens including a dash in the command spec today). I also think that seems "sane" to add.

I see @PingXie wants to see more consistency, but I think that slapping on digest for all datatypes feels like a mistake. If you want to do a conditional write on a stream, which can be megabytes in size, you don't want to constantly do a digest across the whole object. We also generally don't want to be doing hashing on the server side, since it's a computationally expensive operation, and I think a goal of Valkey should be consistent high performance and single slow commands cause latency spikes. I also don't see how that is fundamentally very different from using WATCH. You execute watch on a key, and if it's modified it invalidates the MULTI state. (It's worth mentioning that WATCH has issues in cluster mode that have not really been addressed)

I think that HSET <key> <field> <value> IFEQ <previous value> seems like the better solution instead of a digest, it gives users the maximum flexibility for deciding how to hash or version the data. The HSET command is constrained though, and we can't easily add new fields to the command, so we might need to do something else. It also allows partial matching instead of relying on the entire digest. A native versioning inside the key would be even better, but that seems more invasive than is available today.

I think we should just start with SET + IFEQ. I want us to be mindful that every time we launch a feature we are committing to maintaining it. A third option might be something like a module that implements the functionality, and we can let people opt-in to using that if they want.

hwware commented 3 weeks ago

Before we just begin to work on this, I still have 3 concern:

  1. As @soloestoy mentioned, according to his experience, 99% value is small, thus SET foo baz IF-EQ is good enought. But I just want to confirm if it means calculating the difference between new value and old value will be on the server side.

  2. If we choose format SET foo baz IF-EQ , then client side should keep more space to save the previous value, right?

  3. If the calculation happens in server side, we should limit the size for the value of the key?

ranshid commented 3 weeks ago

I think we should just start with SET + IFEQ. I want us to be mindful that every time we launch a feature we are committing to maintaining it. A third option might be something like a module that implements the functionality, and we can let people opt-in to using that if they way.

I support this as well, but we should also introduce the HSET + IFEQ IMO, many cases AFAIK used the same logic on hashes as the one used on general STRINGS.

BTW - what about MSET? eg MSET k1 v1 k2 v2 ... kn vn IFEQ vv1 vv2 ... vvn will we decide to add support later?

zuiderkwast commented 3 weeks ago

A third option might be something like a module that implements the functionality, and we can let people opt-in to using that if they way.

There are already modules providing this functionality: TairString and TairHash.

BTW - what about MSET? eg MSET k1 v1 k2 v2 ... kn vn IFEQ vv1 vv2 ... vvn

It looks like "IFEQ" is key number n + 1. It is not possible to extend this command in this way. MSET doesn't have NX and XX, so I think it doesn't need IFEQ either.

HSET + IFEQ

HSET has the same problem as HSET: It's not possible to extend it because it is already variadic.

We have HSETNX and we could add HSETIFEQ in the similar style, or a new HSET-extended which can allow args like NX, XX and IFEQ. @enjoy-binbin already suggested this in a Redis PR https://github.com/redis/redis/pull/9058 with the syntax HSETEX key [NX|XX] [CH] FIELDS field value [field value ...]. (Later, they used the command name HSETEX for hash field expiration. If we do this, we should probably use another name too, like HSET2 or HSETIF or ...)

madolson commented 3 weeks ago

There are already modules providing this functionality: TairString and TairHash.

I suppose we could encourage more people to use these then.

MSET k1 v1 k2 v2 ... kn vn IFEQ vv1 vv2 ... vvn

You can always just do a multi-exec of a bunch of normal SETs.

(Later, they used the command name HSETEX for hash field expiration. If we do this, we should probably use another name too, like HSET2 or HSETIF or ...)

HSETIFEQ would be my vote of these options. HSETIFEQ <key> <field> <comparison value> <field> <value> [<field> <value> ...] or similar.

hpatro commented 2 weeks ago

Do we need to support all the data types in the first place? I feel we can start off with regular strings/numbers and go from there.

madolson commented 2 weeks ago

Trying to drive consensus then @valkey-io/core-team please vote 👍 or 👎 on: Extending set to have the following additional syntax.

SET <key> <value> IFEQ <comparison-value>

Where the command is executed, IFF the comparison-value equals the current value in the key the rest of the code will be executed. If the values aren't the same, it returns a new error -NOTEQUAL. Alternative: Return a (nil) instead. We'll decide between these two cases in the PR.

zuiderkwast commented 2 weeks ago

SET with NX or XX returns (nil) when it fails. I think we should follow this pattern for IFEQ too.

QuChen88 commented 1 week ago

Regarding the version numbering idea, I'd like to draw some references to memcached CAS command here - it does require each item to store a 8-byte version number. However, this feature is optional and can be disabled with a static configuration, so users who don't have use-cases such as optimistic locking / consistency can disable this feature and not pay the memory overhead for the item version number. It looks like such

cas <key> <flags> <ttl> <size> <cas_key> where cas_key is the version number that the user stores/remembers when it fetched the value the last time. If the version check succeeds, then it means nobody else updated the value before you and it succeeds, otherwise it would fail.

IMO this is more efficient compared to comparing the value on every update operation (where the value can be small/large, or one of the items in a larger collection, or custom module data types that is harder to specify the value of in a command, etc). Another benefit of this approach is that it's much easier in the future to support additional data types like hash/list/set/sorted set/HLL/Modules etc. i.e.

set <key> <value> [cas <cas_key>] hset <key> <field> <value> [cas <cas_key>] lpush <list> <value> [cas <cas_key>] zadd <key> <score> <member> [cas <cas_key>]

hpatro commented 1 week ago

However, this feature is optional and can be disabled with a static configuration, so users who don't have use-cases such as optimistic locking / consistency can disable this feature and not pay the memory overhead for the item version number.

Does that mean they need to be aware of usage of this feature while building Valkey binary?

madolson commented 1 week ago

Does that mean they need to be aware of usage of this feature while building Valkey binary?

Yeah, I don't like this. I feel like one of the things that made memcached hard to use was all the static configuration. It didn't support dynamic configuration initially.