valkey-io / valkey

A flexible distributed key-value datastore that supports both caching and beyond caching workloads.
https://valkey.io

Support for key locking #3

Open madolson opened 6 months ago

madolson commented 6 months ago

There are several features on our roadmap that assume we will develop some form of locking on an individual key. I think there is only one commonly accepted approach to achieving this, which is to mark a key as "in use" and block commands that attempt to mutate that key. Commands may take either a "shared lock", which multiple different holders can take on the same key at once, or an "exclusive lock", of which only one can be granted at a given time.
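For concreteness, here is a minimal sketch of what per-key lock state could look like, assuming a simple counter-based scheme. The struct and function names are illustrative only, not part of the Valkey codebase, and a real implementation would also need to record which clients are blocked waiting on the key:

```c
/* Illustrative per-key lock state (not actual Valkey code). A key may be held
 * by many shared (read) holders or by exactly one exclusive (write) holder,
 * never both at once. Commands that cannot acquire the lock would be blocked
 * rather than failed. */
typedef struct keyLock {
    int shared_count;   /* number of shared holders */
    int exclusive;      /* 1 if an exclusive holder owns the key */
} keyLock;

/* Try to take a shared lock: allowed as long as no exclusive holder exists. */
static int keyLockTryShared(keyLock *kl) {
    if (kl->exclusive) return 0;
    kl->shared_count++;
    return 1;
}

/* Try to take an exclusive lock: allowed only when nobody holds the key. */
static int keyLockTryExclusive(keyLock *kl) {
    if (kl->exclusive || kl->shared_count > 0) return 0;
    kl->exclusive = 1;
    return 1;
}

static void keyLockReleaseShared(keyLock *kl)    { kl->shared_count--; }
static void keyLockReleaseExclusive(keyLock *kl) { kl->exclusive = 0; }
```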

This issue is just a high level summary of all the known issues related to key locking. Most of this is derived from https://github.com/redis/redis/issues/7372#issuecomment-640956204. Re-posting verbatim here:

* Even if the command is not explicitly writing, there are several cases where data is modified when reading. These include PFCOUNT and also any data structure whose format can be internally changed - like the hash table, skiplist, or compressed list encoding, which can be in the process of rehashing or other data transformation. These modifications must be prevented in both main and background threads.
* There are significant complexities with MULTI/EXEC. If a MULTI/EXEC hits a blocked key, it must either:
  * synchronously pause the Redis main thread (waiting for the background thread to release the lock) - this also comes with complexity regarding completing/joining the background command while the main thread is paused;
  * pre-examine the MULTI/EXEC to determine if any locked keys will be used (see the sketch below) - this also comes with the challenge of tracking keys across databases if SWAPDB and/or SELECT is used in the MULTI/EXEC block; or
  * block all MULTI/EXEC commands if any async command is running.
* There are significant challenges with LUA. LUA is not forced to declare the keys it uses. Furthermore, even if a LUA script does declare which keys are touched, the declaration does not indicate which DB each key is associated with. There is no mechanism to determine keys by inspection (as is possible with MULTI/EXEC). The safest option for LUA might be to block any LUA script if any async command is running. Throwing an error is not an option, as it breaks atomicity guarantees.
* There is a conflict with FLUSHDB/FLUSHALL. These keyless commands would need to be blocked until no background thread is running.
* There is the possibility of interaction with EXPIRE and EVICT processing. A deep dive is necessary here to understand the issues with each.
* Client disconnect (on the main thread) needs to interact with async processing in a background thread.
* BRPOPLPUSH is a special can of worms and needs to be carefully considered. Consider BRPOPLPUSH FROM TO. Later, while a background thread has "TO" locked, another client performs an LPUSH FROM. The LPUSH (to a different key) would need to be blocked, OR the BRPOPLPUSH would need to be further blocked.
* Should MONITOR show the executed command at the beginning of processing, or at the end?
* I know you're currently looking at background reads, but if the slow command needs to perform a write, the logical lock must also block readers on the main thread, not just writers.
* If the background command is a write command, unlocking of the key needs to be coordinated with replication.

In addition, an important part of this proposal involves blocking clients until a key is unlocked. The existing blocking mechanism (for blocking commands like BRPOPLPUSH) is a good candidate for re-work at this juncture.
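As an illustration of the pre-examination option for MULTI/EXEC above, here is a hedged sketch. Every type and helper in it (queuedCommand, commandTouchesLockedKey, execIsRunnable) is a hypothetical stand-in, not the actual transaction internals:

```c
/* Hypothetical sketch of the "pre-examine" option for MULTI/EXEC: walk every
 * queued command before EXEC starts and refuse to run (i.e. block the client)
 * if any of them would touch a currently locked key. All names here are
 * illustrative stand-ins, not Valkey internals. */
typedef struct queuedCommand {
    int dbid;        /* database the command will run against; SELECT/SWAPDB
                      * inside the transaction would change this mid-stream,
                      * which is exactly the complication noted above */
    int argc;
    char **argv;
} queuedCommand;

/* Assumed helper (not implemented here): returns 1 if any key named by this
 * command is currently locked in its database. */
int commandTouchesLockedKey(const queuedCommand *qc);

/* Returns 1 if the whole transaction can execute now, 0 if the client should
 * instead be blocked until the conflicting key is unlocked. */
int execIsRunnable(const queuedCommand *cmds, int count) {
    for (int i = 0; i < count; i++) {
        if (commandTouchesLockedKey(&cmds[i])) return 0;
    }
    return 1;
}
```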

The existing blocking mechanism works as follows:

* The blocking command is processed normally through processCommand().
* In the command's handler, the command is removed from the client's normal cmd/argv/argc fields and copied over to custom fields for the command. (The normal fields are cleared.)
* When the command is unblocked, a completely separate code path (independent from processCommand()) is used to finish execution of the blocking command.

Not only does a separate code path require independent maintenance, but it can lead to subtle differences in the way commands function when they block. (Hint: IIRC, MONITORing a BRPOPLPUSH will look subtly different depending on whether it blocks or not.) Instead, changing the blocking mechanism will better support this new use-case (and potentially others):

* When a client is blocked (for any reason), leave cmd/argv/argc attached on the client as normal.
* When the client is unblocked, pass it back through processCommand() to execute as normal. This might result in the client being re-blocked for another reason.
* Some care must be taken to ensure that MONITOR works as expected and only logs the first attempt.
* This approach eliminates a bunch of blocking-command fields on the client struct.
* It eliminates a separate code path for finishing these commands.
* It provides a standard approach to blocking clients.
* Note that some adjustment of the timeout might be necessary when re-blocking.

Example: A BRPOPLPUSH is blocked waiting on something to pop. An LPUSH is performed which unblocks the BRPOPLPUSH. We attempt to execute the BRPOPLPUSH normally by sending it through processCommand(). However, it is discovered that the destination key is locked. The BRPOPLPUSH client is re-blocked. Later, when the target key is unlocked, we again send the BRPOPLPUSH through processCommand() and this time it succeeds.

Example: "MSET k1 v1 k2 v2" is performed. It is recognized that k1 is locked. The client is blocked. When k1 is unlocked, the MSET command is passed back through processCommand(). It is determined that k2 is now locked. The client is blocked a second time. When k2 is unlocked, the command is sent through processCommand() a third time, and may execute normally.
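A rough sketch of the unblock path under this proposal, using simplified stand-ins for the client struct, flags, and processCommand(). None of these names are the actual Valkey internals; the point is only that unblocking re-enters the normal dispatch path and may legitimately block again:

```c
/* Simplified stand-ins: the real client struct and processCommand() signature
 * differ. The key idea is that cmd/argv/argc stay attached while the client
 * is blocked, and unblocking simply re-runs the normal command path. */
typedef struct client {
    int flags;
    int argc;
    char **argv;        /* left attached the whole time the client is blocked */
} client;

#define CLIENT_BLOCKED_ON_KEY  (1 << 0)
#define CLIENT_FIRST_ATTEMPT   (1 << 1)  /* so MONITOR only logs the 1st attempt */

int processCommand(client *c);           /* normal dispatch path (assumed) */

/* Called when a key this client was waiting on becomes unlocked. */
void unblockClientOnKeyUnlock(client *c) {
    c->flags &= ~CLIENT_BLOCKED_ON_KEY;
    c->flags &= ~CLIENT_FIRST_ATTEMPT;   /* re-attempts are not fed to MONITOR */
    /* This may block the client again, e.g. if k2 in the MSET example above
     * is still locked; the next unlock of that key brings us back here. */
    processCommand(c);
}
```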

Old ref: https://github.com/redis/redis/issues/11396

zuiderkwast commented 5 months ago

Will key locking allow threaded command execution? If yes, then I think we'll need this for sure, though I think there's a lot more to consider. Perhaps we should have a global lock that needs to be taken for changing anything else, like config, acl, etc.

madolson commented 5 months ago

We may not need it for multi-threaded execution. You can use less granular locks, such as one per hash table bucket, which also enable multi-threaded execution.

daniel-house commented 5 months ago

Perhaps locking on slots? Less granular than hash-table buckets, but still supporting more than enough threads in the dictionary at the same time. It looks like a good fit for the way cluster mode denies multi-slot commands. This might also minimize the problems with LUA and MULTI/EXEC.

zuiderkwast commented 5 months ago

For cluster mode, we could use one mutex per cluster slot, but in a standalone instance the keys are not stored per slot. We could consider storing them per slot anyway, though computing the slot with CRC16 costs some extra CPU cycles. Alternatively, we could take some bits off the hash function already used for the hash table and keep an array of mutexes. If we do a lock per key or per hash-table bucket, a mutex takes too much memory, but we could use a spinlock.
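A minimal sketch of the "array of some mutexes" idea (lock striping), assuming we can reuse bits of the hash already computed for the main hash table. keyHash, the stripe count, and the other names here are placeholders; the per-cluster-slot variant would look the same with the CRC16 slot as the index:

```c
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

/* Lock striping: a fixed array of mutexes indexed by a few bits of the key's
 * hash, so memory cost stays constant regardless of the number of keys.
 * Names and sizes are illustrative, not Valkey internals. */
#define LOCK_STRIPES 1024   /* must be a power of two */

static pthread_mutex_t stripe_locks[LOCK_STRIPES];

/* Stand-in for the hash function already used by the main hash table. */
uint64_t keyHash(const void *key, size_t len);

static void initStripeLocks(void) {
    for (size_t i = 0; i < LOCK_STRIPES; i++)
        pthread_mutex_init(&stripe_locks[i], NULL);
}

static pthread_mutex_t *lockForKey(const void *key, size_t len) {
    return &stripe_locks[keyHash(key, len) & (LOCK_STRIPES - 1)];
}
```

Two keys that hash to the same stripe contend on the same mutex; more stripes mean less false contention at the cost of memory, which is the trade-off mentioned above.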

I want to look into a more modern hash table implementation: one that uses open addressing, can use vector instructions (like AVX2), and is more cache friendly. If we do that, perhaps there is some logical place to put a lock.

zuiderkwast commented 5 months ago

We may want to take a look at Dragonfly too. From its README:

To provide atomicity guarantees for multi-key operations, we use the advancements from recent academic research. We chose the paper "VLL: a lock manager redesign for main memory database systems" to develop the transactional framework for Dragonfly. The choice of shared-nothing architecture and VLL allowed us to compose atomic multi-key operations without using mutexes or spinlocks. This was a major milestone for our PoC and its performance stood out from other commercial and open-source solutions.