valkey-io / valkey

A flexible distributed key-value datastore that is optimized for caching and other realtime workloads.
https://valkey.io

Augmenting Valkey with multiplexing interface #1300

Open ohadshacham opened 1 day ago

ohadshacham commented 1 day ago

In this proposal we discuss exposing a new protocol that allows serving many Valkey clients over a single TCP connection. Using such a protocol achieves performance similar to pipeline mode while maintaining, for each client, the same functionality and semantics as if the client were served over its own dedicated TCP connection.

Introduction

A TCP connection for Valkey introduces a client object that maintains a logical state of the connection. This state is used to provide access control guarantees as well as the required semantics for Valkey commands. For example, a client object holds for a connection its ACL user, its watched keys, its blocking state, the pub-sub channels it subscribed to, and more. This state is bound to the TCP connection and is freed upon disconnection.

Since each client uses a dedicated connection, commands for each client are sent to Valkey separately and each response is returned by Valkey to a different network connection. This causes Valkey to spend a large amount of time (62% when using 500 clients) in system calls, as well as to consume at least one network packet per command and per response.

Pipelining can be applied to reduce the number of packets, amortize the system call overhead (11% when using 10 clients serving pipelines of 50 commands), and improve locality (a 44% reduction in L1 cache misses vs. using 500 clients). However, in many cases a pipeline cannot be utilized, either due to command dependencies or because the client generates only a few commands per second.

For this reason, client implementations like StackExchange.Redis collocate many clients on a single TCP connection while using pipelining to enhance performance. However, from Valkey's perspective only a single logical client is bound to a TCP connection, so all collocated clients are handled by Valkey as if all commands arrived from a single client. Naturally, with such a configuration, blocking commands, multi/exec, ACLs, and additional commands cannot preserve their required semantics. Therefore, StackExchange.Redis does not support blocking commands and utilizes Lua or constraints to abstract multi/exec. Buffer limits, like ACLs, also cannot be managed at the client level. Furthermore, since Valkey treats all collocated clients as a single client, no fairness guarantees are provided for the clients' commands. Consequently, a large command or response from one client may impact the latency of other commands from collocated clients.

Our suggestion - multiplexing protocol

In this proposal, we suggest adding a protocol to Valkey that supports connection multiplexing. Multiplexing is achieved by collocating many clients on a single TCP connection through the addition of extra metadata. The collocation of commands (and responses) for multiple clients simulates pipeline behavior across a large number of clients, resulting in performance similar to that of a pipeline.

The multiplexing protocol supports all Valkey commands with their original semantics. This means that multi/exec and watches can be applied concurrently to different clients, each client may have different ACLs, and even a blocked client does not block the entire connection. Moreover, buffer limits are enforced per client, and a client can be closed without disconnecting the connection. When a multiplexed connection is disconnected, all the clients allocated for this connection are closed and the user needs to request new clients to be allocated.

We suggest defining the multiplexing protocol in such a way that each command or response is preceded by a header indicating the client to which the command (or response) is targeted. Additionally, control commands, such as ‘create client’ and ‘client close’, are also encoded in the protocol header.
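To make the header idea concrete, here is a minimal Python sketch of what such framing could look like. The proposal does not specify a wire format, so everything here is an assumption for illustration only: the 9-byte header layout (client id, frame kind, payload length), the `KIND_*` constants, and the function names `encode_frame` and `decode_frames` are all hypothetical.

```python
import struct

# Hypothetical header layout, NOT taken from the proposal:
#   client_id (uint32) | kind (uint8) | payload_len (uint32), big-endian.
HEADER = struct.Struct(">IBI")

# Hypothetical frame kinds: a regular command/response plus control frames.
KIND_COMMAND = 0
KIND_CREATE_CLIENT = 1
KIND_CLOSE_CLIENT = 2


def encode_frame(client_id, kind, payload=b""):
    """Prefix a payload (e.g. a RESP-encoded command) with the header."""
    return HEADER.pack(client_id, kind, len(payload)) + payload


def decode_frames(buf):
    """Split a byte buffer into complete frames.

    Returns (frames, rest), where frames is a list of
    (client_id, kind, payload) tuples and rest holds any trailing
    bytes that do not yet form a complete frame.
    """
    frames = []
    offset = 0
    while len(buf) - offset >= HEADER.size:
        client_id, kind, length = HEADER.unpack_from(buf, offset)
        end = offset + HEADER.size + length
        if end > len(buf):
            break  # payload not fully received yet
        frames.append((client_id, kind, buf[offset + HEADER.size:end]))
        offset = end
    return frames, buf[offset:]
```

A receiver would call `decode_frames` on whatever bytes it has read so far and keep `rest` for the next read, which is how frames for different clients can be interleaved on one connection.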

The following example shows the usage of a single multiplexing connection with two clients, where each client uses a different user with potentially different ACL rules. After the connection is established, an ‘MPXHELLO’ command is sent to define the connection as a multiplexed connection. This command is followed by two ‘create client’ commands that initialize two clients on the Valkey side. After the clients are created, USER1 is set for Client I and USER2 is set for Client II using 'AUTH' commands. Clients I and II then send 'GET' commands for k1 and k2, respectively. At this point, Client I sends a 'BLPOP' command that blocks since list l1 does not exist. Even though Client I is blocked and both clients are using the same connection, Client II continues sending 'SET' commands, and these are processed.

[Diagram: MPX protocol]
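The flow in the example can be approximated with a toy in-memory model. This is only a sketch of the semantics described above (per-client user and blocking state on one shared connection), not Valkey code: the class `MuxConnection`, its methods, the `PENDING` sentinel, and the simplified command handling are all hypothetical, and ACL checks and BLPOP timeouts are ignored.

```python
PENDING = object()  # hypothetical marker: the reply is deferred (client blocked)


class MuxConnection:
    """Toy model of one multiplexed connection holding many logical clients."""

    def __init__(self):
        self.clients = {}  # client id -> per-client state
        self.strings = {}  # shared keyspace: string values
        self.lists = {}    # shared keyspace: list values

    def create_client(self, cid):
        self.clients[cid] = {"user": None, "blocked_on": None}

    def close_client(self, cid):
        # Closing one client leaves the connection and other clients intact.
        del self.clients[cid]

    def execute(self, cid, *args):
        state = self.clients[cid]
        if state["blocked_on"] is not None:
            raise RuntimeError(f"client {cid} is blocked")
        cmd = args[0].upper()
        if cmd == "AUTH":
            state["user"] = args[1]  # password is ignored in this sketch
            return "OK"
        if cmd == "GET":
            return self.strings.get(args[1])
        if cmd == "SET":
            self.strings[args[1]] = args[2]
            return "OK"
        if cmd == "BLPOP":
            key = args[1]
            if self.lists.get(key):
                return [key, self.lists[key].pop(0)]
            state["blocked_on"] = key  # only this client blocks
            return PENDING
        raise ValueError(f"unsupported command in this sketch: {cmd}")


conn = MuxConnection()
conn.create_client(1)
conn.create_client(2)
conn.execute(1, "AUTH", "USER1", "secret1")
conn.execute(2, "AUTH", "USER2", "secret2")
conn.execute(1, "GET", "k1")
conn.execute(2, "GET", "k2")
blpop_reply = conn.execute(1, "BLPOP", "l1", "0")  # l1 does not exist
set_reply = conn.execute(2, "SET", "k2", "v2")     # still processed
```

The point of the sketch is that the blocking state lives in the per-client entry rather than on the connection, so Client II's `SET` is served while Client I waits.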

hpatro commented 22 hours ago

@ohadshacham Could we update the text/diagram with "Valkey" ?

hpatro commented 22 hours ago

Few questions:

PragmaTwice commented 6 hours ago

I think we should also call out other possible benefits. I believe we should be able to avoid connection storms with this protocol. The cost of establishing a TLS connection would also be amortized, since the physical connection is shared.

Yeah, just like other multiplexing protocols e.g. HTTP/2, I think it's important to have a control mechanism (called “flow control” in HTTP/2) over multiple "streams" in one connection. This way, we can control the priority between streams and prevent some overloaded streams from affecting the entire connection.
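As a rough illustration of keeping one busy stream from starving the others, here is a small Python sketch of round-robin scheduling across per-stream queues. Note that this is plain round-robin fairness, not HTTP/2's credit-based flow control, and the function name `round_robin` and the queue representation are assumptions made for the example.

```python
from collections import deque


def round_robin(queues):
    """Yield (stream_id, item), taking one item per non-empty stream per pass.

    `queues` maps a stream id to an ordered list of pending items.
    """
    active = deque((sid, deque(items)) for sid, items in queues.items())
    while active:
        sid, pending = active.popleft()
        if pending:
            yield sid, pending.popleft()
            if pending:
                # Stream still has work: send it to the back of the line.
                active.append((sid, pending))


order = list(round_robin({1: ["a1", "a2", "a3"], 2: ["b1"]}))
```

With this ordering, stream 2's single item is served after only one item from stream 1, rather than waiting behind all three.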

Considering that HTTP/2 already has a good ecosystem and a lot of library support, I actually think it’s a good idea to use HTTP/2 directly. But it may bring more complexity.

In addition, the above examples seem to only consider the request-response form, but maybe we also need to consider server-side pushing? It can affect the protocol design.

artikell commented 4 hours ago

Multiplexing can be incredibly beneficial in large clusters. If the number of clients exceeds 10,000 and connection pooling is enabled, the sheer number of clients itself becomes a burden.

The biggest challenge lies in how the RESP protocol can support different contexts.

On one hand, it introduces the relationship between connections and clients.

On the other hand, we need to add a common header to every request and response, which increases the per-request cost.

The protocol needs to carry some id information to achieve this capability. At the same time, each id needs to be mapped to its corresponding client.