valkey-io / valkey-glide

An open-source Valkey client library that supports Valkey and Redis OSS 6.2, 7.0, and 7.2. Valkey GLIDE is designed for reliability, optimized performance, and high availability for Valkey and Redis OSS based applications. GLIDE is a multi-language client library, written in Rust with programming-language bindings such as Java and Python.
Apache License 2.0

Rust client #828

Open waynerobinson opened 9 months ago

waynerobinson commented 9 months ago

Describe the feature

Given that GLIDE's core is Rust and it adds some useful capabilities on top of redis-rs in terms of connectivity and resilience for MemoryDB/ElastiCache services, it would be valuable to be able to use it from Rust directly.

Use Case

Redis client for Rust.

Proposed Solution

No response

Other Information

No response

Acknowledgements

Client version used

n/a

Environment details (OS name and version, etc.)

n/a

shachlanAmazon commented 9 months ago

In the short term we plan to focus our efforts on higher-level languages, so this probably won't happen in the next few months. In the meantime we make every effort to upstream our changes to redis-rs, so if you use it, you will eventually receive most, if not all, of GLIDE for Redis's capabilities. And if there's any specific feature you're missing or a bug that's bothering you, feel free to open an issue on redis-rs, and I'll try to fix it there ASAP.

nakedible-p commented 1 month ago

FWIW, we just picked Fred as our Redis client for Rust: https://github.com/aembke/fred.rs

The reasons being (based on a brief evaluation): 1) natively async, so we don't need to check this for every setting; 2) native pool support; 3) better connection-management/recovery support; 4) client tracking (caching) support (we haven't taken this into use yet, but will).

This is not to rain on anyone's parade; I'm just hoping it's recognized that the current Redis client has downsides, and that people might be hoping for a better Redis client to come out of this repo.

avifenesh commented 1 month ago

@nakedible-p Hi Nuutti, I assume you evaluated it against redis-rs.

We do plan to expose a Rust client at some point, but when we do, it will be after we have diverged significantly from upstream redis-rs.

redis-rs is robust and has wide support for all kinds of Rust users and methodologies, which has pros and cons. I encourage you to open an issue there and give them your feedback. I know @nihohit, one of the main maintainers, and I know he would like to hear what you have to say.

For example, they do support client-side caching, and their connection management is very stable. The design of using a multiplexer instead of a connection pool came after deep evaluation: a connection pool is error-prone and easily exposes you to connection storms, which can block your app completely and are very hard to recover from.

Many of the decisions we took and upstreamed, and the decisions @nihohit made in redis-rs, are based on data gathered from hundreds of thousands of ElastiCache users while dealing with their pain in other clients. For example, we made a strong effort on better topology-change detection, since it comes up repeatedly as a big problem in most clients. In the last two weeks I have spent a long time helping customers of the two main Java clients and the main Node.js client deal with topology-change issues, while with GLIDE it just worked smoothly. All of them will move to GLIDE. So, as I said, a chat with @nihohit can give you some insight into what we have seen as important over the years; maybe there is something you missed.
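The multiplexer-versus-pool point above can be illustrated with a toy sketch (plain std, no Redis involved; all names here are made up for illustration): many callers share one worker that owns the single "connection" and pipelines their requests, instead of each caller opening or grabbing a socket of its own.

```rust
use std::sync::mpsc;
use std::thread;

// A request carries its command plus a private channel for the reply,
// so many callers can share one queue and still get their own answers.
struct Request {
    cmd: String,
    reply: mpsc::Sender<String>,
}

// One worker owns the single "connection" and serves a shared queue --
// the essence of a multiplexed client. With a pool, each caller would
// instead open or borrow its own socket, which is where connection
// storms come from under load.
fn run_multiplexed(callers: usize) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<Request>();

    let worker = thread::spawn(move || {
        for req in rx {
            // Pretend to execute the command on the one connection.
            let _ = req.reply.send(format!("OK {}", req.cmd));
        }
    });

    let handles: Vec<_> = (0..callers)
        .map(|i| {
            let tx = tx.clone(); // cloning a sender, not opening a socket
            thread::spawn(move || {
                let (reply_tx, reply_rx) = mpsc::channel();
                tx.send(Request {
                    cmd: format!("GET key{i}"),
                    reply: reply_tx,
                })
                .unwrap();
                reply_rx.recv().unwrap()
            })
        })
        .collect();

    let results = handles.into_iter().map(|h| h.join().unwrap()).collect();
    drop(tx); // close the queue so the worker loop ends
    worker.join().unwrap();
    results
}

fn main() {
    for line in run_multiplexed(4) {
        println!("{line}");
    }
}
```

However many caller threads there are, only one "connection" ever exists, so a burst of traffic queues up rather than triggering a storm of reconnects.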

This is not something we have formulated and not a team decision, just my assumption: when we expose a Rust client, it will be much thinner. We support async only in all our clients and there are no plans to change that in the near future, so we will probably end up stripping the fork of whatever is not relevant to our client.

We will focus on one methodology; on stability and reliability; on the features we know users value most, and on feature requests; on performance; and on external tooling integration and cloud-vendor support.

There's also the fact that the Rust logic serves all the other clients we offer. Making it very verbose and robust has a cost for all the other clients, and costs development time, which affects all the clients together. So my assumption is that when we expose the Rust client, it will serve users who need a client similar to what we need (the list I mentioned) but will support fewer methodologies. For users who need the rest, there's redis-rs, which is a very good client, and I don't see a reason to try to replace it.

But as I said before, my statement does not represent a team decision or something we have discussed seriously; it's just an idea that has popped up a couple of times, and we will definitely still need to discuss it and make a decision.

Please share how you evaluated the clients and what was important to you while doing so.

nakedible-p commented 1 month ago

Thank you for answering in such detail! A lot of this is a bit off-topic for this issue, but I'll just put this here so others can see it as well if it ends up being relevant.

First off, on Node.js: we are currently struggling with topology-change detection in node-redis (against MemoryDB). We are evaluating a switch to ioredis in the meantime to see whether it has a better implementation out of the box. But do you think we could already use GLIDE's Node.js support in production? We are no strangers to being early adopters and using something experimental, but we obviously don't want to use something that's inherently not meant for production yet.

On the connection pool – it's been hard to find proper guidance on what is the best practice with a highly available solution. Our rationale for picking a small pool (4) for connections:

However, we really just want a setup that is as simple as possible, but recovers from any kind of issue as fast as possible, making no compromises there. If, through experience, that is shown to be a single connection that is just re-established as fast as possible in case the server fails to communicate for 4 seconds (for example), then this is exactly the kind of feedback we'd love to have.

As for redis-rs:

But, these are superficial findings – I did not try to make a working client with redis-rs and test all the different failure scenarios. Maybe at some point I will.

And again, to be clear, I'm not complaining! I'm just trying to provide honest feedback. If @nihohit doesn't pop by to say "noted", I will probably post these as an issue in the redis-rs repo as well, just to have them recorded.

Finally, I must applaud your goals. Right now it seems hard to find a battle-proven redis client that would be simple to set up and use. Figuring all this out is just time away from the real work of using redis as a solution. So this is exactly what redis/valkey/elasticache/memorydb need to be a more out-of-the-box solution.

nihohit commented 1 month ago

Hi, a redis-rs maintainer here. NOTED! :)

Before I start, I want to say that fred looks like an excellent client, and you should probably be fine using it. I checked out several things in its source code, and what I would call the "strategic" difference in its design from redis-rs is that fred uses very high-level abstractions, such as multiple multicast channels, to provide its logic*. This allows the library to iterate fast and provide a lot of features, but it means that, performance-wise, when I benchmarked the two clients, redis-rs was measurably faster. That was a while ago; I assume the numbers have changed for both clients. If perf is important, measure for your specific scenario. IMO that means that if fred provides the operations-per-second you need, you can probably be happy with your choice, but if it doesn't, redis-rs still might. The other consideration is that redis-rs is more popular, so you can assume bugs will be caught and fixed faster. BTW, @avifenesh, you guys are missing out on a couple of bug fixes & features by not pulling from us :)

*take this with a grain of salt - I never went very in-depth into fred's code.
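As a footnote to "measure for your specific scenario": a minimal harness like the one below (plain std; the workload shown is a stand-in you would replace with a real client call) is often enough to compare two clients' throughput under your own access pattern.

```rust
use std::hint::black_box;
use std::time::Instant;

// Time `iters` invocations of `op` and return throughput in ops/sec.
fn ops_per_sec<F: FnMut()>(iters: u32, mut op: F) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        op();
    }
    iters as f64 / start.elapsed().as_secs_f64()
}

fn main() {
    // Stand-in workload; swap in e.g. a GET against a local server,
    // once per candidate client, under identical conditions.
    let rate = ops_per_sec(1_000_000, || {
        black_box(2u64.wrapping_mul(21));
    });
    println!("{rate:.0} ops/sec");
}
```

Run both clients against the same local server with the same command mix; absolute numbers matter less than the ratio between the two under your workload.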

> I didn't see "tracking" or "caching" mentioned in any of the features, and I saw a bunch of open pull requests, so I assumed it's not done. Glad to hear it's there.

It's not there, it's in an open PR :)

> I want absolutely nothing to block (can I somehow disable blocking commands altogether?). I'm scared that even with all the aio features, some cluster data refresh is going to block. Clients that are not built async from the ground up often may think it's okay to block sometimes.

I'm sorry, but I really don't understand this logic. If you want to make sure there's nothing blocking, read the code (or run tokio with the relevant checks). The chance that a person introduces blocking calls by mistake is not determined by whether a library is async or not, but by the experience and care that the maintainers & community put in. redis-rs has twice the downloads of fred, but sadly we don't have numbers on who uses which features. I believe more than half of our users use the async API, so I believe there are more people who would report such a mistake on redis-rs than on fred.

> I want to use a cluster connection with tokio async utilizing a connection manager

Why? The cluster connection already reconnects automatically; adding a connection manager on top of it is unnecessary.

I don't want to go into all your other points - some I agree with, others less so. You're more than welcome to open an issue per request on redis-rs, and we'll try to attend to them. But, again - fred looks like a good client. Since you're already using it, I hope that it serves you well :)

avifenesh commented 1 month ago

@nakedible-p For the redis-rs comments, I am happy to see that @nihohit is here; he is really the guy you should talk to, on redis-rs specifically but also on working with Valkey/Redis from Rust in general. As he said, the decision is often between simplicity and abstraction versus performance. If you want to keep the conversation here so others can see it, do so; just mention it in an issue on redis-rs so it will be available for everyone there as well.

And for GLIDE Node.js: we are going for a first stable version very soon, and if you are doing a POC with another client anyway, especially if the cause is topology-change detection issues, I encourage you to take GLIDE for a ride and a POC. I'd love to take part in it and accompany you in the process.

I did this with our Node beta testers, two major companies, and every feature request was deployed within two weeks of the request (supporting musl, for example, or ESM plus CommonJS).

As for topology-change detection: I spent the whole day debugging the codebase of the library mentioned above for a very large user having topology-change detection issues. The issue blocked their ability to serve requests for 7 (!) minutes. It's a huge gaming company, so you can imagine how problematic 7-minute blocks are. Also against MemoryDB.

I haven't run it against GLIDE yet, but I already know this can't happen with GLIDE, and the second I figure out how to solve it for them (they might need to clone the repo and patch), I'm going to run the same test with GLIDE and would love to share the results.

Let me know if you are going with us.

You can approach me personally in our discord - https://discord.gg/Wt7n7NzfcZ

avifenesh commented 1 month ago

@nihohit Right after the release we will scan the whole redis-rs repo and merge!

nakedible-p commented 1 month ago

Thank you for the swift reply! And yeah, possibly not all the points are "fair", as these are just impressions, so take them as you will.

> I want to use a cluster connection with tokio async utilizing a connection manager

> Why? The cluster connection already reconnects automatically; adding a connection manager on top of it is unnecessary.

See, that's the part I missed – I did eventually find it in the documentation, though.

All in all, I'm still a bit unsure about using the redis crate to get that kind of reliability with all the reconnections.

If I'm not mistaken, after all this reading, at process start I would do:

let connection = ClusterClient::builder(vec!["redis://myhost/"])
    .retries(4)
    .max_retry_wait(1000)
    .min_retry_wait(1000)
    .connection_timeout(Duration::from_secs(10))
    .response_timeout(Duration::from_secs(4))
    .build()?
    .get_async_connection().await?;

And then stash that in a static global. (Timeout values etc. just in the ballpark, not well thought out.)
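Stashing the handle in a static can look like the sketch below (plain std, with a stand-in `Conn` type; redis-rs's async connections implement `Clone`, so a single global handle can serve every task). Note that since `get_async_connection` is async, a `std::sync::OnceLock` only works if you initialize it up front at startup; for lazy async initialization, something like `tokio::sync::OnceCell` is the usual tool.

```rust
use std::sync::OnceLock;

// Stand-in for a cloneable async connection handle; in redis-rs the
// cluster connection is Clone, so one global can serve all tasks.
#[derive(Clone, Debug)]
struct Conn(String);

static CONNECTION: OnceLock<Conn> = OnceLock::new();

// Call once at startup, after the (async) connect has succeeded.
// First writer wins; later calls are no-ops.
fn install(conn: Conn) {
    let _ = CONNECTION.set(conn);
}

// Every usage site clones the cheap handle instead of reconnecting.
fn connection() -> Conn {
    CONNECTION.get().expect("connection not initialized").clone()
}

fn main() {
    install(Conn("cluster-handle".into()));
    println!("{:?}", connection());
}
```

With this shape, the `CONNECTION.clone().set("foo", "bar")` usage below maps onto `connection().set(...)`, and the mutable-receiver requirement is satisfied by the clone each call site makes.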

And then on every usage site I would need to have AsyncCommands trait imported and I'd do:

let _: () = CONNECTION.clone().set("foo", "bar").await?;

And that would give me a connection that:

If that's true, then ultimately there isn't much difference between the approaches. It was just hard for me to figure all this out, and the mutable connection object kind of threw me off, since I need a single shared connection/client that's valid for all threads and lives for the lifetime of the application.

nihohit commented 1 month ago

> and the mutable connection object kind of threw me off

*sigh* You're not the first person to point that out, and I don't like that design choice either. It was made before I joined the team 🤷. The only thing I can point out is that if we ever break our API, that's definitely on the list :) https://github.com/redis-rs/redis-rs/issues/1043

nihohit commented 1 month ago

@avifenesh

> You can approach me personally in our discord - https://www.linkedin.com/in/ACoAACz2yQ4BF4xYt14QGsx4IF9kSzT10S5gXX0

That's a link to your linkedin profile, not to a discord server