redis / lettuce

Advanced Java Redis client for thread-safe sync, async, and reactive usage. Supports Cluster, Sentinel, Pipelining, and codecs.
https://lettuce.io
MIT License
5.3k stars 947 forks source link

Does Lettuce use CLUSTER NODES instead of CLUSTER SLOTS to discover topology? #2898

Open markobip opened 2 weeks ago

markobip commented 2 weeks ago

Bug Report

In DefaultClusterTopologyRefresh, it seems to me that the topology is discovered using CLUSTER NODES.

If I understand correctly, clients should use CLUSTER SLOTS to discover topology.

Current Behavior

We sometimes get a LOADING Redis is loading the dataset in memory error when an instance comes online in the cluster.
I wasn't able to reproduce this reliably yet.

Input Code

We connect like this:

Input Code ```java RedisURI.Builder builder = RedisURI.Builder.sentinel("one") .withSentinel("two") .withSentinel("three"); redisUri = builder.get() .withSentinelMasterId(redisProperties.getSentinelMaster()) .withAuthentication( redisProperties.getUsername(), redisProperties.getPassword().toCharArray() ) .build(); client = RedisClient.create(); connection = MasterReplica.connect(client, codec, redisUri); connection.setReadFrom(ReadFrom.LOWEST_LATENCY); syncCommands = connection.sync(); ```

Expected behavior/code

I expect Lettuce to never connect to a Redis instance that is still syncing.

Environment

Possible Solution

Use CLUSTER SLOTS instead of CLUSTER NODES when discovering topology.

Additional context

I'm not entirely sure I understand all the moving parts here, so I'm sorry if I am wrong here. Thanks anyway! :D

tishun commented 1 week ago

Seems like we are indeed using CLUSTER NODES, which currently is against the recommendation in the docs:

Note that normally clients willing to fetch the map between Cluster hash slots and node addresses should use CLUSTER SLOTS instead. CLUSTER NODES, that provides more information, should be used for administrative tasks, debugging, and configuration inspections. It is also used by redis-cli in order to manage a cluster.

This was initially implemented a while ago in #240 (Lettuce 4.2.0) and seems to have never been addressed since.

The change does not require new design, but has impact on large portions of the code.

Adding to the backlog until we have spare resources.

markobip commented 1 week ago

I've been poking around a bit, it doesn't seem that risky, at least at first glance. What do you think about a new contributor (me) trying to tackle this?

tishun commented 1 week ago

I've been poking around a bit, it doesn't seem that risky, at least at first glance. What do you think about a new contributor (me) trying to tackle this?

Absolutely, feel free to submit a PR, you are more than welcome.