redis / lettuce

Advanced Java Redis client for thread-safe sync, async, and reactive usage. Supports Cluster, Sentinel, Pipelining, and codecs.
https://lettuce.io
MIT License

`Cluster` command spike on topology refresh leading to higher latencies. #2468

Open kushwaha0791 opened 1 year ago

kushwaha0791 commented 1 year ago

Current Behavior

I am seeing `CLUSTER` command spikes whenever a Redis node is replaced. These spikes slow down all other client calls, which leads to higher latencies. I have a service with 900 instances connecting to a Redis cluster of 300 nodes (150 primaries). I have tried several settings, such as enabling only adaptive refresh (with dynamic refresh sources and periodic refresh disabled) and other combinations, but none of them help. Looking through the Lettuce code, it seems that Lettuce calls all 300 Redis nodes (primaries and replicas) and picks the topology view that knows about the largest number of existing nodes. I cannot tell whether the adaptive trigger timeout works.
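To put the numbers above in perspective, here is a rough back-of-the-envelope sketch of the fan-out. It assumes, purely for illustration, that each client sends one topology query to each of its refresh sources per refresh round; the real client may send more than one command per node. The seed-node count of 3 is hypothetical, since the report does not state it.

```java
public class RefreshFanOut {

    /** Topology queries issued in one round if every client refreshes once. */
    public static long queriesPerRound(int clientInstances, int sourceNodesPerClient) {
        return (long) clientInstances * sourceNodesPerClient;
    }

    public static void main(String[] args) {
        int clients = 900;      // service instances, from the report
        int clusterNodes = 300; // primaries plus replicas, from the report
        int seedNodes = 3;      // hypothetical seed-node count, not from the report

        // Every cluster node used as a refresh source.
        System.out.println("all nodes as sources: " + queriesPerRound(clients, clusterNodes));
        // Only the seed nodes used as refresh sources.
        System.out.println("seed nodes only:      " + queriesPerRound(clients, seedNodes));
    }
}
```

Under these assumptions, a single refresh round across the fleet produces 270,000 queries when all nodes are sources and 2,700 when only three seed nodes are. If many clients react to the same node replacement at once, this kind of multiplication could plausibly show up as a spike.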


Input Code

```java
redisClient.setOptions(
    ClusterClientOptions.builder()
        .autoReconnect(true)
        .requestQueueSize(REQUEST_QUEUE_SIZE)
        .cancelCommandsOnReconnectFailure(true)
        .disconnectedBehavior(ClientOptions.DisconnectedBehavior.REJECT_COMMANDS)
        .topologyRefreshOptions(
            ClusterTopologyRefreshOptions.builder()
                .enablePeriodicRefresh(false)
                .enableAllAdaptiveRefreshTriggers()
                .dynamicRefreshSources(false)
                .build())
        .timeoutOptions(
            TimeoutOptions.builder()
                .timeoutCommands(true)
                .fixedTimeout(Duration.ofMillis(COMMAND_TIMEOUT_MS)) // command timeout
                .build())
        .build());
```

Environment

Additional context

Screenshot 2023-07-26 at 5 12 54 PM

kushwaha0791 commented 1 year ago

@mp911de any suggestions?

1209233066 commented 1 year ago

I also encountered this, and the OS reported a flood attack. Redis version: 5.0.14, Lettuce version: 6.0.1.

mp911de commented 1 year ago

With dynamicRefreshSources disabled, Lettuce uses only the seed nodes provided in RedisClusterClient.create(…) instead of reaching out to all cluster nodes. These spikes indicate that some event has caused an increase in topology refreshes.
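One way to experiment with how often adaptive triggers can fire is to set the trigger timeout and the trigger set explicitly rather than relying on defaults. The fragment below is a minimal sketch, not a recommended configuration: the choice of triggers, the 60-second timeout, and the reconnect-attempt threshold of 10 are illustrative assumptions, and the class name `RefreshOptionsSketch` is hypothetical. It requires Lettuce on the classpath.

```java
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions.RefreshTrigger;

import java.time.Duration;

public class RefreshOptionsSketch {

    static ClusterClientOptions options() {
        ClusterTopologyRefreshOptions refresh = ClusterTopologyRefreshOptions.builder()
                .enablePeriodicRefresh(false)
                // Query only the seed nodes, as in the reported configuration.
                .dynamicRefreshSources(false)
                // Illustrative subset of triggers instead of enabling all of them.
                .enableAdaptiveRefreshTrigger(
                        RefreshTrigger.MOVED_REDIRECT,
                        RefreshTrigger.PERSISTENT_RECONNECTS)
                // Illustrative value: rate-limits refreshes started by adaptive triggers.
                .adaptiveRefreshTriggersTimeout(Duration.ofSeconds(60))
                // Illustrative value: reconnect attempts before PERSISTENT_RECONNECTS fires.
                .refreshTriggersReconnectAttempts(10)
                .build();

        return ClusterClientOptions.builder()
                .topologyRefreshOptions(refresh)
                .build();
    }
}
```

Whether a longer timeout or a narrower trigger set actually reduces the spikes depends on which event is driving the refreshes, which this sketch does not determine.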

This view is pretty high-level. You would need to investigate what happened during a spike, ideally by capturing a debug log from one of the nodes.
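As a rough sketch of how such a debug log might be captured on one client instance, the snippet below raises the level for the `io.lettuce.core.cluster` logger namespace using `java.util.logging`. This only has an effect under the assumption that no other logging framework (for example SLF4J or Log4j) is on the classpath, so that logging falls back to the JDK logger; with another framework, the equivalent logger level would be set in that framework's configuration instead. The class name `TopologyDebugLogging` is hypothetical.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class TopologyDebugLogging {

    // Keep a strong reference: java.util.logging holds loggers weakly, so an
    // unreferenced logger could be garbage-collected and lose its settings.
    private static final Logger CLUSTER_LOGGER = Logger.getLogger("io.lettuce.core.cluster");

    /** Enables FINE (debug-level) output for the cluster logger namespace. */
    public static Logger enable() {
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINE); // handlers filter at INFO by default
        CLUSTER_LOGGER.setLevel(Level.FINE);
        CLUSTER_LOGGER.addHandler(handler);
        return CLUSTER_LOGGER;
    }

    public static void main(String[] args) {
        Logger logger = enable();
        logger.fine("debug logging enabled for " + logger.getName());
    }
}
```

Enabling this on a single instance rather than the whole fleet keeps the log volume manageable while still showing what that client was doing around a spike.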