redis / lettuce

Advanced Java Redis client for thread-safe sync, async, and reactive usage. Supports Cluster, Sentinel, Pipelining, and codecs.
https://lettuce.io
MIT License

Lettuce client reconnecting frequently #2848

Closed sksamuel closed 1 month ago

sksamuel commented 1 month ago

Current Behavior

AWS ElastiCache for Redis is reporting that new connections are being made constantly.

[image: AWS ElastiCache chart of new connections over time]

The slight peaks in that chart are the periodic refreshes we have enabled at 30 minute intervals.

Input Code

Here is the basic setup code.

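   import io.lettuce.core.cluster.ClusterClientOptions
   import io.lettuce.core.cluster.ClusterTopologyRefreshOptions
   import io.lettuce.core.cluster.RedisClusterClient
   import io.lettuce.core.metrics.DefaultCommandLatencyCollector
   import io.lettuce.core.resource.DefaultClientResources
   import io.lettuce.core.resource.Delay
   import java.time.Duration
   import java.util.concurrent.TimeUnit
   import kotlin.time.Duration.Companion.minutes
   import kotlin.time.toJavaDuration

   val clientResources = DefaultClientResources.builder()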
      .commandLatencyRecorder(DefaultCommandLatencyCollector.disabled())
      .reconnectDelay(
         Delay.fullJitter(
            /* lower = */ Duration.ofMillis(100),     // minimum 100 millisecond delay
            /* upper = */ Duration.ofSeconds(10),      // maximum 10 second delay
            /* base = */ 100, /* targetTimeUnit = */ TimeUnit.MILLISECONDS // 100 millisecond base
         ),
      ).build()

   val topologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
      .closeStaleConnections(true)
      .enableAllAdaptiveRefreshTriggers()
      .dynamicRefreshSources(true)
      .enablePeriodicRefresh(30.minutes.toJavaDuration())
      .build()

   val clusterClient = RedisClusterClient.create(clientResources, uri).apply {
      setOptions(
         ClusterClientOptions
            .builder()
            .topologyRefreshOptions(topologyRefreshOptions)
            .maxRedirects(10)
            .pingBeforeActivateConnection(false)
            .validateClusterNodeMembership(false)
            .build()
      )
   }

We then simply call client.connect(codec), and call .async() each time we need the connection.

The last part .async() is called repeatedly, but the docs say this does not cause a new connection.
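
For illustration, here is a minimal sketch of that usage pattern, assuming String keys and values. The RedisHolder class and its commands() method are hypothetical names, not part of Lettuce or of the original code. connect(codec) is called once and the resulting connection is kept, while async() only hands back the command API for that existing connection.

```kotlin
import io.lettuce.core.cluster.RedisClusterClient
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection
import io.lettuce.core.cluster.api.async.RedisAdvancedClusterAsyncCommands
import io.lettuce.core.codec.StringCodec

// Hypothetical wrapper: owns a single long-lived cluster connection.
class RedisHolder(client: RedisClusterClient) : AutoCloseable {

    // connect(codec) is the call that opens a connection, so it runs only once here.
    private val connection: StatefulRedisClusterConnection<String, String> =
        client.connect(StringCodec.UTF8)

    // async() returns the async command API of the connection opened above.
    fun commands(): RedisAdvancedClusterAsyncCommands<String, String> =
        connection.async()

    // Close the shared connection when the application shuts down.
    override fun close() = connection.close()
}
```

If connect(codec) were called per request instead, each call would open a fresh connection, which would be one thing to rule out when new-connection metrics look high.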

The AWS metrics for new connections should not be reporting constant new connections.

Environment

Additional context

To check whether the AWS metrics are showing the correct values, I've tried various combinations of validateClusterNodeMembership and pingBeforeActivateConnection. Changing the periodic refresh to, say, 30 seconds increases the number of new connections massively. But even setting it to 8 hours or turning it off completely does not eradicate this baseline of new connections.

We also have around 25 Redis instances, with varying master and replica counts, and this happens on all of them. All our caches are in cluster mode.

sksamuel commented 1 month ago

I've figured it out. Changing the periodic refresh was the fix; we still had the old value of 30 seconds in another service.
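
As a rough back-of-the-envelope check on why one service left at 30 seconds can dominate the chart, the sketch below counts how many periodic refreshes a single client performs per day at each interval. The refreshesPer function is a hypothetical helper, and the sketch assumes each refresh opens at least some short-lived connections, as the peaks in the chart suggest. How many connections a refresh actually opens is not stated in this thread.

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.hours
import kotlin.time.Duration.Companion.minutes
import kotlin.time.Duration.Companion.seconds

// Hypothetical helper: number of periodic refreshes one client runs within a window.
fun refreshesPer(window: Duration, period: Duration): Long {
    require(period.isPositive()) { "period must be positive" }
    return window.inWholeMilliseconds / period.inWholeMilliseconds
}

fun main() {
    val day = 24.hours
    println(refreshesPer(day, 30.seconds)) // 2880
    println(refreshesPer(day, 30.minutes)) // 48
}
```

Under that assumption, a client refreshing every 30 seconds triggers 60 times as many refreshes as one refreshing every 30 minutes, so a single misconfigured service can look like a constant baseline.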