redis / lettuce

Advanced Java Redis client for thread-safe sync, async, and reactive usage. Supports Cluster, Sentinel, Pipelining, and codecs.
https://lettuce.io
MIT License
5.3k stars 947 forks source link

Adjusting disconnectedBehavior Option to Prevent Timeout When Redis Shutdown #2866

Open MagicalLas opened 1 month ago

MagicalLas commented 1 month ago

Feature Request

Is your feature request related to a problem? Please describe

Sometimes need to shut down Redis for maintenance, such as version upgrades. When shutdown or kill redis-server, connection is disconnected immediatly. But some requests do not fail immediately but instead experience timeouts, increasing application latency.

Currently, our Redis client options are configured as follows:

ClientOptions options = ClientOptions.builder()
  .autoReconnect(true)
  .disconnectedBehavior(DisconnectedBehavior.REJECT_COMMANDS)
// ...

The DisconnectedBehavior.REJECT_COMMANDS option appears to cancel commands when the connection is lost. However, if autoReconnect is not set to false, commands in the CommandHandler.stack are not canceled but are placed into the disconnectedBuffer. Therefore, ongoing commands are not rejected if autoReconnect is true, even with the client option modified.

Describe the solution you'd like

Condition for canceling commands in the CommandHandler.stack should be based solely on rejectCommandsWhileDisconnected and not combined with autoReconnect.

(sample implementation)

Describe alternatives you've considered

  1. make other client options

adding a little more complexity, which can be beneficial for migrations such as versioning because they don't change existing behavior

  1. just disable reconnect option.

Adjusting the autoReconnect option can solve this issue immediately, but it would require implement custom reconnect connection. I want to avoid writing custom code for reconnections by using the auto-reconnect feature.

Teachability, Documentation, Adoption, Migration Strategy

Maybe we need to update docs for disconnectedBehavior option.

tishun commented 3 weeks ago

Related to #547 and specifically https://github.com/redis/lettuce/commit/dde37483a31215ebf17f7141fea6454971ea9a08

tishun commented 3 days ago

@MagicalLas ,

I've been thinking about this and I am not sure this is the right way to go. I think there are two cases here that can't be served both:

  1. On one hand we can have a disconnect caused by network issues. In this case, with auto-reconnect enabled and reliability set to anything else than AT_MOST_ONCE I think the expectation would be to NOT reject the commands, but instead attempt to execute them again when you reconnect
  2. On the other hand in a maintenance case (such as yours) obviously the driver could not do that before the instance is up, which would lead to timeouts (similar to what you experience)

The problem is that the driver has no way to distinguish between these two different scenarios, without implementing some additional logic to notify it upon shutdown (e.g. push a pubsub message to a control channel that would let the driver disable auto-reconnect). As far as I know the Redis server does not currently send any notification when it is being gracefully shut down.

Without the above custom logic any configuration change you make could potentially compromise the first scenario (reducing timeout value, changing reliability setting, disable auto-reconnect).

MagicalLas commented 4 hours ago

I agree with your view that if the reliability setting is AT_LEAST_ONE and autoReconnect is true, the expected behavior would be to retry the requests. However, we already specify options such as disconnectedBehavior (DisconnectedBehavior.REJECT_COMMANDS). Even with a reliability setting of AT_LEAST_ONE, it doesn't seem strange to me that commands queued during a disconnect are rejected according to DisconnectedBehavior.

When DisconnectedBehavior.REJECT_COMMANDS is set with a reliability level of AT_LEAST_ONE, new commands are rejected. How about applying the same logic to commands that are already in the queue?

So, I feel that it would not be a major issue if the settings for DisconnectedBehavior take precedence. Please let me know your thoughts.