redis / rueidis

A fast Golang Redis client that supports Client Side Caching, Auto Pipelining, Generics OM, RedisJSON, RedisBloom, RediSearch, etc.
Apache License 2.0
2.39k stars 150 forks source link

Feature: retry backoffs #633

Open rueian opened 2 weeks ago

rueian commented 2 weeks ago

There are about 25 goto retry statements in rueidis. Such as:

https://github.com/redis/rueidis/blob/5135acdc65c815cdce8e8b0c12c1e5aa939504bd/cluster.go#L460-L462 https://github.com/redis/rueidis/blob/5135acdc65c815cdce8e8b0c12c1e5aa939504bd/client.go#L48-L50 https://github.com/redis/rueidis/blob/5135acdc65c815cdce8e8b0c12c1e5aa939504bd/sentinel.go#L66-L68

And so on.

It will be better to add backoffs before ALL these statements to avoid rapid retry storms and those backoffs should have the following properties:

  1. Should be capped for at most 1 second.
  2. Should not exceed the context deadline.
  3. Backoff exponentially with jitter.

Related to https://github.com/valkey-io/valkey-doc/pull/164

proost commented 6 days ago

@rueian It's good idea.

Maybe later someone want own retry logic. So default retry logic make as you said, but how about make a retrier function like this?

// Retrier is a function that will be called when a command is retryable.
// The function should return an error if the command should not be retried.
type Retrier func(ctx context.Context, attempts int) error 
rueian commented 6 days ago

@proost sounds good, but the returned 'error' makes the Retrier confusing in terms of its responsibility.

How about just return a time.Duration and name the function RetryDelay?

proost commented 5 days ago

@rueian Ok, I understand what's your intention. So Let's clear interface.

How do you think?

// RetryDelay returns the delay that should be used before retrying the
// attempt. Will return error if the delay could not be determined or do not retry.
type RetryDelay func(attempts int, err error) (time.Duration, error)

If you agree with definition, I will make a feature.

rueian commented 5 days ago

Returning an error still seems confusing to me. We can use negative duration to stop retrying. I think returning a duration only is much more clear.

One more thing. If the return duration exceeds the context deadline, we should not retry. So there might be some duplications of checking the context deadline in the old isRetryable function and the new RetryDelay that we had better keep in one place.

proost commented 5 days ago

@rueian how about this?

// RetryDelay returns the delay that should be used before retrying the
// attempt. Will return negative delay if the delay could not be determined or do not retry.
type RetryDelay func(attempts int, err error) time.Duration

One more thing. If the return duration exceeds the context deadline, we should not retry. So there might be some duplications of checking the context deadline in the old isRetryable function and the new RetryDelay that we had better keep in one place.

I considered it already. so I plan like this:

  1. call old isRetryable function
  2. if retryable, call RetryDelay
  3. wait until delay time or context is done.
rueian commented 5 days ago

Yes, looks good to me.