Only check latencies once every 10 seconds with `routeByLatency`

justinmir commented 11 months ago

routeByLatency currently checks latencies any time a server returns a MOVED or READONLY reply. When a shard is down, the ClusterClient chooses to issue the request to a random server, which returns a MOVED reply. This causes a state refresh and a latency update on all servers. This can lead to significant ping load to clusters with a large number of clients.

This introduces logic to ping only once every 10 seconds, only performing a latency update on a node during the GC function if the latency was set later than 10 seconds ago.

Fixes https://github.com/redis/go-redis/issues/2782

Figure: Ping behavior of the client running 21bd40a47e56e61c0598ea1bdf8e02e67d1aa651 and a client running this PR. When shards are failed the current cluster client will spam pings while the fixed cluster client will only ping each server once every 10 seconds.

This shows the impact in a running large production cluster. The cluster is handling ~4M pings per second due to this behavior.

ofekshenawa commented 8 months ago

LGTM! WDYT about changing Unix() to NanoUnix? To be more precise and to avoid unnecessary loops

justinmir commented 8 months ago

Sure I'll push that change shortly.

justinmir commented 6 months ago

@ofekshenawa PTAL when you get a chance!

justinmir commented 1 week ago

@vladvildanov hoping to get some eyes here, this will help us no longer have to maintain our own fork

redis / go-redis

Only check latencies once every 10 seconds with `routeByLatency` #2795