Closed stanhu closed 4 months ago
Hmm, seeing a few flaky test runs: https://github.com/redis-rb/redis-client/actions/runs/9099843975/job/25013339465?pr=197
I can't reproduce this locally yet, but I wonder if there's a race condition here where the socket read timeout hits before the Redis command timeout can respond. Maybe the 0.1 s read timeout is too small?
UPDATE: Bumping the socket timeout to 0.5 s seems to solve the issue.
@byroot Thanks for the merge! We've already been running with this patch in production for the past week and verified it resolves another customer issue. Given this appears to affect a number of Sidekiq users in https://github.com/sidekiq/sidekiq/issues/6162, would you mind releasing a new version so everyone can benefit? Thanks!
0.22.2
is out. I doubt that many people on that issue use sentinel, but let's see.
I think they use Redis SaaSes that might use Sentinel internally for failover.
If so that wouldn't impact them. What matters here is if you instantiate RedisClient in sentinel mode: https://github.com/redis-rb/redis-client?tab=readme-ov-file#sentinel-support
If it's hidden behind some proxy, redis-client
would have behaved just like a standalone server.
Ah, right. I'm still getting reports almost daily of people experiencing frequent ReadTimeoutErrors with Sidekiq. Would you be willing to do a code read of redis-client with me, so we can walk thru the codebase and think about possible timeout causes or issues? I don't know what else to do.
I'm available at my Office Hour Thursdays 9am Pacific (https://sidekiq.org/support.html) or we can schedule something.
@byroot @mperham I think this pull request would help with Sidekiq 7 based on the backtrace in https://github.com/sidekiq/sidekiq/issues/6162#issuecomment-2095286515. You can see that redis-client
is used directly and redis-rb
, which has https://github.com/redis/redis-rb/pull/1175, is not in the loop at all.
If Redis::Client
from redis-rb
were used, then what you said about Redis standalone vs. Sentinel is true. However, it looks like Sidekiq instantiates RedisClient
from redis-client
directly: https://github.com/mperham/sidekiq/blob/b89bdb45c701e442b94d28e8ef3a8025fafb5fb4/lib/sidekiq/redis_client_adapter.rb#L63-L74.
Actually, it seems that Sidekiq already increments the read timeout here:
So maybe that can be dropped now?
In https://github.com/redis/redis-rb/issues/1279, we discovered that when using BLPOP or BRPOP
redis-rb
properly returnednil
whentimeout
was reached with no key present, but when connecting to Redis Sentinels, the client raised aReadTimeoutTimeout
error.This occurred because of a subtle difference in how
Redis::Client
(fromredis-rb
) andRedisClient
(fromredis-client
) behaved. The former, which is used with standalone Redis, returnednil
because the socket read timeout was incremented to the command timeout value (https://github.com/redis/redis-rb/pull/1175). The latter did not have this, so the socket read timeout would get triggered before the actual Redis timeout hit.To make the behavior consistent, increment the configured read timeout to the command timeout.
Closes https://github.com/redis/redis-rb/issues/1279