redis-rb / redis-client

Simple low level client for Redis 6+
MIT License
124 stars 60 forks source link

Ensure BLPOP/BRPOP returns nil instead of raising ReadTimeoutError #197

Closed stanhu closed 4 months ago

stanhu commented 5 months ago

In https://github.com/redis/redis-rb/issues/1279, we discovered that when using BLPOP or BRPOP redis-rb properly returned nil when timeout was reached with no key present, but when connecting to Redis Sentinels, the client raised a ReadTimeoutTimeout error.

This occurred because of a subtle difference in how Redis::Client (from redis-rb) and RedisClient (from redis-client) behaved. The former, which is used with standalone Redis, returned nil because the socket read timeout was incremented to the command timeout value (https://github.com/redis/redis-rb/pull/1175). The latter did not have this, so the socket read timeout would get triggered before the actual Redis timeout hit.

To make the behavior consistent, increment the configured read timeout to the command timeout.

Closes https://github.com/redis/redis-rb/issues/1279

stanhu commented 5 months ago

Hmm, seeing a few flaky test runs: https://github.com/redis-rb/redis-client/actions/runs/9099843975/job/25013339465?pr=197

I can't reproduce this locally yet, but I wonder if there's a race condition here where the socket read timeout hits before the Redis command timeout can respond. Maybe the 0.1 s read timeout is too small?

UPDATE: Bumping the socket timeout to 0.5 s seems to solve the issue.

stanhu commented 4 months ago

@byroot Thanks for the merge! We've already been running with this patch in production for the past week and verified it resolves another customer issue. Given this appears to affect a number of Sidekiq users in https://github.com/sidekiq/sidekiq/issues/6162, would you mind releasing a new version so everyone can benefit? Thanks!

byroot commented 4 months ago

0.22.2 is out. I doubt that many people on that issue use sentinel, but let's see.

mperham commented 4 months ago

I think they use Redis SaaSes that might use Sentinel internally for failover.

byroot commented 4 months ago

If so that wouldn't impact them. What matters here is if you instantiate RedisClient in sentinel mode: https://github.com/redis-rb/redis-client?tab=readme-ov-file#sentinel-support

If it's hidden behind some proxy, redis-client would have behaved just like a standalone server.

mperham commented 4 months ago

Ah, right. I'm still getting reports almost daily of people experiencing frequent ReadTimeoutErrors with Sidekiq. Would you be willing to do a code read of redis-client with me, so we can walk thru the codebase and think about possible timeout causes or issues? I don't know what else to do.

I'm available at my Office Hour Thursdays 9am Pacific (https://sidekiq.org/support.html) or we can schedule something.

stanhu commented 4 months ago

@byroot @mperham I think this pull request would help with Sidekiq 7 based on the backtrace in https://github.com/sidekiq/sidekiq/issues/6162#issuecomment-2095286515. You can see that redis-client is used directly and redis-rb, which has https://github.com/redis/redis-rb/pull/1175, is not in the loop at all.

If Redis::Client from redis-rb were used, then what you said about Redis standalone vs. Sentinel is true. However, it looks like Sidekiq instantiates RedisClient from redis-client directly: https://github.com/mperham/sidekiq/blob/b89bdb45c701e442b94d28e8ef3a8025fafb5fb4/lib/sidekiq/redis_client_adapter.rb#L63-L74.

stanhu commented 4 months ago

Actually, it seems that Sidekiq already increments the read timeout here:

https://github.com/sidekiq/sidekiq/blob/e798c23643cd7d3f3b229016a87fec2bfeb7905a/lib/sidekiq/fetch.rb#L47

So maybe that can be dropped now?