redisson / redisson

Redisson - Easy Redis Java client and Real-Time Data Platform. Valkey compatible. Sync/Async/RxJava/Reactive API. Over 50 Redis based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Bloom filter, Spring Cache, Tomcat, Scheduler, JCache API, Hibernate, RPC, local cache ...
https://redisson.pro
Apache License 2.0

Redis Sentinel mode: PubSub stops working after Redis restart #6026

Closed: olessio closed this issue 2 weeks ago

olessio commented 2 months ago

Redis version

6.2.13

Redisson version

3.32.0, 3.27.2

Redisson configuration

SentinelServersConfig

What is the Expected behavior?

After Redis is back up and Redisson has reconnected, PubSub should keep working.

What is the Actual behavior?

Problem 1: Redisson fails to create a new topic listener:

org.redisson.client.RedisTimeoutException: Unable to acquire subscription lock after 7500ms. Try to increase 'subscriptionTimeout', 'subscriptionsPerConnection', 'subscriptionConnectionPoolSize' parameters.
    at org.redisson.pubsub.PublishSubscribeService.lambda$subscribe$17(PublishSubscribeService.java:413)
    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:706)
    at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:694)
    at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:781)
    at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:494)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:1583)
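The exception message names three settings. The sketch below shows where they would be set in a Sentinel configuration, assuming Redisson's programmatic Config API (3.x) is on the classpath. The class name SentinelSubscriptionConfig, the master name "mymaster", the Sentinel addresses and the numeric values are illustrative placeholders, not recommended values. If subscriptions are never restored after a failover, raising these limits may only postpone the timeout.

```java
import org.redisson.config.Config;
import org.redisson.config.SentinelServersConfig;

// Illustrative only: all names and values below are placeholders.
public class SentinelSubscriptionConfig {

    // Builds a Sentinel-mode configuration with explicit subscription settings.
    static Config buildConfig() {
        Config config = new Config();
        SentinelServersConfig sentinel = config.useSentinelServers();

        // Hypothetical master name and Sentinel addresses
        sentinel.setMasterName("mymaster");
        sentinel.addSentinelAddress("redis://127.0.0.1:26379", "redis://127.0.0.1:26380");

        // The three settings named in the RedisTimeoutException message.
        // In the report, the subscription lock wait gave up after 7500 ms.
        sentinel.setSubscriptionTimeout(15000);            // milliseconds
        sentinel.setSubscriptionsPerConnection(10);
        sentinel.setSubscriptionConnectionPoolSize(100);
        return config;
    }
}
```

A client would then be created with Redisson.create(SentinelSubscriptionConfig.buildConfig()).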

Problem 2: Existing subscriptions are not re-established. The Redis command "PUBSUB CHANNELS" returns an empty list, even though multiple channels were subscribed before the disconnect.
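As a rough illustration of what "re-established" involves, here is a hypothetical, self-contained model in which a registry remembers every subscribed channel and replays SUBSCRIBE on the new connection after a reconnect. The names PubSubConnection, SubscriptionRegistry and onReconnect are invented for this sketch; it does not describe how Redisson implements resubscription.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical model of subscription bookkeeping; not Redisson's implementation.
public class ResubscribeSketch {

    // Stand-in for a pub/sub connection: only the SUBSCRIBE side is modelled.
    interface PubSubConnection {
        void subscribe(String channel);
    }

    // Records every channel the application listens on, so the subscriptions
    // can be replayed on a new connection after a reconnect.
    static final class SubscriptionRegistry {
        private final Map<String, List<Consumer<String>>> listeners = new LinkedHashMap<>();
        private PubSubConnection connection;

        SubscriptionRegistry(PubSubConnection connection) {
            this.connection = connection;
        }

        void addListener(String channel, Consumer<String> listener) {
            List<Consumer<String>> channelListeners = listeners.get(channel);
            if (channelListeners == null) {
                channelListeners = new ArrayList<>();
                listeners.put(channel, channelListeners);
                connection.subscribe(channel); // SUBSCRIBE only once per channel
            }
            channelListeners.add(listener);
        }

        // Called once the client has a fresh connection to the (new) master.
        // If this replay is skipped, the server holds no subscriptions for
        // these channels, which matches the symptom in Problem 2.
        void onReconnect(PubSubConnection fresh) {
            connection = fresh;
            for (String channel : listeners.keySet()) {
                fresh.subscribe(channel);
            }
        }

        // Delivers a message to every listener registered for the channel.
        void dispatch(String channel, String message) {
            for (Consumer<String> listener : listeners.getOrDefault(channel, List.of())) {
                listener.accept(message);
            }
        }
    }
}
```

Keying the registry by channel means each channel is subscribed once per connection, no matter how many listeners share it.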

Problem 3 (I assume it is a consequence of Problem 1): RedissonLock gets stuck at:

    @Override
    public boolean tryLock(long waitTime, long leaseTime, TimeUnit unit) throws InterruptedException {
       ....
        CompletableFuture<RedissonLockEntry> subscribeFuture = subscribe(threadId);
        try {
            subscribeFuture.get(time, TimeUnit.MILLISECONDS);
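The excerpt above blocks on subscribeFuture.get(time, TimeUnit.MILLISECONDS). If the subscription is never completed, as Problem 1 suggests, that call can only return once the whole wait time has elapsed. A minimal, stdlib-only sketch of the same waiting pattern follows. The helper awaitSubscription is hypothetical and is not part of Redisson.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of waiting on a subscription future with a deadline.
public class SubscribeWaitSketch {

    // Returns true if the subscription completed within waitMillis,
    // false if it failed, the wait timed out, or the thread was interrupted.
    static boolean awaitSubscription(CompletableFuture<?> subscribeFuture, long waitMillis) {
        try {
            subscribeFuture.get(waitMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The future was never completed: the caller has spent its
            // whole wait time and still has no subscription.
            return false;
        } catch (ExecutionException e) {
            // The subscription attempt completed exceptionally.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

With a future that never completes, awaitSubscription(new CompletableFuture<Void>(), 50) returns false after roughly 50 ms. With CompletableFuture.completedFuture(...) it returns true immediately.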

After the service is restarted, everything works as expected.


sr11235 commented 1 month ago

I am encountering the same issue during failover testing of our HA setup on Redis Cloud with Redisson 3.29.0. Redisson is configured in single-server mode, because Redis Enterprise hides the clustered topology of the databases.

mrniko commented 1 month ago

Should be fixed in https://github.com/redisson/redisson/pull/6047. Can you try version 3.34.0?

sr11235 commented 1 month ago

I could reproduce the issue with a Redis Sentinel setup on version 3.29.0, but not on 3.34.1. I will try a Redis Cloud failover test soon.

sr11235 commented 2 weeks ago

The issue did not reappear in a new failover test with Redisson 3.34.1. It seems to be fixed. Thank you!