Expected behavior
When RLock tryLock is called on a lock that is already held, with Redis running in clustered mode, the call should wait until the lock is released and then acquire it, provided the release happens within the wait timeout.
Actual behavior
Setup: N application instances concurrently try to acquire a lock. Redis is running in clustered mode with 2 shards of 3 nodes each (1 master, 2 replicas).
Occasionally, when instance A holds the lock and instance B tries to take it with RLock tryLock, B's call times out even though A released the lock in the meantime. It does not happen every time, only intermittently. It looks as if the tryLock subscription lands on a different shard than the one the lock lives on and never receives the release notification. That should not be possible, because unlock messages are supposedly broadcast to all shards.
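One way to probe the missed-notification hypothesis is to replace the single long wait with several short ones, so that acquisition no longer depends on one release notification arriving. The sketch below is written against java.util.concurrent.locks.Lock, which RLock implements; the class name SlicedTryLock, the method tryLockInSlices and the sliceMs parameter are hypothetical names introduced here, not part of Redisson. It assumes that every tryLock call starts with a fresh acquisition attempt. If this variant succeeds where the single long tryLock times out, that would point at the notification path rather than at the lock state.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

/** Diagnostic sketch (hypothetical helper, not a Redisson API). */
final class SlicedTryLock {

    private SlicedTryLock() {}

    /**
     * Tries to acquire the lock within totalWaitMs, starting a new
     * tryLock attempt at least every sliceMs instead of waiting once.
     * Returns true if the lock was acquired.
     */
    static boolean tryLockInSlices(Lock lock, long totalWaitMs, long sliceMs)
            throws InterruptedException {
        if (sliceMs <= 0) {
            throw new IllegalArgumentException("sliceMs must be positive");
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(totalWaitMs);
        while (true) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                // Budget used up: one last non-blocking attempt, then give up.
                return lock.tryLock();
            }
            long waitMs = Math.min(remainingMs, sliceMs);
            if (lock.tryLock(waitMs, TimeUnit.MILLISECONDS)) {
                return true;
            }
        }
    }
}
```

In getLock this would stand in for the lock.tryLock(lockWaitTimeoutMs, TimeUnit.MILLISECONDS) call, for example with a slice of a few hundred milliseconds. It is meant as a diagnostic, not as a fix, since it trades the notification-driven wait for polling.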
The following picture shows one of the observations:
"Instance setup" marks the moment just before tryLock is called.
Each color marks one locking sequence.
Between the last green call and the first red call there are several seconds during which the lock was free to take, and we are sure nobody else held it. The timeout is 5 seconds.
It is worth mentioning that we first noticed this behaviour after upgrading the app to Spring Boot 3, with Redisson 3.22.0.
Example code snippet used for this:
public Lock getLock(String lockId) {
    try {
        var lock = redissonClient.getLock(lockId);
        log.info("Redisson RLock instance setup completed with ids: " + lockId
                + " in thread: " + Thread.currentThread().getName());
        // Wait up to lockWaitTimeoutMs for the current holder to release the lock.
        var lockAcquired = lock.tryLock(lockWaitTimeoutMs, TimeUnit.MILLISECONDS);
        if (lockAcquired) {
            log.info("Redisson RLock acquiring succeeded with ids: " + lockId
                    + " in thread: " + Thread.currentThread().getName());
            return lock;
        }
        // Timed out: this is the branch that is hit unexpectedly.
        var message = String.format("Could not acquire lock in %sms for lockId", lockWaitTimeoutMs);
        log.info(message + ": " + lockId);
        log.info("Redisson RLock acquiring failed with ids: " + lockId
                + " in thread: " + Thread.currentThread().getName());
        throw new AcquireLockFailedException(message, List.of(lockId));
    } catch (Throwable t) {
        // Rethrow our own timeout exception unchanged; wrap everything else.
        if (t instanceof AcquireLockFailedException acquireLockFailedException) {
            throw acquireLockFailedException;
        }
        var message = "Exception thrown while acquiring lock for lockId";
        log.error(message + ": " + lockId, t);
        throw new AcquireLockFailedException(message, List.of(lockId), t);
    }
}

public void releaseLock(String lockId, Lock lock) {
    try {
        lock.unlock();
        log.info("Redisson RLock releasing was successful with ids: " + lockId
                + " in thread: " + Thread.currentThread().getName());
    } catch (Throwable t) {
        // Release failures are logged and swallowed, never rethrown.
        log.info("Redisson RLock releasing failed!!!!!!!! with ids: " + lockId
                + " in thread: " + Thread.currentThread().getName());
        log.error("Failed to release lock for lockId: " + lockId, t);
    }
}
Steps to reproduce or test case
Redis with clustered mode enabled using multiple shards.
Calling tryLock on the cluster concurrently from different JVMs with X ms timeout.
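The steps above can be sketched as a small contention probe. It is written against java.util.concurrent.locks.Lock so that it runs without a cluster; to exercise Redisson, the supplier would return redissonClient.getLock(lockId) instead. The class name LockContentionProbe, the method countTimeouts and all parameter names are hypothetical. The probe uses threads inside one JVM, whereas the report concerns separate JVMs, so a faithful reproduction would run it in several processes at once. When the total hold time is far below the wait timeout, every counted timeout is suspicious.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.function.Supplier;

/** Reproduction sketch (hypothetical helper, not a Redisson API). */
final class LockContentionProbe {

    private LockContentionProbe() {}

    /**
     * Starts `contenders` threads that each run `rounds` cycles of
     * tryLock / hold / unlock, and returns how many tryLock calls timed out.
     */
    static int countTimeouts(Supplier<Lock> lockSupplier, int contenders, int rounds,
                             long waitMs, long holdMs) throws InterruptedException {
        AtomicInteger timeouts = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[contenders];
        for (int i = 0; i < contenders; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await(); // release all contenders at the same time
                    for (int r = 0; r < rounds; r++) {
                        Lock lock = lockSupplier.get();
                        if (!lock.tryLock(waitMs, TimeUnit.MILLISECONDS)) {
                            timeouts.incrementAndGet();
                            continue;
                        }
                        try {
                            Thread.sleep(holdMs); // simulated critical section
                        } finally {
                            lock.unlock();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "contender-" + i);
            threads[i].start();
        }
        start.countDown();
        for (Thread t : threads) {
            t.join();
        }
        return timeouts.get();
    }
}
```

With the 5 second timeout from the report, a call such as countTimeouts(() -> redissonClient.getLock("probe"), 4, 50, 5000, 20) keeps the total hold time at a few seconds, so a non-zero result would match the behaviour described above.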
Redis version 7.0 (AWS ElastiCache)
Redisson version 3.22.0
Redisson configuration RedisClusterConfiguration