dayonekoo opened this issue 7 months ago
Sometimes step 2 fails to happen for more than 30 seconds, which results in another thread acquiring the same lock.
What is the reason? Do you see errors in the log?
As an alternative, you can use tryLock() with a leaseTime parameter.
@mrniko I am from the same team as @dayonekoo.
What is the reason? Do you see errors in the log?
As @dayonekoo pointed out in https://github.com/redisson/redisson/issues/5697#issuecomment-2010889454, we are not able to see why the lock is released early, but we suspect that those early-return conditions (or some other conditions within that block) are being hit, causing renewExpirationAsync() not to be triggered. We can't tell for sure due to the lack of logs; adding these trace logs would help us determine the root cause of the issue.
We notice in the implementation of RedissonBaseLock.renewExpiration() that it returns early under certain circumstances (here & here), and we suspect this may be relevant to our current situation of non-renewals. Is it possible to add trace-level logs here so we can get further visibility into renew-expiration behavior? As of now it's difficult to determine where exactly the renew-expiration flow is going wrong.
As an alternative, you can use tryLock() with a leaseTime parameter.
That would extend the time, in a failure situation, before the lock can be acquired by another process or thread. Our understanding is that without a leaseTime parameter, scheduleExpirationRenewal() is called, which renews the lock expiration every 10s by default (30s / 3). So currently, in a failure situation, we wait at most 30s for a new process/thread to pick up the lock, whereas if we used a leaseTime we would have to set it to 60s to match the process that holds the lock, which extends this failover time. Is that correct?
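Our understanding of the renewal schedule can be sketched with a plain-JDK simulation (no Redisson involved; the class and field names here are our own, and the 30s TTL with renewal every TTL/3 mirrors the documented lockWatchdogTimeout default):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class WatchdogSketch {
    static final long TTL_MS = 30_000;             // lockWatchdogTimeout default (30s)
    static final long RENEW_EVERY_MS = TTL_MS / 3; // renewal period (10s)

    final AtomicLong expiresAt = new AtomicLong();
    final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    void acquire(long nowMs) {
        expiresAt.set(nowMs + TTL_MS);
        // Like scheduleExpirationRenewal(): push the expiry forward every TTL/3.
        // If this task silently stops (the failure we suspect), the lock expires
        // TTL ms after the last successful renewal.
        scheduler.scheduleAtFixedRate(
            () -> expiresAt.set(System.currentTimeMillis() + TTL_MS),
            RENEW_EVERY_MS, RENEW_EVERY_MS, TimeUnit.MILLISECONDS);
    }

    boolean heldAt(long nowMs) {
        return nowMs < expiresAt.get();
    }

    public static void main(String[] args) {
        WatchdogSketch s = new WatchdogSketch();
        long t0 = System.currentTimeMillis();
        s.acquire(t0);
        // Queried immediately, before the first renewal at +10s has happened:
        System.out.println(s.heldAt(t0 + 29_000)); // true  (within the initial TTL)
        System.out.println(s.heldAt(t0 + 31_000)); // false (past TTL without a renewal)
        s.scheduler.shutdownNow();
    }
}
```

This is why the 30s bound matters to us: with the watchdog, a stalled holder loses the lock at most one TTL after its last renewal, whereas a fixed 60s leaseTime would hold the lock for the full lease even if the holder died immediately.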
Adding trace logs to renewExpiration(), scheduleExpirationRenewal(), and cancelExpirationRenewal() would help us debug what is going wrong.
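Until such logs exist in Redisson itself, the closest workaround we see is raising the log level for the lock classes. A minimal sketch, assuming SLF4J with Logback on the classpath (the logger name matches the org.redisson.RedissonBaseLock class; today this only surfaces whatever logging those methods already contain):

```xml
<!-- logback.xml: hypothetical config to capture any lock-renewal logging -->
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d %-5level %logger - %msg%n</pattern>
    </encoder>
  </appender>

  <!-- TRACE on the lock base class; would pick up the proposed trace logs -->
  <logger name="org.redisson.RedissonBaseLock" level="TRACE"/>

  <root level="INFO">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```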
Which Redisson version is it? There is no 6.27.1 as you specified.
@mrniko We are using 3.27.2. 6.27.1 must have been a typo.
Expected behavior
Expected sequence:
Actual behavior
Sometimes step 2 fails to happen for more than 30 seconds, which results in another thread acquiring the same lock: renew expiration doesn't happen for over 30 seconds, so the lock expires and is released.
Steps to reproduce or test case
Three separate pods try to acquire the same RLock every minute in the following way:
When the failure happens, multiple threads end up subscribing to the same channel, because the channel lock is released before the 5 minutes are up.
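The invariant we expect from the steps above can be sketched with plain-JDK threads (our own names throughout; three threads stand in for the three pods, a ReentrantLock stands in for the RLock, and the 5-minute hold is scaled down to 50 ms). A healthy lock never has more than one holder; the bug we observe is this invariant breaking when the Redis key expires early:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class ContentionSketch {
    /** Runs three "pods" against one lock; returns the max concurrent holders. */
    public static int run() {
        ReentrantLock lock = new ReentrantLock();
        AtomicInteger holders = new AtomicInteger();
        AtomicInteger maxConcurrent = new AtomicInteger();
        ExecutorService pods = Executors.newFixedThreadPool(3);
        for (int i = 0; i < 3; i++) {
            pods.submit(() -> {
                if (lock.tryLock()) {          // only one pod should win
                    try {
                        int now = holders.incrementAndGet();
                        maxConcurrent.accumulateAndGet(now, Math::max);
                        Thread.sleep(50);      // scaled-down "5 minutes" of work
                        holders.decrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        lock.unlock();
                    }
                }
            });
        }
        pods.shutdown();
        try {
            pods.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return maxConcurrent.get();
    }

    public static void main(String[] args) {
        // A healthy mutual-exclusion lock keeps this at 1.
        System.out.println(run()); // 1
    }
}
```

In our production scenario the equivalent of run() effectively returns more than 1, because the lock's Redis key expires while the first pod is still working.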
Redis version
Elasticache engine version 7.1.0
Redisson version
6.27.1
Redisson configuration