"Cannot extend an already-expired lock" crashes consumer pods

doramar97 commented 1 year ago

We are using a single-node ElastiCache for Redis cluster in our production environment (engine version 6.2.6, node type cache.t3.micro).

The environment runs on EKS: consumers are pods in the cluster, and each pod handles one task at a time. Tasks are delivered through Amazon MQ (RabbitMQ).

We use Redlock to lock on the customer while a task for that customer is running: as long as a task for a specific customer is executing, no other task for that customer can run, and such tasks are sent to another queue that handles delayed messages.
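A minimal sketch of the per-customer locking described above, assuming one lock key per customer (`customer:<id>`) and a separate queue for delayed messages. `InMemoryLockStore` and `dispatch` are hypothetical names; the in-memory map only stands in for Redis. Each acquisition returns an owner token so that a holder whose lock has already expired cannot release a lock that another task now owns.

```typescript
// Hypothetical in-memory stand-in for a Redis-backed lock (e.g. one taken via Redlock).
type Task = { id: string; customerId: string };
type Route = "run" | "delayed";

class InMemoryLockStore {
  // resource key -> current owner token and expiration timestamp (ms)
  private locks = new Map<string, { token: number; expiresAt: number }>();
  private nextToken = 1;
  private readonly now: () => number;

  constructor(now: () => number) {
    this.now = now;
  }

  // Returns an owner token, or null if an unexpired lock is already held.
  tryAcquire(key: string, ttlMs: number): number | null {
    const held = this.locks.get(key);
    if (held !== undefined && held.expiresAt > this.now()) {
      return null;
    }
    const token = this.nextToken++;
    this.locks.set(key, { token, expiresAt: this.now() + ttlMs });
    return token;
  }

  // Releases only if the caller still owns the lock.
  release(key: string, token: number): boolean {
    const held = this.locks.get(key);
    if (held === undefined || held.token !== token) {
      return false;
    }
    this.locks.delete(key);
    return true;
  }
}

// One lock per customer: a task whose customer is already locked goes to the
// delayed queue instead of running concurrently.
function dispatch(
  store: InMemoryLockStore,
  task: Task,
  ttlMs: number,
): { route: Route; token: number | null } {
  const token = store.tryAcquire(`customer:${task.customerId}`, ttlMs);
  return { route: token === null ? "delayed" : "run", token };
}
```

With an injected clock, a second task for the same customer is routed to the delayed queue until the first lock expires or is released, while tasks for other customers run immediately.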

The problem occurs with long-running tasks, or when multiple tasks target the same customer. We get the following errors, which cause the pod to restart:

```text
[redlock] error while executing lock block function. Cannot extend an already-expired lock.
[redlock] error while executing lock block function.
```
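One general way an expired-lock error can take down a Node.js process is when the extension runs from a timer: nothing in the task's call stack can catch it, and since Node.js 15 an unhandled promise rejection terminates the process by default. Whether that is what happens here is not confirmed. The sketch below uses a hypothetical `ExtendableLock` interface and `usingWithAutoExtend` helper, similar in spirit to the `signal` that Redlock v5's `using` passes to its routine: every extension failure is caught and turned into an abort that the routine can check.

```typescript
// Hypothetical lock shape; not Redlock's actual type.
interface ExtendableLock {
  expiration: number; // ms since epoch
  extend(durationMs: number): Promise<ExtendableLock>;
}

// Runs `routine` while extending the lock in the background. Extension failures
// never escape the timer callback: they abort `signal` instead.
async function usingWithAutoExtend<T>(
  initial: ExtendableLock,
  durationMs: number,
  thresholdMs: number,
  routine: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  let lock = initial;
  let timer: ReturnType<typeof setTimeout> | undefined;
  let done = false;

  // Schedule the next extension `thresholdMs` before the lock expires.
  const schedule = (): void => {
    if (done) return;
    const delay = Math.max(0, lock.expiration - Date.now() - thresholdMs);
    timer = setTimeout(() => {
      void extendOnce(); // extendOnce catches everything, so this cannot reject
    }, delay);
  };

  const extendOnce = async (): Promise<void> => {
    try {
      if (lock.expiration <= Date.now()) {
        throw new Error("Cannot extend an already-expired lock.");
      }
      lock = await lock.extend(durationMs);
      schedule();
    } catch (error) {
      // Surface the failure to the routine instead of crashing the process.
      controller.abort(error);
    }
  };

  schedule();
  try {
    return await routine(controller.signal);
  } finally {
    done = true;
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

The routine should check `signal.aborted` before committing side effects and stop cleanly if the lock was lost, rather than relying on an exception thrown from a timer.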

```typescript
import Redlock from "redlock";

// `redisClient` is the application's existing Redis client, created elsewhere.
const redlock = new Redlock([redisClient], {
  // The expected clock drift; for more details see:
  // http://redis.io/topics/distlock
  driftFactor: 0.01, // multiplied by lock ttl to determine drift time

  // The max number of times Redlock will attempt to lock a resource
  // before erroring.
  retryCount: 10,

  // the time in ms between attempts
  retryDelay: 500, // time in ms

  // the max time in ms randomly added to retries
  // to improve performance under high contention
  // see https://www.awsarchitectureblog.com/2015/03/backoff.html
  retryJitter: 200, // time in ms

  // The minimum remaining time on a lock before an extension is automatically
  // attempted with the `using` API.
  automaticExtensionThreshold: 500, // time in ms
});
```

We're happy to provide more context or code. We also set `lockDuration: number = 2000` in a function that checks whether a block is locked.
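A worked example of the timing these settings imply. It assumes the drift formula of the Redlock algorithm (drift = round(driftFactor × duration) + 2 ms, the extra 2 ms covering Redis expiry precision) and that an automatic extension is attempted `automaticExtensionThreshold` ms before the computed expiration. `lockTiming` is a hypothetical helper.

```typescript
// Hypothetical helper. Assumes drift = round(driftFactor * duration) + 2 ms, as in
// the Redlock algorithm, and that auto-extension starts `thresholdMs` before expiry.
interface LockTiming {
  driftMs: number; // allowance for clock drift and Redis expiry precision
  validityMs: number; // usable lock time, measured from when acquisition started
  extendAfterMs: number; // when the automatic extension would be attempted
}

function lockTiming(durationMs: number, driftFactor: number, thresholdMs: number): LockTiming {
  const driftMs = Math.round(driftFactor * durationMs) + 2;
  const validityMs = durationMs - driftMs;
  const extendAfterMs = Math.max(0, validityMs - thresholdMs);
  return { driftMs, validityMs, extendAfterMs };
}

// Values from the configuration above: lockDuration 2000, driftFactor 0.01,
// automaticExtensionThreshold 500.
const timing = lockTiming(2000, 0.01, 500);
console.log(timing); // { driftMs: 22, validityMs: 1978, extendAfterMs: 1478 }
```

Under these assumptions, a task longer than about 2 s depends on each extension completing inside a window of roughly 500 ms, and time spent acquiring the lock also counts against the validity. A blocked event loop or a slow Redis round-trip can push an extension past the expiration, at which point it fails with "Cannot extend an already-expired lock."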

Any help or guidance on this issue, or on best practices for our use case, would be appreciated. Thanks!

raimoa1 commented 7 months ago

We found that the `while` loop in this code, https://github.com/mike-marcacci/node-redlock/blob/main/src/index.ts#L436, can turn into an infinite loop and crash the server, and we also observed a memory leak. Interestingly, it happens when parallel requests are in flight: Redlock somehow can't cope with them and gets stuck in the infinite loop.
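The loop at the linked line has not been checked here. As a general pattern, a retry loop around lock extension can be kept from spinning by bounding it with an attempt budget, re-checking the lock's expiration on every pass, and awaiting a delay between attempts so the event loop is not starved. `extendWithBoundedRetry` and its `tryExtend` callback are hypothetical; the clock and sleep are injectable so the loop can be tested deterministically.

```typescript
type ExtendResult =
  | { ok: true; expiration: number }
  | { ok: false; reason: string };

interface RetryOptions {
  maxAttempts: number;
  retryDelayMs: number;
  now?: () => number; // injectable clock (defaults to Date.now)
  sleep?: (ms: number) => Promise<void>; // injectable delay (defaults to setTimeout)
}

// Hypothetical: `tryExtend` resolves to the new expiration, or null if the attempt failed.
async function extendWithBoundedRetry(
  expiration: number,
  tryExtend: () => Promise<number | null>,
  options: RetryOptions,
): Promise<ExtendResult> {
  const now = options.now ?? (() => Date.now());
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    // Stop as soon as the lock has lapsed: extending it can no longer succeed.
    if (now() >= expiration) {
      return { ok: false, reason: "lock expired before it could be extended" };
    }
    const next = await tryExtend();
    if (next !== null) {
      return { ok: true, expiration: next };
    }
    // Awaiting between attempts also yields to the event loop.
    await sleep(options.retryDelayMs);
  }
  return { ok: false, reason: `gave up after ${options.maxAttempts} attempts` };
}
```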

bwright2810 commented 1 month ago

We appear to be hitting this issue as well in our AWS Lambda executions. During peak times, an API request took longer than expected and outlasted the Redlock lock duration. When we tried to extend the lock after the API request completed, the Lambda failed with an "UnknownApplicationError" that our error-handling block did not catch. It looks like this issue is what was happening.
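Rather than discovering after the API call that the lock can no longer be extended, one option is to give the call a time budget derived from the lock's expiration timestamp and abort it when the budget runs out. A minimal sketch, assuming the work accepts an `AbortSignal` (as `fetch` does); `withinLockDeadline` and `safetyMarginMs` are hypothetical names.

```typescript
// Hypothetical helper: race `work` against the lock's remaining validity,
// minus a safety margin, and abort it if the budget runs out.
async function withinLockDeadline<T>(
  lockExpiration: number, // ms since epoch
  safetyMarginMs: number,
  work: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const budgetMs = lockExpiration - Date.now() - safetyMarginMs;
  if (budgetMs <= 0) {
    throw new Error("Not enough lock time left to start the request.");
  }

  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      const error = new Error(`Request exceeded the lock budget of ${budgetMs} ms.`);
      controller.abort(error); // lets the work cancel itself (e.g. fetch)
      reject(error);
    }, budgetMs);
  });

  try {
    // Promise.race subscribes to both promises, so a late rejection of `work`
    // after the deadline wins is still handled.
    return await Promise.race([work(controller.signal), deadline]);
  } finally {
    clearTimeout(timer);
  }
}
```

The caller can then catch the budget error explicitly and route the task for a retry, instead of attempting to extend a lock that has already lapsed.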

mike-marcacci / node-redlock issue #283