microfleet / ioredis-lock

Node distributed locking using redis
MIT License
26 stars 8 forks source link

"Adopt" already held lock based on exact payload match? #16

Open papandreou opened 5 years ago

papandreou commented 5 years ago

Hi!

I'm working on an app where a class of background jobs uses ioredis-lock to control access to a shared resource. There's one particular case where such a job can fail ungracefully due to the worker node being shut down, which means that it doesn't get to release the lock. The next thing that happens is that job queue detects that the job has stalled, and then proceeds to retry it, and so it fails to take the lock because the lock held by the old instance of the same job hasn't expired yet at that point.

I've come up with a hack that sets _key and _locked, then uses Lock#release to release the existing lock. This will only succeed when the job was taken by a previous instance of the same job, as determined by the job id in the payload. It looks roughly like this:

async function takeLock(redisKey, jobId, lockOptions) {
  const lock = ioredisLock.createLock(clientFactory(), lockOptions);
  // Use the job id as the payload of the lock so that we can re-acquire it.
  // Found the trick here:
  // https://github.com/makeomatic/ioredis-lock/issues/3#issuecomment-248290852
  lock._id = jobId;

  try {
    await lock.acquire(key);
  } catch (err) {
    // We failed to take the lock. This might be due to a previous instance of the same
    // job being classified as stalled and then being re-added to the waiting queue.
    // In that case we want to take over the existing lock.
    // Fool ioredis-lock into thinking that it's currently holding the lock, so we
    // can use the release method to break it iff the job id matches that of the
    // current lock payload:
    lock._locked = true;
    lock._key = redisKey;
    try {
      await lock.release();
    } catch (err) {
      // Failed to release the existing lock. This is most likely due to the payload not
      // matching, so we aren't battling a stalled job. Give up.
      return false;
    }
    try {
      await lock.acquire(redisKey);
      // Successfully re-acquired the lock from the stale job
    } catch (err) {
      // Failed to re-acquire the lock, assume that another job beat us to it and give up:
      return false;
    }
  }
  return lock; // Successfully took the lock
}

This should be free of race conditions, but I don't feel great about poking around in the internals like this. Would you consider "adoption" of an existing lock as an officially supported feature?

Best regards, Andreas

papandreou commented 5 years ago

Ping @AVVS?