swarthy / redis-semaphore

Distributed mutex and semaphore based on Redis
MIT License
148 stars 28 forks source link

fix: Race condition between releasing and refreshing locks #206

Closed elliott-with-the-longest-name-on-github closed 6 months ago

elliott-with-the-longest-name-on-github commented 6 months ago

Closes #205

Occasionally, when the timing works out just right, there's a race condition between refreshing a lock and releasing it. This is the code that causes it:

  async release() {
    debug(
      `release ${this._kind} (key: ${this._key}, identifier: ${this._identifier})`
    )
    if (this._refreshTimeInterval > 0) {
      this.stopRefresh() <---- the refresh interval schedules `_refresh` RIGHT before it's cancelled
    }
    if (this._acquired || this._acquiredExternally) {
      await this._release() <---- the execution of `_release` occurs at the same time, "beating" `_refresh`
      // BOOM, the lock is released, but refresh has still been scheduled (or was already running and was just slower)
      // it fails to refresh because we no longer have the lock
    }
    this._acquired = false
    this._acquiredExternally = false
  }

The fix is pretty simple -- just don't throw if _refresh fails and the lock is no longer acquired.

A more thorough analysis:

There are basically two cases we can run into here. Either _refresh wins, in which case there's no problem, and _release can still release. No changes necessary. The other option is that _release wins, in which case _refresh will throw because it failed to release the lock. As long as _refresh checks to see whether we still have the lock before throwing, we can avoid this after-release-refresh-failure.

Reproduction:

import { Semaphore } from "redis-semaphore";
import Redis from "ioredis";

const client = new Redis();
function getSemaphore() {
  return new Semaphore(client, "asdf", 25, {
    acquireTimeout: 30 * 60 * 1000,
    lockTimeout: 10_000,
    refreshInterval: 10,
  });
}
const transferItem = async (i) => {
  const sema = getSemaphore();
  try {
    await sema.acquire();
    console.log(i);
    await new Promise((resolve) => setTimeout(resolve, 1_000));
    return;
  } finally {
    await sema.release();
  }
};

await Promise.all(Array.from({ length: 500 }).map((_, i) => transferItem(i)));

The refresh interval is super short because it makes the error more common, but it can occur at any interval.

swarthy commented 6 months ago

Thanks for fix! I've reverted debug message (useful for tests) and added synthetic test for this case.