raisedapp / Hangfire.Storage.SQLite

An Alternative SQLite Storage for Hangfire
https://www.nuget.org/packages/Hangfire.Storage.SQLite
MIT License
155 stars 31 forks source link

Multiple threads racing can acquire the same distributed lock simultaneously #10

Closed Sumo-MBryant closed 4 years ago

Sumo-MBryant commented 4 years ago

Consider the following test:

[Fact, CleanDatabase]
public void OnlySingleLockCanBeAcquired()
{
    var connection = ConnectionUtils.CreateConnection();
    var numThreads = 10;
    long concurrencyCounter = 0;
    var manualResetEvent = new ManualResetEventSlim();
    var success = new bool[numThreads];

    // Spawn multiple threads to race each other.
    var threads = Enumerable.Range(0, numThreads).Select(i => new Thread(() =>
    {
        // Wait for the start signal.
        manualResetEvent.Wait();

        // Attempt to acquire the distributed lock.
        using (new SQLiteDistributedLock("resource1", TimeSpan.FromSeconds(5), connection, new SQLiteStorageOptions()))
        {
            // Find out if any other threads managed to acquire the lock.
            var oldConcurrencyCounter = Interlocked.CompareExchange(ref concurrencyCounter, 1, 0);

            // The old concurrency counter should be 0 as only one thread should be allowed to acquire the lock.
            success[i] = oldConcurrencyCounter == 0;

            Interlocked.MemoryBarrier();

            // Hold the lock for some time.
            Thread.Sleep(100);

            Interlocked.Decrement(ref concurrencyCounter);
        }
    })).ToList();

    threads.ForEach(t => t.Start());

    manualResetEvent.Set();

    threads.ForEach(t => Assert.True(t.Join(TimeSpan.FromMinutes(1)), "Thread is hanging unexpected"));

    // All the threads should report success.
    Interlocked.MemoryBarrier();
    Assert.DoesNotContain(false, success);
}

This test races threads trying to acquire a SQLiteDistributedLock and confirms whether the thread was the only one executing at between acquire and release.

The SQLiteDistributedLock::Acquire() method is unsafe. Consider the following execution through the function:

https://github.com/raisedapp/Hangfire.Storage.SQLite/blob/e67c998c781fae6fd7ff0b8fd6676df0d82ef8ac/src/main/Hangfire.Storage.SQLite/SQLiteDistributedLock.cs#L126-L142

  1. Thread 1 tries to fetch the existing lock and result is null (line 126)
  2. Thread 2 tries to fetch the existing lock and result is null (line 126)
  3. Thread 1 tries to update the new lock and rowsAffected is 0 (line 135)
  4. Thread 2 tries to update the new lock and rowsAffected is 0 (line 135)
  5. Thread 1 performs the insert (line 137)
  6. Thread 2 performs the insert (line 137)
  7. Thread 1 thinks it has the lock
  8. Thread 2 thinks it has the lock

Now there are two different locks in the DB for the same resource because there is no uniqueness constraint over the Resource column.

Suggested fixes:

Aside: I don't know the transactional capabilities nor the uniqueness constraints of LiteDB, but suspect the code this was taken from is similarly flawed.

I discovered this as the DisableConcurrentExecution was failing to prevent concurrent execution of multiple instances of the same job.

felixclase commented 4 years ago

Thanks,

I will be checking