AsyncRecursiveLock allows two distinct tasks to run the same critical section concurrently

groue commented 2 months ago

Here is a sample code that should create a deadlock, but does not:

import Lock
import Semaphore // http://github.com/groue/Semaphore

actor MyActor {
    // How many times lock2 was acquired
    var count2 = 0
    private let lock1 = AsyncRecursiveLock()
    private let lock2 = AsyncRecursiveLock()

    // Run `action ` while both lock1 and lock2 have been acquired.
    func critical12(action: () async -> Void) async {
        await lock1.withLock {
            await critical2(action: action)
        }
    }

    // Run `action ` while lock2 has been acquired.
    func critical2(action: () async -> Void) async  {
        await lock2.withLock {
            count2 += 1
            await action()
        }
    }
}

let actor = MyActor()
let semaphore1 = AsyncSemaphore(value: 0)
let semaphore2 = AsyncSemaphore(value: 0)

// Task 1
Task {
    await actor.critical12 {
        // Here lock1 and lock2 have been acquired.
        // Let Task 2 try to acquire lock2, and signal semaphore2.
        // Since lock2 is already acquired, Task 2 can not make
        // any progress, can not signal semaphore2, and we have
        // created a deadlock.
        semaphore1.signal()
        await semaphore2.wait()
    }
}

// Task 2
Task {
    // Wait for Task 1 to acquire lock2
    await semaphore1.wait()

    await actor.critical2 {
        // This can't run, because Task 1 will not release
        // lock2 until semaphore2 is signaled.
        semaphore2.signal()
    }
}

// Wait for deadlock
try await Task.sleep(for: .seconds(1))

// Make sure that critical section 2 was entered only once, from Task 1.
let count2 = await actor.count2
assert(count2 == 1) // ❌ count2 is 2

My interpretation is that after Task 1 has acquired lock1, lock2 is completely unprotected, due to the task local. This allows Task 2 to acquire lock2 while Task 1 is supposed to hold it.

In the end, there is no deadlock, lock2 is acquired twice, semaphore2 is signaled when it should not.

Semaphores are a fantastic testing tool 🙂

mattmassicotte commented 2 months ago

Yeah my implemenation is bad. The use of the TaskLocal applies to all AsyncRecursiveLock instances within the task. Totally broken.

groue commented 2 months ago

I did not intend do say what you wrote, of course 😇 Whenever you want to address this, this ready-made scenario might turn helpful!

mattmassicotte commented 2 months ago

I'm still thinking of how. First, I'm trying to come up with a test that doesn't involve deadlock in the correct case.

mattmassicotte commented 2 months ago

While I continue here, I've realized something. Do you think it is worthwhile having two distinct types? Or should AsyncLock just be recursive?

groue commented 2 months ago

🤔 My two cents, after using the semaphore in various contexts:

The name "lock" sounds very traditional. It's not something entirely new. Users come in with a pre-existing mental model. Swift concurrency can introduce new stuff, like awaiting for the lock acquisition, but the mental model is already there.

The mental model already distinguishes locks from recursive locks. Some people like to avoid recursive locks. Having two distinct types sounds ok to me, at first sight.

Traditional recursive locks allow a thread to reenter a critical section. A thread is totally serial, by definition. It's unclear to me how Task fits this, in the context of structured concurrency and task groups. For example, I'd naively assume that in a parallel task group, only one sub-task can acquire a lock at any given time. If more exploration here reveals that Swift Concurrency could break reasonable expectations about things called "recursive locks", then maybe recursive locks must be discarded - or some new name should be invented.

I'd expect those primitives to support Task cancellation.

For AsyncSemaphore, I have two variants:

await sem.wait() // ignores cancellation
try await sem.waitUnlessCancelled() // throws if task is cancelled

Maybe I should have defined a single wait() method that deals with cancellation (and throws).

Eventually, I'm not unpleased that there exist two methods, because semaphores are a great tool for tests that want to make sure things are ordered as expected. In such tests, it's sometimes handy to ignore cancellation when waiting for a signal. Eventually, I'm happy that there are two methods. I like my apis to let me do my job without fuss, and AsyncSemaphore, as it is, has the right amount of versatility.

I'm not sure locks are meant to be used in such precise scheduling scenarios, so maybe dealing with cancellation by default should be considered:

func critical() {
  // If task is already cancelled, or cancelled while waiting
  // for the lock, this throws CancellationError. The lock is
  // not acquired, and the error prevents `unlock` from
  // being called.
  try await myLock.lock()
  defer { myLock.unlock() }

  // In critical section here
}

There is no Task.sleep method that does not deal with cancellation. And SE-0304 says:

The general expectation is that asynchronous functions should attempt to respond to cancellation by promptly throwing or returning.

If I see now why semaphores can be an exception, I'm not sure locks should be another.

mattmassicotte / Lock

AsyncRecursiveLock allows two distinct tasks to run the same critical section concurrently #2