Hang when invoking async code synchronously from a sync context

wadetregaskis commented 2 months ago

Description

Sometimes I need to run async code synchronously, such as when an Apple framework API calls my closure (no async support) and requires a result from it, but the code required to retrieve that result is async. In an ideal world this wouldn't occur, but in typical macOS and iOS GUI programming it occurs quite often.

And conceptually this is trivial; indeed in prior paradigms like NSRunloop this sort of pattern was easy to implement, since runloops were re-entrant. There's no such API available to me for Swift Concurrency, as far as I'm aware.

I've almost got it working with Swift Concurrency, without runtime issues and compiler complaints, but I'm stuck on the last hurdle and I'm not sure it's not a compiler / stdlib bug.

Reproduction

import Dispatch

extension Task {
    static func sync(_ code: () async throws(Failure) -> Success) throws(Failure) -> Success {
        let semaphore = DispatchSemaphore(value: 0)

        nonisolated(unsafe) var result: Result<Success, Failure>? = nil

        withoutActuallyEscaping(code) {
            let sendableCode = unsafeBitCast($0, to: (@Sendable () async throws -> Success).self)

            _ = Task<Void, Never>.detached(priority: .userInitiated) { @Sendable () async -> Void in
                do {
                    result = .success(try await sendableCode())
                } catch {
                    result = .failure(error as! Failure)
                }

                semaphore.signal()
            }

            semaphore.wait()
        }

        return try result!.get()
    }
}

Expected behavior

Everything runs fine, life moves on.

The actual behaviour is that it just silently hangs at runtime, even though the code compiles without any complaints in Swift 6 mode. The code closure is never actually invoked; the Task's closure seems to just silently hang when trying to call it. There is no evidence of it anywhere in any thread's callstack (the thread running sync is stuck on the semaphore.wait()).

Yet if I mark the code parameter as @Sendable, it works fine. However, then I'm erroneously requiring code to be sendable and it causes design and implementation problems for the caller (mainly re. capture local state, which isn't and shouldn't have to be sendable because there's no concurrency or escaping involved).

My only hypothesis so far is that it's trying to force code to run on the same thread as sync is called on, thus a fairly pedestrian deadlock, but: I don't know why (the code closure isn't intentionally bound to any particular isolation context) and I don't understand how merely making the closure @Sendable somehow changes that. Thus why I'm wondering if this is actually expected behaviour in any sense, or rather a bug.

I tried in vain every possible attribute and modifier I could think of (lots of nonisolated stuff, @_inheritActorContext, etc), even stuff I didn't expect to work merely for completeness, and nothing has helped.

Environment

Xcode 15 beta 5. swift-driver version: 1.113 Apple Swift version 6.0 (swiftlang-6.0.0.7.6 clang-1600.0.24.1)

Additional information

No response

mattmassicotte commented 2 months ago

I believe your analysis is correct. The signature of this function is promising that code, because it is non-Sendable, cannot change isolation domains. And if this function is called from a non-isolated context, I think it will always be called synchronously. But the unsafeBitCast is why this unsupported arragement isn't being caught at compile time.

I'm pretty sure your only options are to mark the code closure Sendable or sending.

wadetregaskis commented 2 months ago

But I argue that semantically the closure isn't sent. It's run synchronously to the caller's thread, with appropriate memory synchronisations via the semaphore. That it happens to be execute on a different thread is mostly an irrelevant implementation detail.

Only mostly because it's true that any thread-specific state it might hypothetically rely on would be different, but it's very rare to rely on such things and I certainly don't in any of the code I need to use this for.

mattmassicotte commented 2 months ago

My only hypothesis so far is that it's trying to force code to run on the same thread as sync is called on

What I'm saying is I'm pretty sure this is exactly what is happening. I'm not even sure you can rely on the Task.detached here either. When already non-isolated, I don't know what its behavior is other than clearing out TaskLocal values.

wadetregaskis commented 2 months ago

What befuddles me is that there's no indication in the code that the closure in question (nor anything it invokes transitively via await) is isolated to any particular context. It is by design all code that can be called in any context. None is tied to an actor (global or otherwise). None uses any thread- or task-local state.

Worse, there doesn't seem to be any syntax the compiler will accept to ensure this - keywords like nonisolated aren't accepted in function parameter signatures, for example.

Separately, I should note that the code as posted is buggy even if the closure parameter is @Sending, because there's a race between signalling the semaphore and the task actually ending; when that race is lost, the closure is still retained and withoutActuallyEscaping crashes. 😔

It seems like there's a missing unsafeWithoutActuallyEscaping, and/or a way to actually signal when the task is fully cleaned up (I tried everything I could think of involving actors, TaskGroup, recursive Task.detached, etc, and nothing works).

wadetregaskis commented 2 months ago

Tangentially, I tried using sending throughout and return values (from tasks & closures) to convey the results, rather than capturing local mutable variables - in order to make it moot that the closure parameter has to be @Sendable - but:

I couldn't actually get it to compile. I'm not saying it's impossible, just that I spent over an hour refactoring what amounts to only a few dozen lines of code, and it just seemed to be an endless sequence of compiler complaints (none of which were indicative of actual flaws in the code, just me trying to convince the compiler - it felt a lot like fighting with the borrow checker in Rust, only without any actual correctness benefits).
It really complicated some of the code (e.g. I can no longer use defer or other simple mechanisms for clean-up, since I have to send the objects in question back & forth and the Swift compiler isn't smart enough to understand what's going on - which is fair, since I can barely wrap my head around the code at that point, it becomes so complicated 😕).

I realise that all this is pretty abstract, and I'm not sure if & where the bug(s) are in the compiler here (beyond the many places in which it is overly conservative about concurrency-safety). But if nothing else, there is a clear pragmatic gap in the Swift language regarding bridging sync to async code.

swiftlang / swift