swiftlang / swift-corelibs-libdispatch

The libdispatch Project, (a.k.a. Grand Central Dispatch), for concurrency on multicore hardware
swift.org
Apache License 2.0

Occasional `swiftc` crash on Windows, "disposed a muxnote with an active thread" #844

Open z2oh opened 2 months ago

z2oh commented 2 months ago

Upon upgrading our Azure CI machines to use the new Azure Cobalt ARM64 processors, we started seeing frequent compiler crashes when building a large Swift project. After some investigation, the culprit appears to be a lifecycle violation in libdispatch in the Windows pipe handling code.

The crashing line: https://github.com/apple/swift-corelibs-libdispatch/blob/e85f6a0d5c9ea1f32f5013c3fa34e4fc146cd0eb/src/event/event_windows.c#L240

And the stack trace:

[Inline Frame] dispatch.dll!_dispatch_muxnote_dispose(dispatch_muxnote_s * dmn) Line 240    C
[Inline Frame] dispatch.dll!_dispatch_muxnote_release(dispatch_muxnote_s * dmn) Line 265    C
[Inline Frame] dispatch.dll!_dispatch_event_merge_pipe_handle_read(dispatch_muxnote_s * dmn, unsigned long dwBytesAvailable) Line 669   C
dispatch.dll!_dispatch_event_loop_drain(unsigned int flags) Line 915    C
dispatch.dll!_dispatch_mgr_invoke() Line 5419   C
dispatch.dll!_dispatch_mgr_thread(dispatch_lane_s * dq, dispatch_invoke_context_s * dic, <unnamed-tag> flags) Line 5447 C
[Inline Frame] dispatch.dll!_dispatch_continuation_pop_inline(dispatch_object_t dou, dispatch_invoke_context_s * dic, <unnamed-tag> flags, dispatch_queue_class_t dqu) Line 2496    C
dispatch.dll!_dispatch_root_queue_drain(dispatch_queue_global_s * dq, unsigned int pri, <unnamed-tag> flags) Line 6114  C
dispatch.dll!_dispatch_worker_thread(void * context) Line 6250  C
dispatch.dll!_dispatch_worker_thread_thunk(void * lpParameter) Line 6272    C
[External Code] 

I suspect this is not a Cobalt/ARM64-specific issue, but more likely a long-standing bug that has become common on this particular line of CPUs due to some scheduling or timing change.

The interesting section is here: https://github.com/apple/swift-corelibs-libdispatch/blob/e85f6a0d5c9ea1f32f5013c3fa34e4fc146cd0eb/src/event/event_windows.c#L667-L669

The event set here is used to synchronize with the pipe monitoring thread, which itself calls _dispatch_muxnote_retain. Perhaps a change in timing affected the typical order of operations here, although I haven't been able to prove this yet.
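
To make the suspected ordering problem concrete, here is a minimal single-threaded model in C. It is not libdispatch's code: `note_t`, `note_retain`, `note_release`, and `simulate` are hypothetical names, and the interleaving it replays (the owner dropping its reference before the monitoring thread has taken its own) is an assumption, not a confirmed diagnosis.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a muxnote; NOT libdispatch's real struct. */
typedef struct {
    atomic_int refcnt;
    bool thread_active;          /* a monitor thread still uses the note */
    bool disposed;               /* the last reference has been dropped */
    bool disposed_while_active;  /* the condition reported in this issue */
} note_t;

static void note_init(note_t *n) {
    atomic_init(&n->refcnt, 1);  /* the owner's reference */
    n->thread_active = false;
    n->disposed = false;
    n->disposed_while_active = false;
}

static void note_retain(note_t *n) {
    atomic_fetch_add(&n->refcnt, 1);
}

static void note_release(note_t *n) {
    /* atomic_fetch_sub returns the previous value; 1 means this was the last reference. */
    if (atomic_fetch_sub(&n->refcnt, 1) == 1) {
        n->disposed = true;
        n->disposed_while_active = n->thread_active;
    }
}

/*
 * Replays one fixed interleaving (an assumption for illustration):
 * the owner releases while the monitor thread is still active.
 * Returns true if the note was disposed with an active thread.
 */
static bool simulate(bool retain_before_handoff) {
    note_t n;
    note_init(&n);

    n.thread_active = true;      /* monitor thread has been started */
    if (retain_before_handoff) {
        note_retain(&n);         /* thread's reference taken up front */
    }

    note_release(&n);            /* owner drops its reference */
    bool bad = n.disposed_while_active;

    if (!n.disposed) {
        n.thread_active = false; /* monitor thread finishes its work */
        note_release(&n);        /* and drops the reference it held */
    }
    return bad;
}
```

In this model, `simulate(false)` reports a dispose with an active thread, while `simulate(true)` does not, because the thread's reference already exists by the time the owner releases. Whether the real code path can hit the first ordering is exactly what still needs to be proven.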

I'm trying to reproduce the crash under LIBDISPATCH_LOG to get some more information.

lxbndr commented 2 months ago

I suspect there is a flaw in muxnote management that makes the whole of URLSession somewhat unstable on non-Darwin platforms (not only Windows). For sockets, it is possible to over-release a muxnote object under heavy usage, because the register and unregister code runs on different threads. Perhaps this also affects pipes. It would be worth investigating this further.
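
As a rough illustration of the over-release pattern described for sockets, the sketch below shows how two unregister calls for one registration can drop one reference too many, and one possible guard against that. Everything here is hypothetical (`sock_note_t`, `sock_note_register`, `unregister_unguarded`, `unregister_guarded`); it is not how libdispatch is implemented, and the guard is not a proposed patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical socket registration record; NOT libdispatch's real type. */
typedef struct {
    atomic_int refcnt;
    atomic_bool registered;
} sock_note_t;

static void sock_note_register(sock_note_t *n) {
    /* Two references: one for the caller, one held by the registration. */
    atomic_init(&n->refcnt, 2);
    atomic_init(&n->registered, true);
}

/* Unguarded: every caller drops the registration's reference.
 * Returns the reference count after the call. */
static int unregister_unguarded(sock_note_t *n) {
    return atomic_fetch_sub(&n->refcnt, 1) - 1;
}

/* Guarded: only the caller that flips `registered` from true to false
 * releases, so a second unregister becomes a no-op.
 * Returns the reference count after the call. */
static int unregister_guarded(sock_note_t *n) {
    if (atomic_exchange(&n->registered, false)) {
        return atomic_fetch_sub(&n->refcnt, 1) - 1;
    }
    return atomic_load(&n->refcnt);
}
```

Calling `unregister_unguarded` twice on a freshly registered note takes the count to 0 even though the caller still holds a reference, which is the over-release. Calling `unregister_guarded` twice leaves the count at 1.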