swiftlang / swift

The Swift Programming Language
https://swift.org
Apache License 2.0

[SR-14902] Binaries with Concurrency/Dispatch crash with _dispatch_queue_no_activate. #57249

Open swift-ci opened 3 years ago

swift-ci commented 3 years ago
Previous ID SR-14902
Radar rdar://problem/80383002
Original Reporter 3405691582 (JIRA User)
Type Bug
Environment Linux (clean build and checkout at HEAD); OpenBSD (with PR apple/swift-corelibs-libdispatch#559)

Additional Detail from JIRA

| | |
|------------------|-----------------|
|Votes | 0 |
|Component/s | |
|Labels | Bug, Concurrency |
|Assignee | None |
|Priority | Medium |

md5: 989fbed04827f8361dfc98879fa3bf00

Issue Description:

Presumably all Dispatch-based Concurrency binaries built at HEAD currently crash at _dispatch_queue_no_activate.

An easy way to reproduce this on Linux is to run one of the Concurrency tests, e.g.,
./llvm-project/llvm/utils/lit/lit.py -sv --param swift_site_config=./build/Ninja-DebugAssert/swift-linux-x86_64/test-linux-x86_64/lit.site.cfg swift/test/Concurrency/Runtime/async_let_throws.swift
The test builds ./build/Ninja-DebugAssert/swift-linux-x86_64/test-linux-x86_64/Concurrency/Runtime/Output/async_let_throws.swift.tmp/a.out, which then crashes with SIGILL.

I am not certain exactly why this happens, but based on some debugging I have some suspicions as to what might be going awry. See, e.g., this (lightly redacted) GDB session:

(gdb) f
#0  _dispatch_queue_no_activate (dqu=...,
    allow_resume=0x2a529413ff0 <__OS_dispatch_queue_main_vtable>)
    at .../swift/swift-corelibs-libdispatch/src/init.c:652
652             DISPATCH_INTERNAL_CRASH(dx_type(dqu._dq), "dq_activate called");
(gdb) print *(*(struct dispatch_queue_global_s *)dqu._dq).do_vtable
$62 = {_os_obj_xref_dispose = 0x502, _os_obj_dispose = 0x0, _os_obj_vtable = {
    do_type = 1,
    do_kind = 0x2a5761bddb0 <jobInvoke(void*, void*, unsigned int)> "UH\211\345H\203\354@H\211}\370H\211u\360\211U\354H\213E\370H\211E\340H\213E\340H\211E\310H\213}\340\350$\027", do_dispose = 0x0, do_debug = 0x0, do_invoke = 0x0,
    dq_activate = 0x0, dq_wakeup = 0x0, dq_push = 0x0}}

Observe how do_kind, which is a const char *, holds the address of a function (jobInvoke) here. Swift tries to manufacture Dispatch-compatible objects when it uses Dispatch to implement Concurrency features (see Task.cpp#L257). This mismatch seems to suggest that the object is laid out one way on the Swift side and interpreted another way on the Dispatch side.

However, I suspect that the Swift side does not take into consideration that part of the object header in Dispatch is defined differently depending on whether USE_OBJC is 1 or 0 (see src/objc_internal.h#L174). What may also be relevant is that there is a similar deviation depending on whether OS_OBJECT_HAVE_OBJC1 is 1 or 0 (see src/object_internal.h#L436).
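
To make the suspected failure mode concrete, here is a minimal sketch of how a header that differs by one word shifts every later field. All struct names, field values, and helper functions below are hypothetical and chosen only for illustration; they are not the real libdispatch or Swift runtime definitions, and the sketch assumes a platform where a function address fits in a uintptr_t.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts for illustration only; these are NOT the real
 * libdispatch or Swift runtime definitions. */

/* Writer's view: one header word precedes the fields. */
struct writer_vtable {
    uintptr_t header0;
    uintptr_t do_type;
    uintptr_t do_kind;    /* address of a descriptive string */
    uintptr_t do_invoke;  /* address of a function */
};

/* Reader's view: built with a two-word header, so every field
 * sits one word later than the writer placed it. */
struct reader_vtable {
    uintptr_t header0;
    uintptr_t header1;
    uintptr_t do_type;
    uintptr_t do_kind;
    uintptr_t do_invoke;
};

/* Hypothetical stand-in for a job entry point. */
static void job_invoke(void) {}

/* Fill a vtable using the writer's layout. */
static struct writer_vtable make_vtable(void) {
    struct writer_vtable vt;
    vt.header0 = 0;
    vt.do_type = 1;
    vt.do_kind = (uintptr_t)"queue";
    vt.do_invoke = (uintptr_t)&job_invoke;
    return vt;
}

/* Reinterpret the writer's bytes through the reader's layout.
 * memcpy is used so the reinterpretation itself is well defined. */
static struct reader_vtable read_vtable(const struct writer_vtable *vt) {
    struct reader_vtable seen;
    memset(&seen, 0, sizeof seen);
    memcpy(&seen, vt, sizeof *vt);
    return seen;
}

/* Returns 1 if the reader's do_kind slot holds the function address. */
static int do_kind_holds_function(void) {
    struct writer_vtable vt = make_vtable();
    struct reader_vtable seen = read_vtable(&vt);
    return seen.do_kind == (uintptr_t)&job_invoke;
}
```

In this sketch, calling do_kind_holds_function() reports that the reader finds the function address in its do_kind slot, and the reader's do_invoke slot is left at zero. That is the same general shape as the GDB dump above, although the actual cause in the runtime may differ.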

(The main reason I am not 100% sure that this is the root cause of the crash is that I do not see how _dispatch_queue_no_activate is reached if the suspect vtable has 0x0 for dq_activate, but the above analysis may nonetheless be relevant to a potential fix. I have not experimented with reordering the metadata in Task.cpp to bring the Swift and Dispatch layouts into alignment. I suspect this was not tripped earlier because the code as it stands at HEAD aligns correctly with the ObjC-enabled branch.)

drexin commented 3 years ago

@swift-ci create

swift-ci commented 3 years ago

Comment by 3405691582 (JIRA)

I think #38386 fixes this – I am not seeing this problem recur at HEAD now on OpenBSD.

mikeash commented 3 years ago

Yeah, that should fix it, specifically this commit: https://github.com/apple/swift/pull/38386/commits/930c72aee2687224f606dbba95c7f2b30c7340f7