llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.65k stars 11.84k forks source link

[lld/mac/arm64] Two of Chromium's base_unittests fail when linked with lld #49873

Closed nico closed 3 years ago

nico commented 3 years ago
Bugzilla Link 50529
Resolution FIXED
Resolved on Jun 22, 2021 03:13
Version unspecified
OS All
CC @gkmhub,@int3,@smeenai
Fixed by commit(s) https://reviews.llvm.org/rGd6565a2dbcbe0932cd5cbb95bf2fc06855dfe938

Extended Description

repro: https://drive.google.com/file/d/1YvyZxh_ItvL1hdVeaab-kU7zm-d1AokX/view?usp=sharing (17M bzipped, 79M extracted -- not huge)

With ld64:

% ld @​response.txt % ./base_unittests '--gtest_filter=MessagePumpMacTest.*' IMPORTANT DEBUGGING NOTE: batches of tests are run inside their own process. For debugging a test inside a debugger, use the --gtest_filter= flag along with --single-process-tests. Using sharding settings from environment. This is shard 0/1 Using 1 parallel jobs. [12105:259:ERROR:icu_util.cc(179)] icudtl.dat not found in bundle [12105:259:ERROR:icu_util.cc(243)] Invalid file descriptor to ICU data received. Note: Google Test filter = MessagePumpMacTest.ScopedPumpMessagesInPrivateModes:MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog:MessagePumpMacTest.QuitWithModalWindow [==========] Running 3 tests from 1 test suite. [----------] Global test environment set-up. [----------] 3 tests from MessagePumpMacTest [ RUN ] MessagePumpMacTest.ScopedPumpMessagesInPrivateModes [ OK ] MessagePumpMacTest.ScopedPumpMessagesInPrivateModes (42 ms) [ RUN ] MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog [ OK ] MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog (41 ms) [ RUN ] MessagePumpMacTest.QuitWithModalWindow [ OK ] MessagePumpMacTest.QuitWithModalWindow (30 ms) [----------] 3 tests from MessagePumpMacTest (114 ms total)

[----------] Global test environment tear-down [==========] 3 tests from 1 test suite ran. (114 ms total) [ PASSED ] 3 tests. [1/3] MessagePumpMacTest.ScopedPumpMessagesInPrivateModes (42 ms) [2/3] MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog (41 ms) [3/3] MessagePumpMacTest.QuitWithModalWindow (30 ms) SUCCESS: all tests passed.

But with lld:

% ./base_unittests '--gtest_filter=MessagePumpMacTest.*' IMPORTANT DEBUGGING NOTE: batches of tests are run inside their own process. For debugging a test inside a debugger, use the --gtest_filter= flag along with --single-process-tests. Using sharding settings from environment. This is shard 0/1 Using 1 parallel jobs. [12111:259:ERROR:icu_util.cc(179)] icudtl.dat not found in bundle [12111:259:ERROR:icu_util.cc(243)] Invalid file descriptor to ICU data received. Note: Google Test filter = MessagePumpMacTest.ScopedPumpMessagesInPrivateModes:MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog:MessagePumpMacTest.QuitWithModalWindow [==========] Running 3 tests from 1 test suite. [----------] Global test environment set-up. [----------] 3 tests from MessagePumpMacTest [ RUN ] MessagePumpMacTest.ScopedPumpMessagesInPrivateModes [ OK ] MessagePumpMacTest.ScopedPumpMessagesInPrivateModes (36 ms) [ RUN ] MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog [12110:259:ERROR:test_launcher.cc(1093)] no test result for MessagePumpMacTest.QuitWithModalWindow [1/3] MessagePumpMacTest.ScopedPumpMessagesInPrivateModes (36 ms) [2/3] MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog (CRASHED) [3/3] MessagePumpMacTest.QuitWithModalWindow (SKIPPED) 1 test crashed: MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog (../../base/message_loop/message_pump_mac_unittest.mm:99) 1 test not run: MessagePumpMacTest.QuitWithModalWindow (../../base/message_loop/message_pump_mac_unittest.mm:121) Tests took 0 seconds.

nico commented 3 years ago

Uploaded that as a stop gap to here: https://reviews.llvm.org/D104681

nico commented 3 years ago

patch sketch here's an untested thing to add encodings for functions without encoding. not tested with anything yet.

nico commented 3 years ago

We have another crash bug in libunwind, this one in x86_64: https://bugs.chromium.org/p/chromium/issues/detail?id=1220175

Chances are it's due to this bug too, but I haven't verified that yet.

nico commented 3 years ago

gkm said he's interested in working on the actual fix, so assigning to him. (Feel free to give back to me if that should change :) )

nico commented 3 years ago

Yeah I was thinking that we would scan the compactUnwindSection once to gather all function pointers, then scan each InputFile for symbols in __text that don't have a corresponding entry. Not exactly cheap but I guess it's unavoidable

It might be cheaper to give each Defined an associated unwind info range (when parsing the .o file this is cheap-ish: just walk the sorted unwind info in parallel) and then write the whole __unwind_info section based on that like ld64 does. That would also implement only writing unwind stuff for non-coalesced symbols as a side effect.

Not sure if that's better or worse.

The __eh_frame / __unwind_info stuff doesn't fit terribly neatly into the normal linker design :/

int3 commented 3 years ago

We also need to be able to identify all functions without unwind info somehow, right?

Yeah I was thinking that we would scan the compactUnwindSection once to gather all function pointers, then scan each InputFile for symbols in __text that don't have a corresponding entry. Not exactly cheap but I guess it's unavoidable

nico commented 3 years ago

Looks like UnwindCursor<A, R>::getInfoFromCompactEncodingSection() copies the encoding to _info.format and that being 0 causes _unwindInfoMissing to become true. So an encoding of 0 is a special case.

(However, it looks like _unwindInfoMissing true causes step() to return UNW_STEP_END which is a bit strange too?)

lldb also special-cases encoding of 0 here: https://github.com/llvm/llvm-project/blob/main/lldb/source/Symbol/CompactUnwindInfo.cpp#L180

But lldb manages to unwind fine past call() somehow…

nico commented 3 years ago

If you build the cc file with -fomit-frame-pointer, you can get that file to not include any unwind info. Locally I've also hacked up things to not emit eh_frame on x86_64 either [1], not sure that makes a difference. With that, bt in lldb doesn't show call, but that's with both lld and ld64. I guess things are better on x86_64 because everything always unwinds off rbp if it unwinds, at least for clang-generated object files (?)

It's a bit surprising that ld64 writes an encoding of 0x00000000; as far a I can tell that's not a valid encoding. I would expect that libunwind/src/CompactUnwinder.hpp,

CompactUnwinder_arm64::stepWithCompactEncoding

would hit the

_LIBUNWIND_ABORT("invalid compact unwind encoding");

but that's not what's happening. So I think I don't fully understand things yet.

[1]:

% git diff diff --git a/llvm/lib/MC/MCObjectFileInfo.cpp b/llvm/lib/MC/MCObjectFileInfo.cpp index 99526b384c6e..b4a53f9225f1 100644 --- a/llvm/lib/MC/MCObjectFileInfo.cpp +++ b/llvm/lib/MC/MCObjectFileInfo.cpp @@ -57,10 +57,11 @@ void MCObjectFileInfo::initMachOMCObjectFileInfo(const Triple &T) { MachO::S_ATTR_STRIP_STATIC_SYMS | MachO::S_ATTR_LIVE_SUPPORT, SectionKind::getReadOnly());

  • if (T.isOSDarwin()) {
  • if (T.isOSDarwin() && (T.getArch() != Triple::x86)) SupportsCompactUnwindWithoutEHFrame = true;
  • if (T.isWatchABI() || T.isOSDarwin() && T.getArch() != Triple::x86) OmitDwarfIfHaveCompactUnwind = true;
  • }

    FDECFIEncoding = dwarf::DW_EH_PE_pcrel;

we could deal with this by having cuPtrVector include dummy entries for functions without unwind info, right before the sort+fold step here:

We also need to be able to identify all functions without unwind info somehow, right?

int3 commented 3 years ago

Nice find! I guess we could deal with this by having cuPtrVector include dummy entries for functions without unwind info, right before the sort+fold step here: https://github.com/llvm/llvm-project/blob/main/lld/MachO/UnwindInfoSection.cpp#L297

FWIW, compiling your small repro example for x86_64 generates nonzero (and identical!) unwind info for all 4 functions, including __Z4callPFvvE.

nico commented 3 years ago

Here's a tiny stand-alone repro based on this theory. It successfully reproduces the issue :)

The idea is to put a "no frame, no saved registers" entry in file0.o, to not have any entries in file1.o (due to using -fno-exceptions, which omits the entries, which is supposed to give everything the default entry), and then put something from file1.o on the stack when calling the unwinder (which I called "call").

The unwinder will look for the entry for "call", but since lld doesn't write one, will find the entry at the highest lower address, which is the entry from file0.o which is "no frame, no registers" -- but "call" has a default frame. ...at least that's what I thought, but now that I compare unwinddump output for the binary linked by lld and ld64, the difference is that lld writes

encodingsCount=0x00000000 [ 0] encoding[2]=0x02000000 (no frame, no saved registers) Z5dummyv [ 1] encoding[0]=0x04000000 (std frame:) Z3foov [ 2] encoding[1]=0x02001000 (stack size=16:) __ZL7handlerP11objc_objectPv [ 3] encoding[0]=0x04000000 (std frame:) _main

while ld64 writes

encodingsCount=0x00000003 [ 0] encoding[3]=0x02000000 (no frame, no saved registers) Z5dummyv [ 1] encoding[2]=0x00000000 (no frame, no saved registers) Z4callPFvvE [ 2] encoding[0]=0x04000000 (std frame:) Z3foov [ 3] encoding[1]=0x02001000 (stack size=16:) ZL7handlerP11objc_objectPv [ 4] encoding[0]=0x04000000 (std frame:) _main

So ld64 does have an explicit entry for __Z4callPFvvE but it's also "no frame, no saved registers". The encoding is 0x00000000 instead of 0x02000000 which I suppose is the difference that matters (?)

Repro:

thakis@M1BP gn % cat file0.mm
void dummy() {}

thakis@M1BP gn % cat file1.cc void call(void(*f)()) { f(); }

thakis@M1BP gn % cat file2.mm

include <objc/objc-exception.h>

void call(void(*)());

static void handler(id a, void* b) {}

void foo() { // Anything that calls unw_getcontext/unw_init_local/unw_step should do, // objc_addExceptionHandler() does that, so calling that for convenience. objc_addExceptionHandler(&handler, 0); }

int main() { call(foo); }

thakis@M1BP gn % clang -c file0.mm file2.mm thakis@M1BP gn % clang -c -fno-exceptions file1.cc

thakis@M1BP gn % ../../third_party/llvm-build/Release+Asserts/bin/clang -fuse-ld=lld file0.o file1.o file2.o -isysroot $(xcrun -show-sdk-path) -lobjc thakis@M1BP gn % ./a.out zsh: trace trap ./a.out

thakis@M1BP gn % ../../third_party/llvm-build/Release+Asserts/bin/clang file0.o file1.o file2.o -isysroot $(xcrun -show-sdk-path) -lobjc
thakis@M1BP gn % ./a.out # Works with ld64

nico commented 3 years ago

__ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv is in obj/base/base_unittests/jumbo_ii.o which doesn't have any unwind info, likely due to it being a cc file built with -fno-exceptions.

nico commented 3 years ago

Hm, if I set a breakpoint in foo and run bt at that breakpoint in lldb, here's what I get with the crashing binary:

(lldb) bt

So lldb also has trouble unwinding.

With the working binary (by tweaking the rsp file):

(lldb) bt

"non-virtual thunk to base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork()" is __ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv in mangled.

"base::MessagePumpCFRunLoopBase::RunWork()" is __ZN4base24MessagePumpCFRunLoopBase7RunWorkEv in mangled.

Crashing binary:

% nm -n base_unittests | grep -E 'ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv|ZN4base24MessagePumpCFRunLoopBase7RunWorkEv'

000000010000e3a8 t ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv 00000001000182dc t ZN4base24MessagePumpCFRunLoopBase7RunWorkEv

Working binary:

thakis@M1BP gn % nm -n base_unittests | grep -E 'ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv|ZN4base24MessagePumpCFRunLoopBase7RunWorkEv'

000000010000e0ac t ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv 00000001000182dc t ZN4base24MessagePumpCFRunLoopBase7RunWorkEv

Reminder form an earlier comment:

In the crashing case, the last entry before the hole is

funcOffset=0x00000E7C (no frame, no saved registers) __ZN4base8internal7InvokerINS0_9BindStateIPFvRKZ4mainE3$_0EJS3_EEEFvvEE3RunEPNS0_13BindStateBaseE

And the first after it is

funcOffset=0x00017960 (std frame: x19/20 x21/22 x23/24) __ZN4base24MessagePumpCFRunLoopBase3RunEPNS_11MessagePump8DelegateE

In the non-crashing cash, the last entry before the hole is

funcOffset=0x00000B90 (std frame:) __ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE

And the first after it is

funcOffset=0x00017670 (std frame: x19/20) _main

So __ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv is also in that hole. So I guess that's the function that has the wrong unwind info, but it usually picks up the right unwind info from a different symbol, even with ld64, since it's never appears in unwinddump output, even when things work.

nico commented 3 years ago

Maybe clang/llvm do that optimization to omit compact unwind info for one function if it'd be identical to the previous function too? And then if the previous function is elided (either due to it being a weak coealesced symbol or a dead symbol -- in this case the former I suppose since it repros without -dead_strip too) then it'd be important that the linker explicitly inserts the unwind kind from the previous symbol.

nico commented 3 years ago

Here's something resembling a slightly more concrete theory. Part of the unwinddump output when the binary crashes:

encodingsCount=0x00000000 funcOffset=0x00000B90 (std frame:) ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE funcOffset=0x00000BB0 (std frame: x19/20) _main funcOffset=0x00000DA0 (std frame: x19/20) ZL3foov funcOffset=0x00000E44 (std frame: x19/20) ZN4base8internal16BindLambdaHelperIZ4mainE3$0FvvEE3RunERKS2 funcOffset=0x00000E7C (no frame, no saved registers) ZN4base8internal7InvokerINS0_9BindStateIPFvRKZ4mainE3$_0EJS3_EEEFvvEE3RunEPNS0_13BindStateBaseE funcOffset=0x00017960 (std frame: x19/20 x21/22 x23/24) __ZN4base24MessagePumpCFRunLoopBase3RunEPNS_11MessagePump8DelegateE

Here's the same with lines 2/3 switched as mentioned in the previous comment, where there's no crash:

encodingsCount=0x00000000 funcOffset=0x00000B90 (std frame:) ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE funcOffset=0x00017670 (std frame: x19/20) _main funcOffset=0x00017860 (std frame: x19/20) ZL3foov funcOffset=0x00017904 (std frame: x19/20) ZN4base8internal16BindLambdaHelperIZ4mainE3$0FvvEE3RunERKS2 funcOffset=0x0001793C (no frame, no saved registers) ZN4base8internal7InvokerINS0_9BindStateIPFvRKZ4mainE3$_0EJS3_EEEFvvEE3RunEPNS0_13BindStateBaseE funcOffset=0x00017960 (std frame: x19/20 x21/22 x23/24) __ZN4base24MessagePumpCFRunLoopBase3RunEPNS_11MessagePump8DelegateE

This is identical except for the funcOffsets.

In the first (crashing) output, there's a big hole between 0x00000E7C and 0x00017960 between second-to-last and last line.

In the second (working) output, the hole is between 0x00000B90 and 0x00017670 between first and second line.

Looking at nm -n base_unittests output, here's where the "missing" symbols from comment 11 are in both cases.

crashing:

thakis@M1BP gn % nm -n base_unittests | grep -E 'ZN4base8internal9BindStateIPFvRKZ4mainE3\$_0EJS2_EE7DestroyEPKNS0_13BindStateBaseE|__ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE|ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE|ZN4base24MessagePumpCFRunLoopBase13SetTimerSlackENS_10TimerSlackE|ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev|ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev|ZN4base24MessagePumpCFRunLoopBase16EnterExitRunLoopEm|ZNSt3110unique_ptrIN4base8internal34ScopedLazyTaskRunnerListForTestingENS_14default_deleteIS3_EEED1Ev|ZN4base4test15TaskEnvironmentD1Ev|ZN4base4test15TaskEnvironment14MockTimeDomainD1Ev|ZNK4base4test15TaskEnvironment14MockTimeDomain3NowEv|ZNK4base4test15TaskEnvironment14MockTimeDomain7GetNameEv|ZThn56_N4base4test15TaskEnvironment14MockTimeDomainD1Ev|ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv|__ZN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerD2Ev|___clang_call_terminate'

0000000100000e8c t ZN4base8internal9BindStateIPFvRKZ4mainE3$_0EJS2_EE7DestroyEPKNS0_13BindStateBaseE 0000000100000e98 t __ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE 0000000100000ea0 t ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE 0000000100017b24 t ZN4base24MessagePumpCFRunLoopBase13SetTimerSlackENS_10TimerSlackE 0000000100018170 t ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev 00000001000186b0 t ZN4base24MessagePumpCFRunLoopBase16EnterExitRunLoopEm 0000000100018dd8 t ZN4base4test15TaskEnvironmentD1Ev 0000000100018df0 t ZN4base4test15TaskEnvironment14MockTimeDomainD1Ev 0000000100018e50 t ZNK4base4test15TaskEnvironment14MockTimeDomain3NowEv 0000000100018edc t ZNK4base4test15TaskEnvironment14MockTimeDomain7GetNameEv 0000000100018ef4 t ZThn56_N4base4test15TaskEnvironment14MockTimeDomainD1Ev 0000000100018f40 t ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv 0000000100018f4c t __ZN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerD2Ev 0000000100019668 t ___clang_call_terminate

working:

% nm -n base_unittests | grep -E 'ZN4base8internal9BindStateIPFvRKZ4mainE3\$_0EJS2_EE7DestroyEPKNS0_13BindStateBaseE|__ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE|ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE|ZN4base24MessagePumpCFRunLoopBase13SetTimerSlackENS_10TimerSlackE|ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev|ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev|ZN4base24MessagePumpCFRunLoopBase16EnterExitRunLoopEm|ZNSt3110unique_ptrIN4base8internal34ScopedLazyTaskRunnerListForTestingENS_14default_deleteIS3_EEED1Ev|ZN4base4test15TaskEnvironmentD1Ev|ZN4base4test15TaskEnvironment14MockTimeDomainD1Ev|ZNK4base4test15TaskEnvironment14MockTimeDomain3NowEv|ZNK4base4test15TaskEnvironment14MockTimeDomain7GetNameEv|ZThn56_N4base4test15TaskEnvironment14MockTimeDomainD1Ev|ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv|__ZN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerD2Ev|___clang_call_terminate'

0000000100011d64 t ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE 000000010001794c t __ZN4base8internal9BindStateIPFvRKZ4mainE3$_0EJS2_EE7DestroyEPKNS0_13BindStateBaseE 0000000100017958 t ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE 0000000100017b24 t ZN4base24MessagePumpCFRunLoopBase13SetTimerSlackENS_10TimerSlackE 0000000100018170 t ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev 00000001000186b0 t ZN4base24MessagePumpCFRunLoopBase16EnterExitRunLoopEm 0000000100018dd8 t ZN4base4test15TaskEnvironmentD1Ev 0000000100018df0 t ZN4base4test15TaskEnvironment14MockTimeDomainD1Ev 0000000100018e50 t ZNK4base4test15TaskEnvironment14MockTimeDomain3NowEv 0000000100018edc t ZNK4base4test15TaskEnvironment14MockTimeDomain7GetNameEv 0000000100018ef4 t ZThn56_N4base4test15TaskEnvironment14MockTimeDomainD1Ev 0000000100018f40 t ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv 0000000100018f4c t __ZN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerD2Ev 0000000100019668 t ___clang_call_terminate

In the first case, the first 3 entries are in the hole (assuming 100000000 is the load address). In the second case, just the first entry is.

When libunwind looks for unwind information for a function, as far as I know it picks up the unwind info from the nearest entry at a lower address. So if something is in that hole, it gets the first entry above the hole.

In the crashing case, the last entry before the hole is

funcOffset=0x00000E7C (no frame, no saved registers) __ZN4base8internal7InvokerINS0_9BindStateIPFvRKZ4mainE3$_0EJS3_EEEFvvEE3RunEPNS0_13BindStateBaseE

In the non-crashing cash, the last entry before the hole is

funcOffset=0x00000B90 (std frame:) __ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE

And I guess "std frame" is what the missing entry needs to work.

This theory explains why this is dependent on .o order, and it suggests it is due to missing unwind entries.

(Why is this arm-only? Don't yet know for sure, but some ideas:

nico commented 3 years ago

Another idea I had was to diff xcrun unwinddump output for when I swap the order of the first few files. If I swap the 2nd and 3rd file (original order of these 2:

obj/base/base_unittests/message_pump_mac_unittest_ii.o obj/base/base_unittests/jumbo_ii.o

) then unwinddump output is basically identical except for a few offsets but the test passes. So I guess the missing entries aren't really a problem. Hmm.

I also tried building libunwind locally to add some debugging stuff there and then running the test with DYLD_LIBRARY_PATH=$HOME/src/llvm-project/out/gn/lib DYLD_PRINT_LIBRARIES=1 set, but my locally built libunwind complains libunwind: malformed __unwind_info at 0x1944E328C bad second level page (for both ld64 and lld used to built base_unittests). Not sure why. Maybe I'll debug that part a bit.

int3 commented 3 years ago

Interesting finds...

I'm still puzzled over the missing LSDA for __ZSt9terminatev though. At first I'd wondering if perhaps there were multiple definitions for that function in different object files, with only some having the associated LSDA, but it seems like the only place that it's defined is in cxa_handlers.o. And we do have an entry for that function in the second-level pages, so we are finding the one CU entry that points to it.

nico commented 3 years ago

getAllUnwindInfos() in ld64:

if ( atom->beginUnwind() == atom->endUnwind() ) { // be sure to mark that we have no unwind info for stuff in the TEXT segment without unwind info if ( (atom->section().type() == ld::Section::typeCode) && (atom->size() !=0) ) { entries.push_back(UnwindEntry(atom, address, 0, NULL, NULL, NULL, 0)); } }

I think ld64's algorithm is:

lld currently just merges __compact_unwind from all inputs, which means things without explicit unwind entries don't get unwind info.

Not sure if it's intentional on clang's end that it omits some unwind information but apparently it's important to use ld64's algorithm.

nico commented 3 years ago

Most (but not quite all) of the "missing" functions aren't in any of the input files:

thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun unwinddump $f | grep 'ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE'; done obj/base/base_unittests/call_with_eh_frame_asm.o obj/base/base_unittests/message_pump_mac_unittest_ii.o start: 0x000002E8 ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE obj/base/base_unittests/jumbo_ii.o obj/base/base_unittests/message_pump_mac_ii.o thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun unwinddump $f | grep 'ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE'; done
obj/base/base_unittests/call_with_eh_frame_asm.o obj/base/base_unittests/message_pump_mac_unittest_ii.o obj/base/base_unittests/jumbo_ii.o obj/base/base_unittests/message_pump_mac_ii.o thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun unwinddump $f | grep 'ZN4base24MessagePumpCFRunLoopBase13SetTimerSlackENS_10TimerSlackE'; done obj/base/base_unittests/call_with_eh_frame_asm.o obj/base/base_unittests/message_pump_mac_unittest_ii.o obj/base/base_unittests/jumbo_ii.o obj/base/base_unittests/message_pump_mac_ii.o thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun unwinddump $f | grep 'ZNSt3__110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev'; done obj/base/base_unittests/call_with_eh_frame_asm.o obj/base/base_unittests/message_pump_mac_unittest_ii.o obj/base/base_unittests/jumbo_ii.o obj/base/base_unittests/message_pump_mac_ii.o thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun unwinddump $f | grep '
ZN4base24MessagePumpCFRunLoopBaseD1Ev'; done obj/base/base_unittests/call_with_eh_frame_asm.o obj/base/base_unittests/message_pump_mac_unittest_ii.o obj/base/base_unittests/jumbo_ii.o obj/base/base_unittests/message_pump_mac_ii.o thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun unwinddump $f | grep 'ZN4base24MessagePumpCFRunLoopBase16EnterExitRunLoopEm'; done obj/base/base_unittests/call_with_eh_frame_asm.o obj/base/base_unittests/message_pump_mac_unittest_ii.o obj/base/base_unittests/jumbo_ii.o obj/base/base_unittests/message_pump_mac_ii.o thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun unwinddump $f | grep 'ZNSt3110unique_ptrIN4base8internal34ScopedLazyTaskRunnerListForTestingENS_14default_deleteIS3_EEED1Ev'; done obj/base/base_unittests/call_with_eh_frame_asm.o obj/base/base_unittests/message_pump_mac_unittest_ii.o obj/base/base_unittests/jumbo_ii.o obj/base/base_unittests/message_pump_mac_ii.o thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun unwinddump $f | grep 'ZN4base4test15TaskEnvironmentD1Ev'; done obj/base/base_unittests/call_with_eh_frame_asm.o obj/base/base_unittests/message_pump_mac_unittest_ii.o obj/base/base_unittests/jumbo_ii.o obj/base/base_unittests/message_pump_mac_ii.o thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun unwinddump $f | grep 'ZN4base4test15TaskEnvironment14MockTimeDomainD1Ev'; done obj/base/base_unittests/call_with_eh_frame_asm.o obj/base/base_unittests/message_pump_mac_unittest_ii.o obj/base/base_unittests/jumbo_ii.o obj/base/base_unittests/message_pump_mac_ii.o thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun unwinddump $f | grep '__ZNK4base4test15TaskEnvironment14MockTimeDomain3NowEv'; done obj/base/base_unittests/call_with_eh_frame_asm.o obj/base/base_unittests/message_pump_mac_unittest_ii.o obj/base/base_unittests/jumbo_ii.o obj/base/base_unittests/message_pump_mac_ii.o thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun unwinddump $f | grep 'ZNK4base4test15TaskEnvironment14MockTimeDomain7GetNameEv'; done obj/base/base_unittests/call_with_eh_frame_asm.o obj/base/base_unittests/message_pump_mac_unittest_ii.o obj/base/base_unittests/jumbo_ii.o obj/base/base_unittests/message_pump_mac_ii.o thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun unwinddump $f | grep 'ZThn56_N4base4test15TaskEnvironment14MockTimeDomainD1Ev'; done obj/base/base_unittests/call_with_eh_frame_asm.o obj/base/base_unittests/message_pump_mac_unittest_ii.o obj/base/base_unittests/jumbo_ii.o obj/base/base_unittests/message_pump_mac_ii.o thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun unwinddump $f | grep 'ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv'; done obj/base/base_unittests/call_with_eh_frame_asm.o obj/base/base_unittests/message_pump_mac_unittest_ii.o obj/base/base_unittests/jumbo_ii.o obj/base/base_unittests/message_pump_mac_ii.o start: 0x00001F48 ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun unwinddump $f | grep 'ZN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerD2Ev'; done
obj/base/base_unittests/call_with_eh_frame_asm.o obj/base/base_unittests/message_pump_mac_unittest_ii.o obj/base/base_unittests/jumbo_ii.o obj/base/base_unittests/message_pump_mac_ii.o thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun unwinddump $f | grep '___clang_call_terminate'; done
obj/base/base_unittests/call_with_eh_frame_asm.o obj/base/base_unittests/message_pump_mac_unittest_ii.o obj/base/base_unittests/jumbo_ii.o obj/base/base_unittests/message_pump_mac_ii.o

nico commented 3 years ago

If I read this right, ld creates compact unwind info entries for symbols that don't have them in the input .o files:

No compact unwind for this

thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo xcrun unwinddump $f | grep '__ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE'; done

No eh_frame either

thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo dwarfdump $f | grep '__ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE'; done

Still, ld writes something to the output

thakis@M1BP gn % xcrun unwinddump base_unittests | grep '__ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE'

relink with ld...

thakis@M1BP gn % ../../third_party/llvm-build/Release+Asserts/bin/clang++ -stdlib=libc++ -arch arm64 -nostdlib++ -isysroot sdk/xcode_links/MacOSX11.3.sdk -mmacosx-version-min=10.11.0 -Wl,-ObjC -o "./base_unittests" -Wl,-filelist,"./base_unittests.rsp" -framework AppKit -Wl,-dead_strip

...and, an entry:

thakis@M1BP gn % xcrun unwinddump base_unittests | grep '__ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE'

        [  7] funcOffset=0x00001DC0, encoding[ 12]=0x00000000 (no frame, no saved registers                            ) __ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE

(Also, changing the order of the input .o files makes the test pass with lld too, for reasons I also don't understand yet.)

int3 commented 3 years ago

The mismatch on common encodings might be benign, since there are different ways to encode the same thing... the other two seem suspicious though

nico commented 3 years ago

Comparing the output of xcrun unwinddump base_unittests with ld and lld, after removing hex offsets and [ 2] from the out shows:

(no frame, no saved registers) ZN4base8internal9BindStateIPFvRKZ4mainE3$_0EJS2_EE7DestroyEPKNS0_13BindStateBaseE (no frame, no saved registers) __ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE (no frame, no saved registers) ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE (std frame: x19/20 x21/22 ) ZN4base24MessagePumpCFRunLoopBase13SetTimerSlackENS_10TimerSlackE (std frame: x19/20 x21/22 ) ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev (std frame: x19/20 x21/22 ) ZN4base24MessagePumpCFRunLoopBaseD1Ev (no frame, no saved registers) ZN4base24MessagePumpCFRunLoopBase16EnterExitRunLoopEm (std frame: x19/20 x21/22 x23/24 x25/26) ZNSt3110unique_ptrIN4base8internal34ScopedLazyTaskRunnerListForTestingENS_14default_deleteIS3_EEED1Ev (std frame: x19/20 ) ZN4base4test15TaskEnvironmentD1Ev (std frame: x19/20 x21/22 ) ZN4base4test15TaskEnvironment14MockTimeDomainD1Ev (no frame, no saved registers) ZNK4base4test15TaskEnvironment14MockTimeDomain3NowEv (std frame: ) ZNK4base4test15TaskEnvironment14MockTimeDomain7GetNameEv (no frame, no saved registers) __ZThn56_N4base4test15TaskEnvironment14MockTimeDomainD1Ev (no frame, no saved registers) ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv (std frame: x19/20 x21/22 ) __ZN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerD2Ev (std frame: x19/20 x21/22 ) ___clang_call_terminate

nico commented 3 years ago

repro a tiny bit smaller repro, now small enough to attach here (931k bzipped)

nico commented 3 years ago

Here's a smaller repro: https://drive.google.com/file/d/1M1_2OaL6l4uTw7wSOb1h2XFRjGN6lVm7/view?usp=sharing (5.8M unzipped, 1.1M zipped)

nico commented 3 years ago

(The assembly kind of looks like resign mitigation #​2 in the slides.)

nico commented 3 years ago

I don't know if that's the culprit here, but it looks like there are arm64e-specific relocations. This https://llvm.org/devmtg/2019-10/slides/McCall-Bougacha-arm64e.pdf is a good intro to arm64e . ld64 has a bunch of stuff under SUPPORT_ARCH_arm64e , ARM64_RELOC_AUTHENTICATED_POINTER , etc. (Chromium doesn't use arm64e, but some system libraries apparently do, and maybe that needs linker support already? There should probably be a dedicated bug for full arm64e support in lld though -- eventually that'll be a thing for userspace apps.)

nico commented 3 years ago

Here's the crashing disassembly:

0x19a985cc8 <+396>:  tbnz   w9, #&#8203;0x0, 0x19a985e98     ; <+860>
0x19a985ccc <+400>:  tbnz   w9, #&#8203;0x1, 0x19a985eb0     ; <+884>
0x19a985cd0 <+404>:  tbnz   w9, #&#8203;0x2, 0x19a985ec8     ; <+908>
0x19a985cd4 <+408>:  tbnz   w9, #&#8203;0x3, 0x19a985ee0     ; <+932>
0x19a985cd8 <+412>:  tbnz   w9, #&#8203;0x4, 0x19a985ef8     ; <+956>
0x19a985cdc <+416>:  tbnz   w9, #&#8203;0x8, 0x19a985f10     ; <+980>
0x19a985ce0 <+420>:  tbnz   w9, #&#8203;0x9, 0x19a985f28     ; <+1004>
0x19a985ce4 <+424>:  tbnz   w9, #&#8203;0xa, 0x19a985f40     ; <+1028>
0x19a985ce8 <+428>:  tbz    w9, #&#8203;0xb, 0x19a985d00     ; <+452>
0x19a985cec <+432>:  ldr    x9, [x10]
0x19a985cf0 <+436>:  str    x9, [x19, #&#8203;0x190]
0x19a985cf4 <+440>:  ldur   x9, [x10, #-0x8]
0x19a985cf8 <+444>:  str    x9, [x19, #&#8203;0x198]
0x19a985cfc <+448>:  sub    x10, x10, #&#8203;0x10           ; =0x10 
0x19a985d00 <+452>:  ldr    x9, [x19, #&#8203;0x100]
0x19a985d04 <+456>:  str    x10, [x19, #&#8203;0x108]
0x19a985d08 <+460>:  cmp    x8, #&#8203;0x1                  ; =0x1 
0x19a985d0c <+464>:  mov    x8, x9
0x19a985d10 <+468>:  pacib  x8, x10
0x19a985d14 <+472>:  csel   x16, x9, x8, ne
0x19a985d18 <+476>:  add    x8, x19, #&#8203;0x110           ; =0x110 
0x19a985d1c <+480>:  autib  x16, x10
0x19a985d20 <+484>:  mov    x17, x16
0x19a985d24 <+488>:  xpaci  x17
0x19a985d28 <+492>:  cmp    x16, x17
0x19a985d2c <+496>:  b.eq   0x19a985d34               ; <+504>

-> 0x19a985d30 <+500>: brk #​0xc471

I think the tbnz cascade corresponds to the

if (encoding & UNWIND_ARM64_FRAME_X19_X20_PAIR) { ... } if (encoding & UNWIND_ARM64_FRAME_X21_X22_PAIR) { .. ...

in CompactUnwinder_arm64::stepWithCompactEncodingFrame(). And then that looks maybe like a pointer authentication epilogue and we crash because we fail the pointer auth check (?)

nico commented 3 years ago

This looks pretty normal:

% xcrun unwinddump ./Users/thakis/src/chrome/src/out/gn/obj/base/base_unittests/message_pump_mac_unittest.o | grep -A 8 QuitWithModalWindow_Test | grep CreateTest -A4 start: 0x00001BCC __ZN7testing8internal15TestFactoryImplIN4base43MessagePumpMacTest_QuitWithModalWindow_TestEE10CreateTestEv end: 0x00001C18 (len=0x0000004C) unwind info: 0x44000001 std frame: x19/20 personality: ___gxx_personality_v0 lsda: 0x00002AD0 GCC_except_table30

In particular, no _objc_personalityv0 so this shouldn't matter for the objc unwinder code at all. (I'm surprised there's a gxx_personality_v0 at all, I thought we built with -fno-exceptions.)

nico commented 3 years ago

From https://opensource.apple.com/tarballs/objc4/ , objc4-824, runtime/objc-exception.mm:

uintptr_t objc_addExceptionHandler(objc_exception_handler fn, void *context) { // Find the closest enclosing frame with objc catch handlers struct frame_range target_frame = findHandler(); ....

and

static struct frame_range findHandler(void) { // walk stack looking for frame with objc catch handler unw_context_t uc; unw_cursor_t cursor; unw_proc_info_t info; unw_getcontext(&uc); unw_init_local(&cursor, &uc); while ( (unw_step(&cursor) > 0) && (unw_get_proc_info(&cursor, &info) == UNW_ESUCCESS) ) { // must use objc personality handler if ( info.handler != (uintptr_t)__objc_personality_v0 ) continue; // must have landing pad if ( info.lsda == 0 ) continue; // must have landing pad that catches objc exceptions struct dwarf_eh_bases bases; bases.tbase = 0; // from unwind-dw2-fde-darwin.c:examine_objects() bases.dbase = 0; // from unwind-dw2-fde-darwin.c:examine_objects() bases.func = info.start_ip; unw_word_t ip; unw_get_reg(&cursor, UNW_REG_IP, &ip); ip -= 1; struct frame_range try_range = {0, 0, 0, 0}; if ( isObjCExceptionCatcher(info.lsda, ip, &bases, &try_range) ) { unw_word_t cfa; unw_get_reg(&cursor, UNW_REG_SP, &cfa); try_range.cfa = cfa; return try_range; } }

return (struct frame_range){0, 0, 0, 0};

}

The unw_ stuff is implemented in libunwind/ in an llvm checkout. Probably somewhere in libunwind/src/CompactUnwinder.hpp, CompactUnwinder_arm64::stepWithCompactEncoding()?

nico commented 3 years ago

Other test has fundamentally the same stack:

% lldb -- ./base_unittests '--gtest_filter=MessagePumpMacTest.QuitWithModalWindow' --single-process-tests (lldb) target create "./base_unittests" Current executable set to '/Users/thakis/src/chrome/src/out/gn/repro/base_unittests' (arm64). (lldb) settings set -- target.run-args "--gtest_filter=MessagePumpMacTest.QuitWithModalWindow" "--single-process-tests" (lldb) r Process 14442 launched: '/Users/thakis/src/chrome/src/out/gn/repro/base_unittests' (arm64) Debugger detected, switching to single process mode. Pass --test-launcher-debug-launcher to debug the launcher itself. [14442:259:ERROR:icu_util.cc(179)] icudtl.dat not found in bundle [14442:259:ERROR:icu_util.cc(243)] Invalid file descriptor to ICU data received. Detected presence of a debugger, running without test timeouts. Note: Google Test filter = MessagePumpMacTest.QuitWithModalWindow [==========] Running 1 test from 1 test suite. [----------] Global test environment set-up. [----------] 1 test from MessagePumpMacTest [ RUN ] MessagePumpMacTest.QuitWithModalWindow Process 14442 stopped

nico commented 3 years ago

Stack:

% lldb -- ./base_unittests '--gtest_filter=MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog' --single-process-tests (lldb) target create "./base_unittests" Current executable set to '/Users/thakis/src/chrome/src/out/gn/repro/base_unittests' (arm64). (lldb) settings set -- target.run-args "--gtest_filter=MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog" "--single-process-tests" (lldb) r Process 14421 launched: '/Users/thakis/src/chrome/src/out/gn/repro/base_unittests' (arm64) Debugger detected, switching to single process mode. Pass --test-launcher-debug-launcher to debug the launcher itself. [14421:259:ERROR:icu_util.cc(179)] icudtl.dat not found in bundle [14421:259:ERROR:icu_util.cc(243)] Invalid file descriptor to ICU data received. Detected presence of a debugger, running without test timeouts. Note: Google Test filter = MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog [==========] Running 1 test from 1 test suite. [----------] Global test environment set-up. [----------] 1 test from MessagePumpMacTest [ RUN ] MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog Process 14421 stopped

nico commented 3 years ago

(This works fine on x86.)

nico commented 3 years ago

assigned to @gkmhub