llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.75k stars 11.89k forks source link

LLD for Mach-O incorrectly dead-strips merged CFStrings #104625

Closed smeenai closed 1 month ago

smeenai commented 2 months ago

101222 recently landed which enables private symbols to be merged, which includes CFString constants. On a Clang containing that patch, if you build the following C code:

$ cat merge.c
void *__CFConstantStringClassReference;
int main(int argc, char *argv[]) {
  return argc ? (long)__builtin___CFStringMakeConstantString("foo")
              : (long)__builtin___CFStringMakeConstantString("bar");
}

$ clang -target arm64-apple-macos11 -O3 -S -o merge.s merge.c

It generates the assembly in https://gist.github.com/smeenai/9fe473bcb5c4cb4a2b97eb147b23f671 (I changed a .set directive changed to = to avoid #104623 and match the object file generated by Clang directly). If you compile and link this assembly:

$ llvm-mc --triple=arm64-apple-macos11 --filetype=obj -o merge.o merge.s
$ ld64.lld -arch arm64 -platform_version macos 11 11 -dead_strip -o merge merge.o

you'll discover that the second cfstring constant has been incorrectly dead-stripped:

$ llvm-objdump -h merge.o | grep cfstring
  3 __cfstring       00000040 0000000000000020 DATA

$ llvm-objdump -h merge | grep cfstring
  3 __cfstring    00000020 0000000100004000 DATA

Note the size of 0x20 instead of 0x40 in the output binary. The code in the binary still references the second string though:

$ llvm-objdump -d merge
0000000100000470 <_main>:
100000470: 1001dc88     adr     x8, 0x100004000 <__MergedGlobals>
100000474: d503201f     nop
100000478: 11008109     add     w9, w8, #0x20
10000047c: 7100001f     cmp     w0, #0x0
100000480: 1a880120     csel    w0, w9, w8, eq
100000484: d65f03c0     ret

Which causes crashes at runtime in an actual program.

This might be a general problem with dead-stripping sections with alt_entry symbols and not something specific to CFStrings, but I haven't experimented further yet.

llvmbot commented 2 months ago

@llvm/issue-subscribers-lld-macho

Author: Shoaib Meenai (smeenai)

#101222 recently landed which enables private symbols to be merged, which includes CFString constants. On a Clang containing that patch, if you build the following C code: ``` $ cat merge.c void *__CFConstantStringClassReference; int main(int argc, char *argv[]) { return argc ? (long)__builtin___CFStringMakeConstantString("foo") : (long)__builtin___CFStringMakeConstantString("bar"); } $ clang -target arm64-apple-macos11 -O3 -S -o merge.s merge.c ``` It generates the assembly in https://gist.github.com/smeenai/9fe473bcb5c4cb4a2b97eb147b23f671 (I changed a `.set` directive changed to `=` to avoid #104623 and match the object file generated by Clang directly). If you compile and link this assembly: ``` $ llvm-mc --triple=arm64-apple-macos11 --filetype=obj -o merge.o merge.s $ ld64.lld -arch arm64 -platform_version macos 11 11 -dead_strip -o merge merge.o ``` you'll discover that the second cfstring constant has been incorrectly dead-stripped: ``` $ llvm-objdump -h merge.o | grep cfstring 3 __cfstring 00000040 0000000000000020 DATA $ llvm-objdump -h merge | grep cfstring 3 __cfstring 00000020 0000000100004000 DATA ``` Note the size of 0x20 instead of 0x40 in the output binary. The code in the binary still references the second string though: ``` $ llvm-objdump -d merge 0000000100000470 <_main>: 100000470: 1001dc88 adr x8, 0x100004000 <__MergedGlobals> 100000474: d503201f nop 100000478: 11008109 add w9, w8, #0x20 10000047c: 7100001f cmp w0, #0x0 100000480: 1a880120 csel w0, w9, w8, eq 100000484: d65f03c0 ret ``` Which causes crashes at runtime in an actual program. This might be a general problem with dead-stripping sections with alt_entry symbols and not something specific to CFStrings, but I haven't experimented further yet.
smeenai commented 2 months ago

I'm thinking maybe it's just invalid for GlobalMerge to affect CFString literals? The linker has built-in atomization for them, and GlobalMerge breaks that atomization.

kyulee-com commented 1 month ago

This issue also seems to be reproducible with the MachO system linkers in the following versions:

$ ld-classic -v
@(#)PROGRAM:ld-classic  PROJECT:ld64-951.9
$ ld  -v
@(#)PROGRAM:ld PROJECT:ld-1053.12

cc. @amy-kwan Can we make a forward fix for https://github.com/llvm/llvm-project/pull/101222 which seems breaking MachO?

kyulee-com commented 1 month ago

cc. @aemerson @dtellenbach

jyknight commented 1 month ago

I'm looking at this. Two results from my investigation so far:

  1. There's a bug in ld-prime (only) in handling alt_entry, where adding an alias to offset 0 causes alt_entry on further offsets to fail to properly inhibit GC. Still looking into this, to determine more exactly the contours of this problem.
  2. The __cfstring section is magic to all mach-o linkers, and is split into pieces by size, rather than by symbols. So it's definitely unsafe to use GlobalMerge with __cfstring section's data, ever.

I will file a bug against Apple for issue 1 in a bit.

amy-kwan commented 1 month ago

My apologies for being a bit late to this as I am just catching up this with issue. Thanks @smeenai for creating the issue and @jyknight for the PR. I will take a look at the PR, as well.

jyknight commented 4 weeks ago

Apple bug report for the ld-prime issue filed as FB15334353. Also published it here: https://gist.github.com/jyknight/e1ae4f3d5c08798a5506a6fa88550635

zeroomega commented 3 weeks ago

We actually hit a segfault issue in our mac-arm64 build in clang tool test after PR#101222 was landed (though after a very lengthly bisection). Error message:

******************** TEST 'LLVM :: tools/dsymutil/ARM/DWARFLinkerParallel/accel-imported-declarations.test' FAILED ********************
Exit Code: 139

Command Output (stderr):
--
RUN: at line 1: /Volumes/Work/s/w/ir/x/w/llvm_build/bin/dsymutil --linker parallel -accelerator=Dwarf    -oso-prepend-path=/Volumes/Work/s/w/ir/x/w/llvm-llvm-project/llvm/test/tools/dsymutil/ARM/DWARFLinkerParallel/../../Inputs    /Volumes/Work/s/w/ir/x/w/llvm-llvm-project/llvm/test/tools/dsymutil/ARM/DWARFLinkerParallel/../../Inputs/accel-imported-declaration.macho-arm64 -o /Volumes/Work/s/w/ir/x/w/llvm_build/test/tools/dsymutil/ARM/DWARFLinkerParallel/Output/accel-imported-declarations.test.tmp.dwarf.dSYM
+ /Volumes/Work/s/w/ir/x/w/llvm_build/bin/dsymutil --linker parallel -accelerator=Dwarf -oso-prepend-path=/Volumes/Work/s/w/ir/x/w/llvm-llvm-project/llvm/test/tools/dsymutil/ARM/DWARFLinkerParallel/../../Inputs /Volumes/Work/s/w/ir/x/w/llvm-llvm-project/llvm/test/tools/dsymutil/ARM/DWARFLinkerParallel/../../Inputs/accel-imported-declaration.macho-arm64 -o /Volumes/Work/s/w/ir/x/w/llvm_build/test/tools/dsymutil/ARM/DWARFLinkerParallel/Output/accel-imported-declarations.test.tmp.dwarf.dSYM
warning: /Volumes/Work/s/w/ir/x/w/llvm-llvm-project/llvm/test/tools/dsymutil/ARM/DWARFLinkerParallel/../../Inputs/accel-imported-declaration.macho-arm64.o: timestamp mismatch between object file (2024-10-04 13:05:32.202526055) and debug map (2023-02-10 08:59:50.000000000)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.  Program arguments: /Volumes/Work/s/w/ir/x/w/llvm_build/bin/dsymutil --linker parallel -accelerator=Dwarf -oso-prepend-path=/Volumes/Work/s/w/ir/x/w/llvm-llvm-project/llvm/test/tools/dsymutil/ARM/DWARFLinkerParallel/../../Inputs /Volumes/Work/s/w/ir/x/w/llvm-llvm-project/llvm/test/tools/dsymutil/ARM/DWARFLinkerParallel/../../Inputs/accel-imported-declaration.macho-arm64 -o /Volumes/Work/s/w/ir/x/w/llvm_build/test/tools/dsymutil/ARM/DWARFLinkerParallel/Output/accel-imported-declarations.test.tmp.dwarf.dSYM
 #0 0x0000000106858510 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/llvm+0x105944510)
 #1 0x000000010685652c llvm::sys::RunSignalHandlers() (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/llvm+0x10594252c)
 #2 0x0000000106858ba8 SignalHandler(int) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/llvm+0x105944ba8)
 #3 0x000000018f2a6a24 (/usr/lib/system/libsystem_platform.dylib+0x18042ea24)
 #4 0x000000018f2f8230 (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation+0x180480230)
 #5 0x000000010100a7cc llvm::dsymutil::getBundleInfo(llvm::StringRef) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/llvm+0x1000f67cc)
 #6 0x0000000101001df4 getOutputFileName(llvm::StringRef, DsymutilOptions const&) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/llvm+0x1000eddf4)
 #7 0x000000010100084c dsymutil_main(int, char**, llvm::ToolContext const&) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/llvm+0x1000ec84c)
 #8 0x00000001012b9984 findTool(int, char**, char const*) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/llvm+0x1003a5984)
 #9 0x00000001012b8fd4 main (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/llvm+0x1003a4fd4)
#10 0x000000018ef1ff28 
/Volumes/Work/s/w/ir/x/w/llvm_build/test/tools/dsymutil/ARM/DWARFLinkerParallel/Output/accel-imported-declarations.test.script: line 4: 56442 Segmentation fault: 11  /Volumes/Work/s/w/ir/x/w/llvm_build/bin/dsymutil --linker parallel -accelerator=Dwarf -oso-prepend-path=/Volumes/Work/s/w/ir/x/w/llvm-llvm-project/llvm/test/tools/dsymutil/ARM/DWARFLinkerParallel/../../Inputs /Volumes/Work/s/w/ir/x/w/llvm-llvm-project/llvm/test/tools/dsymutil/ARM/DWARFLinkerParallel/../../Inputs/accel-imported-declaration.macho-arm64 -o /Volumes/Work/s/w/ir/x/w/llvm_build/test/tools/dsymutil/ARM/DWARFLinkerParallel/Output/accel-imported-declarations.test.tmp.dwarf.dSYM

--

********************

We haven't yet start examine the binary but very likely related. We also discovered that if we enable LTO, the segfault will not happen.

jyknight commented 3 weeks ago

Well, how odd! Let me know if you figure anything else out.

Would be nice to see more info, e.g. what CoreFoundation API it was crashing on, or whether the StringRef passed to getBundleInfo was somehow invalid. Perhaps running under a debugger would help.