llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.35k stars 11.7k forks source link

[lld-macho] Symbols in `__mod_init_func` are handled hackily with `-init_offsets` #97155

Open BertalanD opened 3 months ago

BertalanD commented 3 months ago

Background: when linking macOS binaries with chained fixups, we need to transform initializers stored in __mod_init_func (an array of pointers rebased through the usual means at runtime) to __init_offsets (an array of 32-bit offsets to initializers).

Normally, we only need to care about the relocations in the input __mod_init_func sections. A problem arises when there are also symbols defined inside it. We currently ignore them in LLD -- that is, we don't add them to the symbol table (since the location they point to don't exist anymore). This doesn't happen in regular binaries created by Clang, swiftc, rustc, etc., but there have been a few instances, where this led to crashes:

Fix ideas

  1. Completely remove __mod_init_func from the list of input sections

    I thought my original patch would have this effect: we do not create an OutputSection for it and don't even include it in the global inputSections list.

    In reality, this is not enough; they are still added to the symbol table (as __mod_init_func is present in the symbols array, ObjFile::parseSymbols will reach it).

    (+) if we encounter a Defined symbol during the program's execution, we know for sure that it has an address, no need to check for a poison flag.

    (-) sounds a bit hackish; currently there is a one-on-one correspondence between ObjFile::sections and the input file's contents

  2. Create a "poisoned" state for Defined Symbols

    (+) Least amount of modification for the existing code

    (+) There will still be an entry (though poisoned) in the symbol table, so we'll be able to emit useful warnings if someone actually refers to the symbol.

    NOTE: this is basically the current workaround I ended up going for, except that I use the isLive() mechanism from dead-stripping.

  3. Some other idea?

    • maybe just add a few extra checks to the known crashy places to see if the symbols refer to a deleted section? sounds very fragile

These mentioned workarounds only work if the symbols are not actually referenced in relocations. If they are, we get different (but equally undesirable) behaviors.

; test.s
.globl _main
.text
_main:
  leaq _init_slot(%rip), %rax

.section __DATA,__mod_init_func,mod_init_funcs
_init_slot:
  .quad _main
ld64 crash ``` ❯ clang test.s -Wl,-ld_classic ld: warning: alignment (1) of atom '_init_slot' is too small and may result in unaligned pointers 0 0x10659e807 __assert_rtn + 137 1 0x1065a79e3 ld::tool::OutputFile::addressOf(ld::Internal const&, ld::Fixup const*, ld::Atom const**) (.cold.1) + 35 2 0x1063fc5e4 ld::tool::OutputFile::addressOf(ld::Internal const&, ld::Fixup const*, ld::Atom const**) + 116 3 0x1063fd087 ld::tool::OutputFile::applyFixUps(ld::Internal&, unsigned long long, ld::Atom const*, unsigned char*) + 599 4 0x106405818 ___ZN2ld4tool10OutputFile10writeAtomsERNS_8InternalEPh_block_invoke + 504 5 0x7ff815881def _dispatch_client_callout2 + 8 6 0x7ff815893547 _dispatch_apply_invoke3 + 431 7 0x7ff815881dbc _dispatch_client_callout + 8 8 0x7ff81588304e _dispatch_once_callout + 20 9 0x7ff815892740 _dispatch_apply_invoke + 184 10 0x7ff815881dbc _dispatch_client_callout + 8 11 0x7ff8158912ca _dispatch_root_queue_drain + 871 12 0x7ff81589184f _dispatch_worker_thread2 + 152 13 0x7ff815a1fb43 _pthread_wqthread + 262 A linker snapshot was created at: /tmp/a.out-2024-06-25-091629.ld-snapshot ld: Assertion failed: (_mode == modeFinalAddress), function finalAddress, file ld.hpp, line 1413. ```
ld_prime broken (?) binary ``` ❯ clang test.s -Wl,-ld_new ❯ objdump -d a.out a.out: file format mach-o 64-bit x86-64 Disassembly of section __TEXT,__text: 0000000100000f9d <_main>: 100000f9d: 48 8d 05 5c f0 ff ff leaq -4004(%rip), %rax ## 0x100000000 # Relocation refers to the beginning of the file, `__mh_execute_header`??? ```

Additional questions

There have been other similar transformations added recently: ObjC relative method lists, (etc?). Could we theoretically encounter a similar scenario there?

llvmbot commented 3 months ago

@llvm/issue-subscribers-lld-macho

Author: Daniel Bertalan (BertalanD)

Background: when linking macOS binaries with chained fixups, we need to transform initializers stored in `__mod_init_func` (an array of pointers *rebased* through the usual means at runtime) to `__init_offsets` (an array of 32-bit offsets to initializers). Normally, we only need to care about the relocations in the input `__mod_init_func` sections. A problem arises when there are also *symbols* defined inside it. We currently ignore them in LLD -- that is, we don't add them to the symbol table (since the location they point to don't exist anymore). This doesn't happen in regular binaries created by Clang, `swiftc`, `rustc`, etc., but there have been a few instances, where this led to crashes: - In [#94716](https://github.com/llvm/llvm-project/issues/94716), we see a go-generated binary (the repro file is broken and doesn't include a bunch of swift stuff from the SDK -- TODO!). Here, the symbol (`__rt0_arm64_ios_lib.ptr`) is defined inside `__mod_init_func` as a non-exported symbol; we crash when trying to add it to the symbol table (it has no corresponding output section, so we can't set `n_sect`). ``` $ lipo -thin arm64 Users/snqu/Desktop/chromium/chromium/src/ios/chrome/vpn_extension/Tun2socks.framework/Tun2socks -o Tun2socks $ $(brew --prefix llvm)/bin/llvm-ar x Tun2socks go.o $ nm go.o | grep __rt0_arm64_ios_lib\.ptr 0000000001af3a60 s __rt0_arm64_ios_lib.ptr ``` Backtrace excerpt: ``` * thread #6, stop reason = EXC_BAD_ACCESS (code=1, address=0x34) * frame #0: 0x00000001002a0894 ld64.lld`SymtabSectionImpl<lld::macho::LP64>::writeTo(this=<unavailable>, buf=<unavailable>) const at SyntheticSections.cpp:1415:50 [opt] ``` - This [Chromium bug](https://issues.chromium.org/issues/325133695) is related to the [`curl` Rust crate](https://github.com/alexcrichton/curl-rust), which [deliberately defines](https://github.com/alexcrichton/curl-rust/blob/c01261310f13c85dc70d4e8a1ef87504662a1154/src/lib.rs#L123-L151) a symbol among the initializers, apparently, to sidestep an old linker/compiler dead-stripping issue. Here, the symbol (`__RNvCsiLjxBhyzEAX_4curl9INIT_CTOR`; `curl::INIT_CTOR`) is externally visible, we crash when we try to query its address when adding it to the exports trie. Backtrace excerpt: ``` frame #3: 0x000000019cf14d20 libsystem_c.dylib`__assert_rtn + 284 frame #4: 0x00000001003053e8 ld64.lld`lld::macho::Defined::getVA() const (.cold.2) at Symbols.cpp:97:5 [opt] * frame #5: 0x0000000100289ee0 ld64.lld`lld::macho::Defined::getVA(this=0x000000014b01ca58) const at Symbols.cpp:97:5 [opt] frame #6: 0x0000000100255fe4 ld64.lld`lld::macho::TrieBuilder::sortAndBuild(llvm::MutableArrayRef<lld::macho::Symbol const*>, lld::macho::TrieNode*, unsigned long, unsigned long) [inlined] (anonymous namespace)::ExportInfo::ExportInfo(this=<unavailable>, sym=0x000000014b01ca58, imageBase=4294967296) at ExportTrie.cpp:64:21 [opt] ``` ``` $ ar x Users/hwennborg/chromium/src/third_party/rust-src/build/x86_64-apple-darwin/stage2-tools/x86_64-apple-darwin/release/deps/libcurl-f5fa2775c6309a20.rlib curl-f5fa2775c6309a20.curl.da8bc63371d5233d-cgu.09.rcgu.o $ nm curl-f5fa2775c6309a20.curl.da8bc63371d5233d-cgu.09.rcgu.o -s __DATA __mod_init_func 0000000000000d40 S __RNvCsiLjxBhyzEAX_4curl9INIT_CTOR ``` ## Fix ideas 1. Completely remove `__mod_init_func` from the list of input sections I thought my [original patch](https://github.com/llvm/llvm-project/commit/389e0a81a15ca688cf85a82d04aeaa68d18da161) would have this effect: we do not create an OutputSection for it and don't even include it in the global `inputSections` list. In reality, this is not enough; they are still added to the symbol table (as `__mod_init_func` is present in the `symbols` array, `ObjFile::parseSymbols` will reach it). (+) if we encounter a `Defined` symbol during the program's execution, we know for sure that it has an address, no need to check for a poison flag. (-) sounds a bit hackish; currently there is a one-on-one correspondence between `ObjFile::sections` and the input file's contents 2. Create a "poisoned" state for `Defined` Symbols (+) Least amount of modification for the existing code (+) There will still be an entry (though poisoned) in the symbol table, so we'll be able to emit useful warnings if someone actually refers to the symbol. NOTE: this is basically the current workaround I ended up going for, except that I use the `isLive()` mechanism from dead-stripping. 3. Some other idea? - maybe just add a few extra checks to the known crashy places to see if the symbols refer to a deleted section? sounds very fragile These mentioned workarounds only work if the symbols are not actually referenced in relocations. If they are, we get different (but equally undesirable) behaviors. ```asm ; test.s .globl _main .text _main: leaq _init_slot(%rip), %rax .section __DATA,__mod_init_func,mod_init_funcs _init_slot: .quad _main ``` <details> <summary>ld64 crash</summary> ``` ❯ clang test.s -Wl,-ld_classic ld: warning: alignment (1) of atom '_init_slot' is too small and may result in unaligned pointers 0 0x10659e807 __assert_rtn + 137 1 0x1065a79e3 ld::tool::OutputFile::addressOf(ld::Internal const&, ld::Fixup const*, ld::Atom const**) (.cold.1) + 35 2 0x1063fc5e4 ld::tool::OutputFile::addressOf(ld::Internal const&, ld::Fixup const*, ld::Atom const**) + 116 3 0x1063fd087 ld::tool::OutputFile::applyFixUps(ld::Internal&, unsigned long long, ld::Atom const*, unsigned char*) + 599 4 0x106405818 ___ZN2ld4tool10OutputFile10writeAtomsERNS_8InternalEPh_block_invoke + 504 5 0x7ff815881def _dispatch_client_callout2 + 8 6 0x7ff815893547 _dispatch_apply_invoke3 + 431 7 0x7ff815881dbc _dispatch_client_callout + 8 8 0x7ff81588304e _dispatch_once_callout + 20 9 0x7ff815892740 _dispatch_apply_invoke + 184 10 0x7ff815881dbc _dispatch_client_callout + 8 11 0x7ff8158912ca _dispatch_root_queue_drain + 871 12 0x7ff81589184f _dispatch_worker_thread2 + 152 13 0x7ff815a1fb43 _pthread_wqthread + 262 A linker snapshot was created at: /tmp/a.out-2024-06-25-091629.ld-snapshot ld: Assertion failed: (_mode == modeFinalAddress), function finalAddress, file ld.hpp, line 1413. ``` </details> <details> <summary>ld_prime broken (?) binary</summary> ``` ❯ clang test.s -Wl,-ld_new ❯ objdump -d a.out a.out: file format mach-o 64-bit x86-64 Disassembly of section __TEXT,__text: 0000000100000f9d <_main>: 100000f9d: 48 8d 05 5c f0 ff ff leaq -4004(%rip), %rax ## 0x100000000 # Relocation refers to the beginning of the file, `__mh_execute_header`??? ``` </details> ## Additional questions There have been other similar transformations added recently: ObjC relative method lists, (etc?). Could we theoretically encounter a similar scenario there?
BertalanD commented 3 months ago

Nico mentioned to me privately that maybe we could patch upstream projects to not put symbols inside __mod_init_func.