BertalanD opened 3 months ago
@llvm/issue-subscribers-lld-macho
Author: Daniel Bertalan (BertalanD)
Nico mentioned to me privately that maybe we could patch upstream projects to not put symbols inside `__mod_init_func`. For the `curl` crate, it's definitely possible. There's the https://github.com/mmastrac/rust-ctor crate, which didn't crash LLD even before my changes. … `.init_array` section (ELF-like section names are fixed up later). Or could we special-case this section to not generate symbol table entries?
Background: when linking macOS binaries with chained fixups, we need to transform initializers stored in `__mod_init_func` (an array of pointers rebased through the usual means at runtime) to `__init_offsets` (an array of 32-bit offsets to initializers).

Normally, we only need to care about the relocations in the input `__mod_init_func` sections. A problem arises when there are also symbols defined inside them. We currently ignore them in LLD -- that is, we don't add them to the symbol table (since the location they point to doesn't exist anymore). This doesn't happen in regular binaries created by Clang, `swiftc`, `rustc`, etc., but there have been a few instances where this led to crashes:

- In #94716, we see a Go-generated binary (the repro file is broken and doesn't include a bunch of Swift stuff from the SDK -- TODO!). Here, the symbol (`__rt0_arm64_ios_lib.ptr`) is defined inside `__mod_init_func` as a non-exported symbol; we crash when trying to add it to the symbol table (it has no corresponding output section, so we can't set `n_sect`).
  Backtrace excerpt:
- This Chromium bug is related to the `curl` Rust crate, which deliberately defines a symbol among the initializers, apparently to sidestep an old linker/compiler dead-stripping issue. Here, the symbol (`__RNvCsiLjxBhyzEAX_4curl9INIT_CTOR`; `curl::INIT_CTOR`) is externally visible; we crash when we query its address while adding it to the exports trie.
  Backtrace excerpt:
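To make the failure mode concrete, here is a minimal Python sketch of the rewrite described in the background. It is a toy model, not LLD's code: `InputSection`, `Symbol`, `build_init_offsets` and `symbol_va` are hypothetical names, the relocation targets of the input slots are assumed to be resolved already, and each `__init_offsets` entry is assumed to be measured from the start of the image (the Mach-O header). Once the pointers have become offsets, the input `__mod_init_func` section has no place in the output, so a symbol defined inside it has no address to report.

```python
# Toy model of the __mod_init_func -> __init_offsets rewrite.
# All names are hypothetical; this is not LLD's implementation.
from dataclasses import dataclass
from typing import List, Optional

UINT32_MAX = 0xFFFF_FFFF

@dataclass
class InputSection:
    name: str
    # Resolved targets of the 8-byte pointer slots (the initializers).
    initializer_vas: List[int]
    # Address of the section in the output, or None if it was dropped.
    output_va: Optional[int] = None

@dataclass
class Symbol:
    name: str
    section: InputSection
    offset: int  # offset of the symbol within its input section

def build_init_offsets(sec: InputSection, image_base: int) -> List[int]:
    """Turn each pointer slot into a 32-bit offset from the image base."""
    offsets = []
    for va in sec.initializer_vas:
        off = va - image_base
        if not 0 <= off <= UINT32_MAX:
            raise ValueError(f"initializer {va:#x} not reachable from image base")
        offsets.append(off)
    return offsets

def symbol_va(sym: Symbol) -> Optional[int]:
    """A symbol only has an address if its section made it to the output."""
    if sym.section.output_va is None:
        return None  # nothing to use for n_sect/n_value or the exports trie
    return sym.section.output_va + sym.offset

image_base = 0x1_0000_0000
mod_init = InputSection("__mod_init_func", [0x1_0000_0F00, 0x1_0000_0F40])
print([hex(o) for o in build_init_offsets(mod_init, image_base)])  # ['0xf00', '0xf40']

# The section is consumed by the rewrite and never gets an output address,
# so a symbol defined inside it is left dangling.
ptr_sym = Symbol("__rt0_arm64_ios_lib.ptr", mod_init, 0)
print(symbol_va(ptr_sym))  # None
```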
Fix ideas

- Completely remove `__mod_init_func` from the list of input sections.
  I thought my original patch would have this effect: we do not create an `OutputSection` for it and don't even include it in the global `inputSections` list. In reality, this is not enough; the symbols defined inside it are still added to the symbol table (as `__mod_init_func` is present in the `symbols` array, `ObjFile::parseSymbols` will reach it).
  - (+) If we encounter a `Defined` symbol during the program's execution, we know for sure that it has an address; there is no need to check for a poison flag.
  - (-) Sounds a bit hackish; currently there is a one-to-one correspondence between `ObjFile::sections` and the input file's contents.
- Create a "poisoned" state for `Defined` symbols.
  - (+) Least amount of modification to the existing code.
  - (+) There will still be an entry (though poisoned) in the symbol table, so we'll be able to emit useful warnings if someone actually refers to the symbol.
  - NOTE: this is basically the workaround I currently ended up going with, except that I use the `isLive()` mechanism from dead-stripping.
- Some other idea?
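The trade-off between removing the section outright and poisoning its symbols can be sketched with another toy model. The names below (`Defined.poisoned`, `parse_symbols`, `symtab_entries`, `resolve_reference`) are hypothetical and only loosely mirror LLD's `ObjFile::parseSymbols` and `Defined`; the "poison" strategy stands in for the `isLive()`-based workaround rather than reproducing it. In this model, both strategies keep the symbol out of the output symbol table, but only the poisoned entry lets a later reference produce a targeted diagnostic.

```python
# Toy model contrasting the two fix ideas for symbols defined inside
# __mod_init_func. All names are hypothetical; this is not LLD's code.
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

INIT_SECTION = "__mod_init_func"

@dataclass
class Defined:
    name: str
    section: str
    offset: int
    poisoned: bool = False  # "poison" idea: kept in the table, but has no address

def parse_symbols(nlist: List[Tuple[str, str, int]], strategy: str) -> Dict[str, Defined]:
    """Build a symbol table from (name, section, offset) entries.

    strategy == "remove": symbols in __mod_init_func are skipped entirely.
    strategy == "poison": they are added, but flagged as having no address.
    """
    table: Dict[str, Defined] = {}
    for name, section, offset in nlist:
        in_init = section == INIT_SECTION
        if in_init and strategy == "remove":
            continue
        table[name] = Defined(name, section, offset,
                              poisoned=in_init and strategy == "poison")
    return table

def symtab_entries(table: Dict[str, Defined]) -> List[str]:
    """Names that can get an n_sect/n_value in the output symbol table."""
    return sorted(s.name for s in table.values() if not s.poisoned)

def resolve_reference(table: Dict[str, Defined], name: str,
                      diags: List[str]) -> Optional[Defined]:
    """Resolve a relocation's target, recording a diagnostic on failure."""
    sym = table.get(name)
    if sym is None:
        diags.append(f"undefined symbol: {name}")
        return None
    if sym.poisoned:
        diags.append(f"{name} is defined inside {INIT_SECTION}, "
                     "which was rewritten to __init_offsets; it has no address")
        return None
    return sym

nlist = [("_main", "__text", 0), ("__rt0_arm64_ios_lib.ptr", INIT_SECTION, 0)]
for strategy in ("remove", "poison"):
    table = parse_symbols(nlist, strategy)
    diags: List[str] = []
    resolve_reference(table, "__rt0_arm64_ios_lib.ptr", diags)
    print(strategy, symtab_entries(table), diags)
```

With the "remove" strategy, a reference to the symbol in this model looks exactly like a reference to a symbol that was never defined, which is the diagnostic advantage of keeping a poisoned entry.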
The workarounds mentioned above only work if the symbols are not actually referenced in relocations. If they are, we get different (but equally undesirable) behaviors.
ld64 crash
```
❯ clang test.s -Wl,-ld_classic
ld: warning: alignment (1) of atom '_init_slot' is too small and may result in unaligned pointers
0 0x10659e807 __assert_rtn + 137
1 0x1065a79e3 ld::tool::OutputFile::addressOf(ld::Internal const&, ld::Fixup const*, ld::Atom const**) (.cold.1) + 35
2 0x1063fc5e4 ld::tool::OutputFile::addressOf(ld::Internal const&, ld::Fixup const*, ld::Atom const**) + 116
3 0x1063fd087 ld::tool::OutputFile::applyFixUps(ld::Internal&, unsigned long long, ld::Atom const*, unsigned char*) + 599
4 0x106405818 ___ZN2ld4tool10OutputFile10writeAtomsERNS_8InternalEPh_block_invoke + 504
5 0x7ff815881def _dispatch_client_callout2 + 8
6 0x7ff815893547 _dispatch_apply_invoke3 + 431
7 0x7ff815881dbc _dispatch_client_callout + 8
8 0x7ff81588304e _dispatch_once_callout + 20
9 0x7ff815892740 _dispatch_apply_invoke + 184
10 0x7ff815881dbc _dispatch_client_callout + 8
11 0x7ff8158912ca _dispatch_root_queue_drain + 871
12 0x7ff81589184f _dispatch_worker_thread2 + 152
13 0x7ff815a1fb43 _pthread_wqthread + 262
A linker snapshot was created at: /tmp/a.out-2024-06-25-091629.ld-snapshot
ld: Assertion failed: (_mode == modeFinalAddress), function finalAddress, file ld.hpp, line 1413.
```

ld_prime broken (?) binary
```
❯ clang test.s -Wl,-ld_new
❯ objdump -d a.out

a.out: file format mach-o 64-bit x86-64

Disassembly of section __TEXT,__text:

0000000100000f9d <_main>:
100000f9d: 48 8d 05 5c f0 ff ff leaq -4004(%rip), %rax ## 0x100000000
# Relocation refers to the beginning of the file, `__mh_execute_header`???
```

Additional questions
There have been other similar transformations added recently: ObjC relative method lists (etc.?). Could we theoretically encounter a similar scenario there?
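As a starting point for that question, here is a hypothetical Python sketch of an absolute-to-relative rewrite in the style of ObjC relative method lists. It assumes each relative entry holds three signed 32-bit fields (selector reference, types, implementation), each storing the target address minus the address of the field itself, and that the original absolute entries are three 64-bit pointers. `to_relative_entries` and `remap_symbol_offset` are illustrative names, not LLD functions. Because entries shrink from 24 to 12 bytes, only symbols sitting on entry boundaries have an obvious counterpart in the rewritten list in this model; any other symbol inside the original list would be left dangling, much like in `__mod_init_func`.

```python
# Hypothetical sketch of an absolute -> relative method list rewrite.
# Assumed layout (not taken from LLD): absolute entries are three 64-bit
# pointers (24 bytes); relative entries are three int32 fields (12 bytes),
# each storing target - address_of_the_field.
import struct
from typing import List, Optional, Tuple

ABS_ENTRY_SIZE = 24
REL_ENTRY_SIZE = 12
FIELD_SIZE = 4

def to_relative_entries(entries: List[Tuple[int, int, int]], first_entry_va: int) -> bytes:
    """entries holds (selref_va, types_va, imp_va) per method;
    first_entry_va is where the first relative entry will live."""
    out = bytearray()
    for i, targets in enumerate(entries):
        for j, target in enumerate(targets):
            field_va = first_entry_va + i * REL_ENTRY_SIZE + j * FIELD_SIZE
            delta = target - field_va
            if not -2**31 <= delta < 2**31:
                raise ValueError(f"target {target:#x} out of int32 range")
            out += struct.pack("<i", delta)
    return bytes(out)

def remap_symbol_offset(old_offset: int) -> Optional[int]:
    """Map a symbol's offset in the old entry array to the new one. Only
    offsets on entry boundaries have an obvious counterpart; anything
    else would be left dangling, as with __mod_init_func."""
    if old_offset % ABS_ENTRY_SIZE:
        return None
    return old_offset // ABS_ENTRY_SIZE * REL_ENTRY_SIZE

rel = to_relative_entries([(0x2000, 0x3000, 0x1000)], first_entry_va=0x4000)
print(struct.unpack("<3i", rel))  # (-8192, -4100, -12296)
print(remap_symbol_offset(24), remap_symbol_offset(8))  # 12 None
```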