Open rui314 opened 1 year ago
I will test tomorrow to build a clang toolchain with mold as linker, and then instrument and optimize it. I will report it here.
I just tried to BOLT mold itself, compiled it with --emit-relocs
and -fno-reorder-blocks-and-partition
.
When trying to create the instrumented binary, I get a Segmentation fault.
❯ ./bolt-anything.bash
Instrument binary with llvm-bolt
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 39f01240e76678fa6385d36e7a96670677a26d65
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x1000000, offset 0x1000000
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
BOLT-INFO: enabling -align-macro-fusion=all since no profile was specified
BOLT-INFO: enabling lite mode
/home/ptr1337/llvm-bolt-scripts/bolt-anything.bash: line 41: 37593 Segmentation fault ${BOLTPATH}/llvm-bolt --instrument --instrumentation-file-append-pid --instrumentation-file=${FDATA}/${BINARY}.fdata ${BINARYPATH}/${BINARY} -o ${BOLTBIN}/${BINARY}
Could not create instrumented binary
@rui314 we are also eagerly trying to combine MOLD + CLANG + PGO + LTO + BOLT... it's certainly a journey to get there for big apps.
@ptr1337 Thanks! I'd start with a "hello world" program though.
Here is a brain dump of things that might affect the compatibility with BOLT.
.eh_frame
section does not have a relocation section even if --emit-relocs
is given. We synthesize .eh_frame
, and we probably need to synthesize .rela.eh_frame
too--emit-relocs
is given is always in the RELA format even for a target with REL type. This is not a bug but a feature, as it should make binary post-processing easier. I don't think it's causing an issue, as x86-64 and aarch64 are both RELA, though..plt
and .plt.got
machine instructions are different from the ones in the psABIs.@gcflymoto Is there anything we can help?
@rui314
I think this could help you, I have just tested to bolt clang.
Instrument clang with llvm-bolt
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 39f01240e76678fa6385d36e7a96670677a26d65
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x4e00000, offset 0x4e00000
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
BOLT-INFO: enabling -align-macro-fusion=all since no profile was specified
BOLT-INFO: enabling lite mode
BOLT-WARNING: _etext/1 (0x490955c) does not have any section
BOLT-WARNING: etext/1 (0x490955c) does not have any section
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
/home/ptr1337/llvm-bolt-scripts/build_stage3-bolt-without-sampling.bash: line 19: 69152 Aborted ${BOLTPATH}/llvm-bolt --instrument --instrumentation-file-append-pid --instrumentation-file=${TOPLEV}/stage3-without-sampling/intrumentdata
@ishitatsuyuki Did you want to take this?
@rui314 Yes, I'm tentatively trying to get a Rust CI build to work, which uses BOLT to optimize LLVM (https://github.com/rust-lang/rust/pull/94381). I'll report if it succeeds or not.
OK, it failed:
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: <unknown>
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x5000000, offset 0x5000000
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
BOLT-INFO: enabling -align-macro-fusion=all since no profile was specified
BOLT-INFO: enabling lite mode
BOLT-WARNING: _etext/1 (0x4a4f843) does not have any section
BOLT-WARNING: etext/1 (0x4a4f843) does not have any section
BOLT-WARNING: split function detected on input : _ZNSi6ignoreEl.cold/1. The support is limited in relocation mode.
#0 0x0000000000958a1f PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
#1 0x0000000000956814 SignalHandler(int) Signals.cpp:0:0
#2 0x00007fef06d83db0 __restore_rt (/lib64/libc.so.6+0x59db0)
#3 0x00007fef06e0ffc0 __memmove_avx_unaligned_erms (/lib64/libc.so.6+0xe5fc0)
#4 0x000000000099ded9 void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::_M_construct<char const*>(char const*, char const*, std::forward_iterator_tag) (.constprop.0) RewriteInstance.cpp:0:0
#5 0x00000000009d959b llvm::bolt::RewriteInstance::analyzeRelocation(llvm::object::RelocationRef const&, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>&, bool&, unsigned long&, long&, unsigned long&, bool&) const (/rustroot/bin/llvm-bolt+0x9d959b)
#6 0x00000000009d9e6b llvm::bolt::RewriteInstance::readRelocations(llvm::object::SectionRef const&) (/rustroot/bin/llvm-bolt+0x9d9e6b)
#7 0x00000000009dbdcd llvm::bolt::RewriteInstance::processRelocations() (.part.0) RewriteInstance.cpp:0:0
#8 0x00000000009fa56a llvm::bolt::RewriteInstance::discoverFileObjects() (/rustroot/bin/llvm-bolt+0x9fa56a)
#9 0x00000000009fbf49 llvm::bolt::RewriteInstance::run() (/rustroot/bin/llvm-bolt+0x9fbf49)
#10 0x0000000000419bab main (/rustroot/bin/llvm-bolt+0x419bab)
#11 0x00007fef06d6ee50 __libc_start_call_main (/lib64/libc.so.6+0x44e50)
#12 0x00007fef06d6eefc __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x44efc)
#13 0x000000000047baa5 _start (/rustroot/bin/llvm-bolt+0x47baa5)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: llvm-bolt -instrument /checkout/obj/build/x86_64-unknown-linux-gnu/llvm/build/lib/libLLVM-15-rust-1.66.0-nightly.so --instrumentation-file-append-pid -o /tmp/instrumented.so
I'll try to gather more notes for reproduction later, but let me know if you need something in particular.
Instrument runs fine with llvm main
so it's some fun bug that got fixed between the 15.0 and 16.0 cycle...
If we use LLVM 16, can we use BOLT with mold without doing anything special?
I still haven't tested the optimization pass (the part that is not instrumentation) yet.
I think I spoke too soon. Bisection resulted in an all-pass so this might be an environment dependent problem (build flags? silent memory errors?).
This line is failing, even in main: https://github.com/llvm/llvm-project/blob/3ee58e2f355f8fdb8e0fe29dc366c8833fafa7d3/bolt/lib/Rewrite/RewriteInstance.cpp#L1885
Lesson to me is to always turn on LLVM assertions when debugging. Without assertions this turned into a heisenbug.
I'll try to see why.
I usually build LLVM (and mold) with -DCMAKE_RELEASE_TYPE=Debug
unless I'm doing performance optimization.
Looks like this was the real problem:
llvm-bolt: /home/ishitatsuyuki/llvm-project/bolt/lib/Core/BinaryContext.cpp:753: llvm::bolt::BinaryFunction *llvm::bolt::BinaryContext::createBinaryFunction(const std::string &, llvm::bolt::BinarySection &, uint64_t, uint64_t, uint64_t, uint16_t): Assertion `Result.second == true && "unexpected duplicate function"' failed.
#0 0x000055839b75737a llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/ishitatsuyuki/llvm-project/llvm/lib/Support/Unix/Signals.inc:569:11
#1 0x000055839b75752b PrintStackTraceSignalHandler(void*) /home/ishitatsuyuki/llvm-project/llvm/lib/Support/Unix/Signals.inc:636:1
#2 0x000055839b755b76 llvm::sys::RunSignalHandlers() /home/ishitatsuyuki/llvm-project/llvm/lib/Support/Signals.cpp:104:5
#3 0x000055839b757c55 SignalHandler(int) /home/ishitatsuyuki/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
#4 0x00007fded65a2520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
#5 0x00007fded65f6a7c __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
#6 0x00007fded65f6a7c __pthread_kill_internal ./nptl/pthread_kill.c:78:10
#7 0x00007fded65f6a7c pthread_kill ./nptl/pthread_kill.c:89:10
#8 0x00007fded65a2476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
#9 0x00007fded65887f3 abort ./stdlib/abort.c:81:7
#10 0x00007fded658871b _nl_load_domain ./intl/loadmsgcat.c:1177:9
#11 0x00007fded6599e96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
#12 0x000055839d0e9105 llvm::bolt::BinaryContext::createBinaryFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, llvm::bolt::BinarySection&, unsigned long, unsigned long, unsigned long, unsigned short) /home/ishitatsuyuki/llvm-project/bolt/lib/Core/BinaryContext.cpp:754:32
#13 0x000055839b872746 llvm::bolt::RewriteInstance::createPLTBinaryFunction(unsigned long, unsigned long, unsigned long) /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:1268:6
#14 0x000055839b873002 llvm::bolt::RewriteInstance::disassemblePLTSectionX86(llvm::bolt::BinarySection&, unsigned long) /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:1386:3
#15 0x000055839b873220 llvm::bolt::RewriteInstance::disassemblePLT()::$_9::operator()(llvm::bolt::BinarySection&, unsigned long) const /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:1394:3
#16 0x000055839b871838 llvm::bolt::RewriteInstance::disassemblePLT() /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:1401:5
#17 0x000055839b86c4dc llvm::bolt::RewriteInstance::discoverFileObjects() /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:1173:3
#18 0x000055839b868952 llvm::bolt::RewriteInstance::run() /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:720:3
#19 0x000055839ac9b76b main /home/ishitatsuyuki/llvm-project/bolt/tools/driver/llvm-bolt.cpp:244:17
#20 0x00007fded6589d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#21 0x00007fded6589e40 call_init ./csu/../csu/libc-start.c:128:20
#22 0x00007fded6589e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#23 0x000055839ac9a6d5 _start (./build/bin/llvm-bolt+0x2326d5)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: ./build/bin/llvm-bolt -instrument /home/ishitatsuyuki/rust/obj/build/x86_64-unknown-linux-gnu/llvm/build/lib/libLLVM-15-rust-1.66.0-nightly.so -o /tmp/out.so
Looks PLT related.
(Would love Rui's thoughts on this since I don't really know how mold's PLT differs from the psABI. FYI, the BOLT PLT scanner is here).
Here is where we create a PLT header and PLT entries: https://github.com/rui314/mold/blob/main/elf/arch-x86-64.cc#L34-L75
The biggest difference compared to the one in the psABI is that our PLT entry has only one entry point. The standard PLT entry looks like this:
jmp *GOTPLT_ETNRY_FOR_THE_SYMBOL
push PLT_INDEX
jmp PLT_HDR
Initially, the .got.plt entry for symbol foo
is set to the middle of foo
's PLT entry, so that when the PLT entry is executed for the first time, it jumps back to the push
instruction. The push
instruction sets a PLT index and jump to the beginning of the PLT header for lazy symbol resolution.
Our PLT looks like this:
endbr64
mov PLT_INDEX, %r11
jmp *GOTPLT_ENTRY_FOR_THE_SYMBOL
Initially, unlike the standard PLT entries, all .got.plt entries point to the beginning of the PLT section. We unconditionally set %r11 to the PLT index. If the PLT entry is called for the first time, the PLT header uses %r11's value to know which PLT entry to fix. If a PLT entry is already resolved, it's a dead-store to %r11, but it should be harmless.
After a close look it seems that the PLT format is actually fine since BOLT sequentially looks for the first jmp, which has the same target.
It looks like this conflicts with preexisting local symbols like foo$plt
though, which the assertion is complaining about: I need to understand why these exist and what are these supposed to be.
urgh, the foo$plt
thing is a regression after 1.5.1. I need to bisect.
I'm dumb, it was literally implemented in a7f1e20594ac6f86982ae22979e58f8fcf810900 as a feature...
So, the PLT-related assertion is a bug in BOLT, but we might want to add a flag to disable the synthesized symbols, to allow workarounding this issue.
There seems to be an unrelated regression from a7f1e20
too: some symbol's visibility gets filled with garbage value like below:
Symbol table '.dynsym' contains 7 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND fprintf@GLIBC_2.2.5 (2)
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND strtol@GLIBC_2.2.5 (2)
3: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND stdin@GLIBC_2.2.5 (2)
4: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND stderr@GLIBC_2.2.5 (2)
5: 0000000000000000 0 FUNC GLOBAL DEFAULT UND fread@GLIBC_2.2.5 (2)
6: 0000000000000000 0 FUNC GLOBAL DEFAULT UND feof@GLIBC_2.2.5 (2)
Symbol table '.symtab' contains 62 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000200238 0 SECTION LOCAL DEFAULT 1 .hash
2: 0000000000200278 0 SECTION LOCAL DEFAULT 2 .gnu.hash
3: 0000000000200298 0 SECTION LOCAL DEFAULT 3 .dynsym
4: 0000000000200340 0 SECTION LOCAL DEFAULT 4 .dynstr
5: 00000000002003ba 0 SECTION LOCAL DEFAULT 5 .gnu.version
6: 00000000002003c8 0 SECTION LOCAL DEFAULT 6 .gnu.version_r
7: 00000000002003e8 0 SECTION LOCAL DEFAULT 7 .rela.dyn
8: 0000000000200418 0 SECTION LOCAL DEFAULT 8 .rela.plt
9: 0000000000200478 0 SECTION LOCAL DEFAULT 9 .eh_frame
10: 00000000002004e4 0 SECTION LOCAL DEFAULT 10 .eh_frame_hdr
11: 0000000000200500 0 SECTION LOCAL DEFAULT 11 .rodata.cst
12: 0000000000200520 0 SECTION LOCAL DEFAULT 12 .rodata.str
13: 00000000002015a0 0 SECTION LOCAL DEFAULT 13 .plt
14: 0000000000201600 0 SECTION LOCAL DEFAULT 14 .text
15: 0000000000202800 0 SECTION LOCAL DEFAULT 15 .dynamic
16: 0000000000202a20 0 SECTION LOCAL DEFAULT 16 .got
17: 0000000000202a30 0 SECTION LOCAL DEFAULT 17 .relro_padding
18: 0000000000203a30 0 SECTION LOCAL DEFAULT 18 .got.plt
19: 0000000000202a20 0 OBJECT LOCAL DEFAULT 16 stdin@got
20: 0000000000202a28 0 OBJECT LOCAL DEFAULT 16 stderr@got
21: 00000000002015c0 0 FUNC LOCAL DEFAULT 13 fprintf@plt
22: 00000000002015d0 0 FUNC LOCAL DEFAULT 13 strtol@plt
23: 00000000002015e0 0 FUNC LOCAL DEFAULT 13 fread@plt
24: 00000000002015f0 0 FUNC LOCAL DEFAULT 13 feof@plt
25: 0000000000000000 0 FILE LOCAL DEFAULT ABS count.c
26: 0000000000200510 0 NOTYPE LOCAL DEFAULT 11 .LCPI0_0
27: 0000000000200500 0 NOTYPE LOCAL DEFAULT 11 .LCPI0_1
28: 0000000000200576 0 OBJECT LOCAL DEFAULT 12 .L.str
29: 0000000000200520 0 OBJECT LOCAL DEFAULT 12 .L.str.2
30: 000000000020055a 0 OBJECT LOCAL DEFAULT 12 .L.str.3
31: 0000000000200539 0 OBJECT LOCAL DEFAULT 12 .L.str.1
32: 0000000000201600 508 FUNC LOCAL DEFAULT 14 main
33: 0000000000200000 0 NOTYPE LOCAL DEFAULT 1 __ehdr_start
34: 0000000000000000 0 NOTYPE LOCAL DEFAULT 21 __init_array_start
35: 0000000000000000 0 NOTYPE LOCAL DEFAULT 21 __init_array_end
36: 0000000000000000 0 NOTYPE LOCAL DEFAULT 21 __fini_array_start
37: 0000000000000000 0 NOTYPE LOCAL DEFAULT 21 __fini_array_end
38: 0000000000000000 0 NOTYPE LOCAL DEFAULT 21 __preinit_array_start
39: 0000000000000000 0 NOTYPE LOCAL DEFAULT 21 __preinit_array_end
40: 0000000000202800 0 NOTYPE LOCAL DEFAULT 15 _DYNAMIC
41: 0000000000203a30 0 NOTYPE LOCAL DEFAULT 18 _GLOBAL_OFFSET_TABLE_
42: 00000000002015a0 0 NOTYPE LOCAL DEFAULT 13 _PROCEDURE_LINKAGE_TABLE_
43: 0000000000000000 0 NOTYPE LOCAL DEFAULT 21 __bss_start
44: 0000000000203a68 0 NOTYPE LOCAL DEFAULT 18 _end
45: 00000000002017fc 0 NOTYPE LOCAL DEFAULT 14 _etext
46: 0000000000203a68 0 NOTYPE LOCAL DEFAULT 18 _edata
47: 0000000000200000 0 NOTYPE LOCAL DEFAULT 1 __executable_start
48: 0000000000000000 0 NOTYPE LOCAL DEFAULT 21 __rela_iplt_start
49: 0000000000000000 0 NOTYPE LOCAL DEFAULT 21 __rela_iplt_end
50: 00000000002004e4 0 NOTYPE LOCAL DEFAULT 10 __GNU_EH_FRAME_HDR
51: 0000000000203a68 0 NOTYPE LOCAL DEFAULT 18 end
52: 00000000002017fc 0 NOTYPE LOCAL DEFAULT 14 etext
53: 0000000000203a68 0 NOTYPE LOCAL DEFAULT 18 edata
54: 0000000000200000 0 NOTYPE LOCAL DEFAULT 1 __dso_handle
55: 0000000000000000 0 NOTYPE LOCAL DEFAULT 1 _TLS_MODULE_BASE_
56: 0000000000000000 0 NOTYPE GLOBAL DEFAULT [<other>: 20] UND fprintf
57: 0000000000000000 0 NOTYPE GLOBAL DEFAULT [<other>: 34] UND strtol
58: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND stdin
59: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND stderr
60: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND fread
61: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND feof
(the [<other>: xx]
part)
A repro suite is attached. repro.tar.zip
Instead of adding a command line option to mold to suppress the linker-synthesized symbols, I'd fix BOLT. BOLT should not be affected by these symbols.
I can't reproduce the symbol visibility issue with git head, so maybe it's already fixed?
Instead of adding a command line option to mold to suppress the linker-synthesized symbols, I'd fix BOLT. BOLT should not be affected by these symbols.
Ack. Will file a bug report later.
I can't reproduce the symbol visibility issue with git head, so maybe it's already fixed?
Bisect says it's gone after https://github.com/rui314/mold/commit/d2b7b04be30b2459eceb75c3797e9725e1605f2c. But that commit doesn't really touch visibility, and it looks like the exact corrupted symbols differs between builds, so I wonder if there's some nondeterminism involved here.
After switching to v1.5.1 which doesn't emit the $plt symbols, BOLT is now complaining values calculated from relocations doesn't match the ones inside the executable:
llvm-bolt: /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:1972: bool llvm::bolt::RewriteInstance::analyzeRelocation(const llvm::object::RelocationRef &, uint64_t, std::string &, bool &, uint64_t &, int64_t &, uint64_t &, bool &) const: Assertion `verifyExtractedValue() && "mismatched extracted relocation value"' failed.
readelf
also complains about invalid symbol indices, so I'm suspecting --emit-relocs
isn't working quite as expected. I'll report when I have a reduced reproduction sample.
readelf: Error: bad symbol index: 0002f885 in reloc
readelf: Error: bad symbol index: 0002f886 in reloc
readelf: Error: bad symbol index: 0002f887 in reloc
readelf: Error: bad symbol index: 0002f888 in reloc
readelf: Error: bad symbol index: 0002f889 in reloc
readelf: Error: bad symbol index: 0002f888 in reloc
readelf: Error: bad symbol index: 0002f88a in reloc
readelf: Error: bad symbol index: 0002f888 in reloc
readelf: Error: bad symbol index: 0002f88b in reloc
...
https://github.com/rui314/mold/commit/1124322ce85a17bc137f804c63eace3b794aa31d helps: BOLT now progress through a few symbols before hitting assert, and readelf errors are down to 5 entries. However something still seems to be wrong.
Edit: I probably know why: https://github.com/rui314/mold/blob/288cd5b4007a0a20da9aab52ab56067ad46866e8/elf/passes.cc#L1063 seems to need an update too.
With #798 and a hack to disable $plt
-like symbols (in https://github.com/ishitatsuyuki/mold/tree/bolt) BOLT is working flawlessly :tada: :tada: :tada:
I used Rust's PGO build pipeline, which has full ThinLTO+PGO+BOLT for its LLVM component. I had to do a few quick hacks, namely changing the base distribution to get some decent support of C++20. The diff is available at https://github.com/ishitatsuyuki/rust/tree/moldBOLT.
The resulting binary should also be running fine since the pipeline uses the just-built compiler to compile the ecosystem tools (Cargo, rust-analyzer etc.), but I'll just verify with a icache heatmap on my Intel laptop later, just in case.
Great work! It's impressive that you identified and resolved the issue so quickly. Thank you!
@rui314 @ishitatsuyuki
Great work! I just tested to instrument clang, and it has worked. Gathering now the profile, then the optimization process will be done. So far it seems to be working well.
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 39f01240e76678fa6385d36e7a96670677a26d65
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x4e00000, offset 0x4e00000
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
BOLT-INFO: enabling -align-macro-fusion=all since no profile was specified
BOLT-INFO: enabling lite mode
BOLT-WARNING: _etext/1 (0x490955c) does not have any section
BOLT-WARNING: etext/1 (0x490955c) does not have any section
BOLT-WARNING: Failed to analyze 470109 relocations
BOLT-WARNING: 6 collisions detected while hashing binary objects. Use -v=1 to see the list.
BOLT-INFO: 0 out of 97833 functions in the binary (0.0%) have non-empty execution profile
BOLT-INFO: the input contains 17357 (dynamic count : 0) opportunities for macro-fusion optimization that are going to be fixed
BOLT-INSTRUMENTER: Number of indirect call site descriptors: 28238
BOLT-INSTRUMENTER: Number of indirect call target descriptors: 94233
BOLT-INSTRUMENTER: Number of function descriptors: 94233
BOLT-INSTRUMENTER: Number of branch counters: 904843
BOLT-INSTRUMENTER: Number of ST leaf node counters: 443070
BOLT-INSTRUMENTER: Number of direct call counters: 0
BOLT-INSTRUMENTER: Total number of counters: 1347913
BOLT-INSTRUMENTER: Total size of counters: 10783304 bytes (static alloc memory)
BOLT-INSTRUMENTER: Total size of string table emitted: 11389554 bytes in file
BOLT-INSTRUMENTER: Total size of descriptors: 96355088 bytes in file
BOLT-INSTRUMENTER: Profile will be saved to file /home/ptr1337/toolchain/llvm/stage3-without-sampling/intrumentdata/clang-15.fdata
BOLT-INFO: 1134971 instructions were shortened
BOLT-INFO: removed 109 empty blocks
BOLT-INFO: merged 1 duplicate CFG edge
BOLT-INFO: UCE removed 838 blocks and 51318 bytes of code.
BOLT-INFO: SCTC: patched 0 tail calls (0 forward) tail calls (0 backward) from a total of 0 while removing 0 double jumps and removing 0 basic blocks totalling 0 bytes of code. CTCs total execution count is 0 and the number of times CTCs are taken is 0.
BOLT-INFO: output linked against instrumentation runtime library, lib entry point is 0xc3944c0
BOLT-INFO: clear procedure is 0xc38f360
BOLT-INFO: setting _end to 0xc514c9c
The only thing, which seems different then with lld to me is:
BOLT-WARNING: _etext/1 (0x490955c) does not have any section
BOLT-WARNING: etext/1 (0x490955c) does not have any section
Same as this, seems to me a bit high:
BOLT-WARNING: Failed to analyze 470109 relocations
Thanks!
edit:
Instrumentation, merging fdata and the optimization was successful! Here the log:
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 39f01240e76678fa6385d36e7a96670677a26d65
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: enabling relocation mode
BOLT-INFO: enabling lite mode
BOLT-WARNING: _etext/1 (0x490955c) does not have any section
BOLT-WARNING: etext/1 (0x490955c) does not have any section
BOLT-WARNING: Failed to analyze 470109 relocations
BOLT-INFO: pre-processing profile using branch profile reader
BOLT-WARNING: 1 collisions detected while hashing binary objects. Use -v=1 to see the list.
BOLT-INFO: 10210 out of 97833 functions in the binary (10.4%) have non-empty execution profile
BOLT-INFO: 649 functions with profile could not be optimized
BOLT-WARNING: 5 (0.0% of all profiled) functions have invalid (possibly stale) profile. Use -report-stale to see the list.
BOLT-WARNING: 460262 out of 243184719344 samples in the binary (0.0%) belong to functions with invalid (possibly stale) profile.
BOLT-INFO: profile for 1 objects was ignored
BOLT-INFO: the input contains 4674 (dynamic count : 3071330706) opportunities for macro-fusion optimization. Will fix instances on a hot path.
BOLT-INFO: 354969 instructions were shortened
BOLT-INFO: removed 851 empty blocks
BOLT-INFO: ICF folded 1478 out of 98109 functions in 4 passes. 0 functions had jump tables.
BOLT-INFO: Removing all identical functions will save 166.21 KB of code space. Folded functions were called 3122835745 times based on profile.
BOLT-INFO: 31975 PLT calls in the binary were optimized.
BOLT-INFO: basic block reordering modified layout of 4922 functions (48.21% of profiled, 5.09% of total)
BOLT-INFO: UCE removed 1 blocks and 6 bytes of code.
BOLT-INFO: splitting separates 4610217 hot bytes from 3731975 cold bytes (55.26% of split functions is hot).
BOLT-INFO: 129 Functions were reordered by LoopInversionPass
BOLT-INFO: program-wide dynostats after all optimizations before SCTC and FOP:
138489556757 : executed forward branches
20942286787 : taken forward branches
46989998144 : executed backward branches
28605377103 : taken backward branches
9641011040 : executed unconditional branches
14576806007 : all function calls
7293536017 : indirect calls
3769079744 : PLT calls
1222503556165 : executed instructions
298324511206 : executed load instructions
120845030651 : executed store instructions
0 : taken jump table branches
0 : taken unknown indirect branches
195120565941 : total branches
59188674930 : taken branches
135931891011 : non-taken conditional branches
49547663890 : taken conditional branches
185479554901 : all conditional branches
124400786448 : executed forward branches (-10.2%)
8624897609 : taken forward branches (-58.8%)
61078768453 : executed backward branches (+30.0%)
28657226155 : taken backward branches (+0.2%)
6073827710 : executed unconditional branches (-37.0%)
10807726263 : all function calls (-25.9%)
7293536017 : indirect calls (=)
0 : PLT calls (-100.0%)
1209826144728 : executed instructions (-1.0%)
298324511206 : executed load instructions (=)
120845030651 : executed store instructions (=)
0 : taken jump table branches (=)
0 : taken unknown indirect branches (=)
191553382611 : total branches (-1.8%)
43355951474 : taken branches (-26.7%)
148197431137 : non-taken conditional branches (+9.0%)
37282123764 : taken conditional branches (-24.8%)
185479554901 : all conditional branches (=)
BOLT-INFO: SCTC: patched 39 tail calls (37 forward) tail calls (2 backward) from a total of 39 while removing 4 double jumps and removing 31 basic blocks totalling 155 bytes of code. CTCs total execution count is 2168704 and the number of times CTCs are taken is 1081892.
BOLT-INFO: setting _end to 0x619ea10
BOLT-INFO: setting __hot_start to 0x4e00000
BOLT-INFO: setting __hot_end to 0x57411d6
@ishitatsuyuki You may want to test BOLT with a program that uses exceptions, as clang and (I believe) rustc don't use exceptions. That part might not be tested yet.
Our .eh_frame section does not have a relocation section even if --emit-relocs is given. We synthesize .eh_frame, and we probably need to synthesize .rela.eh_frame too
BOLT doesn't need .rela.eh_frame
as it creates internal relocations while parsing the section, i.e. kinda synthesizes .rela.eh_frame
internally.
BOLT doesn't need .rela.eh_frame as it creates internal relocations while parsing the section, i.e. kinda synthesizes .rela.eh_frame internally.
Oh cool, then we don't need to synthsize .rela.eh_frame. Exceptions could fail for other reasons, so it's nice to test it though.
The BOLT-WARNING: Failed to analyze 470109 relocations
is because BOLT don't know how to read the GOT32
relocations; it's probably relaxed by bfd but not mold. It seems harmless so far but I haven't checked in detail.
@ishitatsuyuki I'd file that to BOLT even if it's harmless.
@ishitatsuyuki You may want to test BOLT with a program that uses exceptions, as clang and (I believe) rustc don't use exceptions. That part might not be tested yet.
The real-life example of BOLT seems really scarce, so I don't have a good idea what to test. I'll run BOLT's own test suite in LLVM with mold which has some CFI tests.
@ishitatsuyuki I'd file that to BOLT even if it's harmless.
Ack.
Maybe @maksfb has an idea as to what programs are suitable for testing BOLT?
BOLT test suite has several runnable x86 tests for C++ exceptions. E.g. https://github.com/llvm/llvm-project/blob/main/bolt/test/runtime/X86/exceptions-run.test
For large open-source project with C++ exceptions, I can recommend trying HHVM (hhvm.com). I've also heard that ScyllaDB uses BOLT, but I have no first-hand experience with it.
I ran the BOLT test suite with mold (-DBOLT_LLD_EXE=/path/to/mold
):
Failed Tests (10):
Logs for X86/dynrelocs.s:
BOLT-WARNING: _etext/1 (0x20139d) does not have any section
BOLT-WARNING: etext/1 (0x20139d) does not have any section
llvm-bolt: /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:2595: void llvm::bolt::RewriteInstance::handleRelocation(const llvm::object::SectionRef &, const llvm::object::RelocationRef &): Assertion `(IsAArch64 || IsSectionRelocation || BD->nameStartsWith(SymbolName) || BD->nameStartsWith("PG" + SymbolName) || (BD->nameStartsWith("ANONYMOUS") && (BD->getSectionName().startswith(".plt") || BD->getSectionName().endswith(".plt")))) && "BOLT symbol names of all non-section relocations must match up " "with symbol names referenced in the relocation"' failed.
#0 0x000056123b12837a llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/ishitatsuyuki/llvm-project/llvm/lib/Support/Unix/Signals.inc:569:11
#1 0x000056123b12852b PrintStackTraceSignalHandler(void*) /home/ishitatsuyuki/llvm-project/llvm/lib/Support/Unix/Signals.inc:636:1
#2 0x000056123b126b76 llvm::sys::RunSignalHandlers() /home/ishitatsuyuki/llvm-project/llvm/lib/Support/Signals.cpp:104:5
#3 0x000056123b128c55 SignalHandler(int) /home/ishitatsuyuki/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
#4 0x00007ff3d0334520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
#5 0x00007ff3d0388a7c __pthread_kill_implementation ./nptl/./nptl/pthread_kill.c:44:76
#6 0x00007ff3d0388a7c __pthread_kill_internal ./nptl/./nptl/pthread_kill.c:78:10
#7 0x00007ff3d0388a7c pthread_kill ./nptl/./nptl/pthread_kill.c:89:10
#8 0x00007ff3d0334476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
#9 0x00007ff3d031a7f3 abort ./stdlib/./stdlib/abort.c:81:7
#10 0x00007ff3d031a71b _nl_load_domain ./intl/./intl/loadmsgcat.c:1177:9
#11 0x00007ff3d032be96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
#12 0x000056123b1f03b4 llvm::bolt::RewriteInstance::handleRelocation(llvm::object::SectionRef const&, llvm::object::RelocationRef const&) /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:2588:7
#13 0x000056123b1ed645 llvm::bolt::RewriteInstance::readRelocations(llvm::object::SectionRef const&) /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:2329:33
#14 0x000056123b1ea978 llvm::bolt::RewriteInstance::processRelocations() /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:2007:3
#15 0x000056123b1e507f llvm::bolt::RewriteInstance::discoverFileObjects() /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:1240:1
#16 0x000056123b1e0ec2 llvm::bolt::RewriteInstance::run() /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:720:3
#17 0x000056123a7340fb main /home/ishitatsuyuki/llvm-project/bolt/tools/driver/llvm-bolt.cpp:244:17
#18 0x00007ff3d031bd90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#19 0x00007ff3d031be40 call_init ./csu/../csu/libc-start.c:128:20
#20 0x00007ff3d031be40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#21 0x000056123a733065 _start (/home/ishitatsuyuki/llvm-project/build/bin/llvm-bolt+0x260065)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: /home/ishitatsuyuki/llvm-project/build/bin/llvm-bolt -data /home/ishitatsuyuki/llvm-project/build/tools/bolt/test/X86/Output/dynrelocs.s.tmp.fdata /home/ishitatsuyuki/llvm-project/build/tools/bolt/test/X86/Output/dynrelocs.s.tmp.exe -relocs -o /home/ishitatsuyuki/llvm-project/build/tools/bolt/test/X86/Output/dynrelocs.s.tmp.out -lite=0 -jump-tables=move
/home/ishitatsuyuki/llvm-project/build/tools/bolt/test/X86/Output/dynrelocs.s.script: line 8: 749269 Aborted (core dumped) /home/ishitatsuyuki/llvm-project/build/bin/llvm-bolt -data /home/ishitatsuyuki/llvm-project/build/tools/bolt/test/X86/Output/dynrelocs.s.tmp.fdata /home/ishitatsuyuki/llvm-project/build/tools/bolt/test/X86/Output/dynrelocs.s.tmp.exe -relocs -o /home/ishitatsuyuki/llvm-project/build/tools/bolt/test/X86/Output/dynrelocs.s.tmp.out -lite=0 -jump-tables=move
Testing Time: 22.61s
Unsupported: 26
Passed : 194
Failed : 10
@rui314 Any idea where a GOT32 reloc can come from? You can see it in e.g. the emit-relocs test.
I readelf'd all the input .o and .a files but they didn't seem to contain any. A quick grep of mold code don't show any uses that assigns the value of GOT32, so I don't have a good clue here.
In bfd, the symbols seems to get assigned a PC64 reloc instead, although I'm not 100% sure it's pointing to the same thing.
Finally there's a chance that GOT32 is just a bogus value we assign due to some bug. I see a bunch of these for .debug_lines too.
It looks like mold creates GOT32 relocations for section symbols. We discard section symbols in input files and re-create new section symbols for output sections, as we want to keep only one section symbol for each section.
I don't know why it lead to creating GOT32 relocations, though. But the relocation type looks bogus.
OK, this is the culprit then :sweat_smile:
buf[j].r_type = STT_SECTION;
(STT_SECTION = R_X86_64_GOT32)
LOL
With #833 BOLT :: X86/data-to-data-pcrel.s
is fixed. The remaining 9 still fails and there still seems to be "relocation parse failed" messages (Update: gone after rebuild). I'll check what BOLT is complaning about this time.
There are still 9 failures but I think we're mostly done for it now. Reasons it's failing seems to be due to one of: 1. mold defaulting to LOCAL for executables (need --export-dynamic
), 2. mold generating more symbols that have the same address as others (probably harmless, just a different name), and 3. straight out hardcoded expectations. I'll test HHVM next time.
Facebook/Meta's BOLT is becoming popular. I don't know if BOLT works on mold's output. We need to test it and fix issues if any.