Open romanovj opened 6 months ago
I guess you use the highly experimental azanella/clang branch of glibc that works towards compatibility with clang. I tested that some weeks ago and found some bugs when using it; it isn't there just yet.
I can't tell myself whether the issue you described here is a Clang, a BOLT or a glibc problem, but maybe @zatrazz can; I pinged him for awareness, and he might have tried using BOLT on glibc himself.
@ms178 glibc 2.39, gcc 13.2.1, llvm-bolt 18.1.1
more details about second error:
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 8c3b2f419a0eb5d5b4702568beee561b740e0b08
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x200000, offset 0x200000
BOLT-WARNING: debug info will be stripped from the binary. Use -update-debug-sections to keep it.
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
BOLT-INFO: enabling -align-macro-fusion=all since no profile was specified
BOLT-INFO: enabling lite mode
BOLT-WARNING: sizes differ for function __clone3/1. FDE : 27; symbol table : 71. Using max size.
BOLT-WARNING: sizes differ for function clone3/1. FDE : 27; symbol table : 71. Using max size.
BOLT-WARNING: sizes differ for function __GI___clone3/1. FDE : 27; symbol table : 71. Using max size.
BOLT-WARNING: sizes differ for function __GI___clone/1. FDE : 52; symbol table : 95. Using max size.
BOLT-WARNING: sizes differ for function clone. FDE : 52; symbol table : 95. Using max size.
BOLT-WARNING: sizes differ for function __clone. FDE : 52; symbol table : 95. Using max size.
BOLT-ERROR: function __restore_rt/1 is in conflict with FDE [9322f, 93239). Skipping.
BOLT-WARNING: sizes differ for function __setcontext/1. FDE : 332; symbol table : 352. Using max size.
BOLT-WARNING: sizes differ for function setcontext. FDE : 332; symbol table : 352. Using max size.
BOLT-WARNING: FDE [0x324e5, 0x324f6) conflicts with function __clone3/1(*3)
BOLT-WARNING: FDE [0x324f6, 0x32507) conflicts with function __clone3/1(*3)
BOLT-WARNING: FDE [0x3263e, 0x3264e) conflicts with function __GI___clone/1(*3)
BOLT-WARNING: FDE [0x3264e, 0x3265f) conflicts with function __GI___clone/1(*3)
BOLT-WARNING: FDE [0xe8eec, 0xe8f00) conflicts with function __setcontext/1(*2)
BOLT-ERROR: symbol seen in the middle of the function __BOLT_FDE_FUNCat9322f. Skipping.
BOLT-ERROR: cannot find BB containing branch destination.
=======================================
BOLT is unable to proceed because it couldn't properly understand this function.
If you are running the most recent version of BOLT, you may want to report this and paste this dump.
Please check that there is no sensitive contents being shared in this dump.
Offending function: ____longjmp_chk/1
Function contents (
0000: F30F1EFA 4C8B4730 4C8B4F08 488B5738 |....L.G0L.O.H.W8|
0010: 49C1C811 644C3304 25300000 0049C1C9 |I...dL3.%0...I..|
0020: 11644C33 0C253000 000048C1 CA116448 |.dL3.%0...H...dH|
0030: 33142530 0000004C 39C47651 4989FA89 |3.%0...L9.vQI...|
0040: F331FF48 8D7424E8 B8830000 000F0585 |.1.H.t$.........|
0050: C07535F7 4424F001 00000074 14488B44 |.u5.D$.....t.H.D|
0060: 24E84803 4424F84C 29C0483B 4424F873 |$.H.D$.L).H;D$.s|
0070: 174883EC 08488D3D 058D0E00 E88FEFFB |.H...H.=........|
0080: FF050505 05050505 4C89D789 DE64F704 |........L....d..|
0090: 25480000 00020000 00745EF3 480F1EC8 |%H.......t^.H...|
00A0: 4989C248 8B4F5848 29C8744D 4989CB48 |I..H.OXH).tMI..H|
00B0: 8B59F848 83E3F848 39CB740B 4883E908 |.Y.H...H9.t.H...|
00C0: 4939CA75 EAEB11F3 0F0169F8 F30F01EA |I9.u......i.....|
00D0: F3480F1E C84C29D8 48F7D848 C1E80348 |.H...L).H..H...H|
00E0: 83C001BB FF000000 4839D848 0F42D8F3 |........H9.H.B..|
00F0: 480FAEEB 4829D877 EF90488B 1F4C8B67 |H...H).w..H..L.g|
0100: 104C8B6F 184C8B77 204C8B7F 2889F04C |.L.o.L.w L..(..L|
0110: 89C44C89 CD90FFE2 0F1F8400 00000000 |..L.............|
)
Binary Function "____longjmp_chk/1" {
Number : 1596
State : disassembled
Address : 0x97450
Size : 0x118
MaxSize : 0x120
Offset : 0x97450
Section : .text
Orc Section : .local.text.____longjmp_chk/1
LSDA : 0x0
IsSimple : 1
IsMultiEntry: 0
IsSplit : 0
BB Count : 14
CFI Instrs : 16
}
DWARF CFI Instructions:
0000003f: OpRegister Reg5 Reg10
00000041: OpRegister Reg4 Reg3
00000075: OpRememberState
00000075: OpDefCfaOffset 16
00000081: OpRestoreState
0000008b: OpRestore Reg5
0000008d: OpRestore Reg4
000000f9: OpDefCfa Reg5 0
000000f9: OpRegister Reg7 Reg8
000000f9: OpRegister Reg6 Reg9
000000f9: OpRegister Reg16 Reg1
000000f9: OpOffset Reg3 0
000000f9: OpOffset Reg12 16
000000f9: OpOffset Reg13 24
000000f9: OpOffset Reg14 32
000000f9: OpOffset Reg15 40
End of Function "____longjmp_chk/1"
ERROR: disassembly failed - inconsistent branch found.
=======================================
LLVM ERROR: pthread_join failed: Resource deadlock avoided
#0 0x000057fcfab0d5c0 (/usr/bin/llvm-bolt+0x16ce5c0)
#1 0x000057fcfab0b240 (/usr/bin/llvm-bolt+0x16cc240)
#2 0x000057fcfab0de7b (/usr/bin/llvm-bolt+0x16cee7b)
#3 0x00007754026ed980 (/usr/bin/../lib/libc.so.6+0x39980)
#4 0x000077540273b0ec (/usr/bin/../lib/libc.so.6+0x870ec)
#5 0x00007754026ed8e4 __GI_raise (/usr/bin/../lib/libc.so.6+0x398e4)
#6 0x00007754026d830b __GI_abort (/usr/bin/../lib/libc.so.6+0x2430b)
#7 0x000057fcfaab9f3c (/usr/bin/llvm-bolt+0x167af3c)
#8 0x000057fcfab0eb05 (/usr/bin/llvm-bolt+0x16cfb05)
#9 0x000057fcfab0eb31 (/usr/bin/llvm-bolt+0x16cfb31)
#10 0x000057fcfabe56d6 (/usr/bin/llvm-bolt+0x17a66d6)
#11 0x000057fcfb13625c (/usr/bin/llvm-bolt+0x1cf725c)
#12 0x00007754026ef924 __run_exit_handlers (/usr/bin/../lib/libc.so.6+0x3b924)
#13 0x00007754026efa6a (/usr/bin/../lib/libc.so.6+0x3ba6a)
#14 0x000057fcfb0d32aa (/usr/bin/llvm-bolt+0x1c942aa)
#15 0x000057fcfb0f4716 (/usr/bin/llvm-bolt+0x1cb5716)
#16 0x000057fcfabb7871 (/usr/bin/llvm-bolt+0x1778871)
#17 0x000057fcfb13715f (/usr/bin/llvm-bolt+0x1cf815f)
#18 0x000057fcfabdccb6 (/usr/bin/llvm-bolt+0x179dcb6)
#19 0x000057fcfabe514f (/usr/bin/llvm-bolt+0x17a614f)
#20 0x000057fcfabe5b3b (/usr/bin/llvm-bolt+0x17a6b3b)
#21 0x00007754027394ca (/usr/bin/../lib/libc.so.6+0x854ca)
#22 0x00007754027b0e08 __GI___clone3 (/usr/bin/../lib/libc.so.6+0xfce08)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
After editing glibc/sysdeps/x86_64/longjmp.S (replacing the asm code for longjmp with nop):
llvm-bolt libc.so -o libc.so.bolt --instrument --instrumentation-file=/tmp/libc.so --instrumentation-file-append-pid
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 8c3b2f419a0eb5d5b4702568beee561b740e0b08
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x200000, offset 0x200000
BOLT-WARNING: debug info will be stripped from the binary. Use -update-debug-sections to keep it.
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
BOLT-INFO: enabling -align-macro-fusion=all since no profile was specified
BOLT-INFO: enabling lite mode
BOLT-ERROR: function __restore_rt/1 is in conflict with FDE [3b19f, 3b1a9). Skipping.
BOLT-WARNING: sizes differ for function __setcontext/1. FDE : 332; symbol table : 352. Using max size.
BOLT-WARNING: sizes differ for function setcontext. FDE : 332; symbol table : 352. Using max size.
BOLT-WARNING: sizes differ for function __GI___clone/1. FDE : 52; symbol table : 95. Using max size.
BOLT-WARNING: sizes differ for function clone. FDE : 52; symbol table : 95. Using max size.
BOLT-WARNING: sizes differ for function __clone. FDE : 52; symbol table : 95. Using max size.
BOLT-WARNING: sizes differ for function __clone3/1. FDE : 27; symbol table : 71. Using max size.
BOLT-WARNING: sizes differ for function clone3/1. FDE : 27; symbol table : 71. Using max size.
BOLT-WARNING: sizes differ for function __GI___clone3/1. FDE : 27; symbol table : 71. Using max size.
BOLT-WARNING: FDE [0x4000c, 0x40020) conflicts with function __setcontext/1(*2)
BOLT-WARNING: FDE [0x106ece, 0x106ede) conflicts with function __GI___clone/1(*3)
BOLT-WARNING: FDE [0x106ede, 0x106eef) conflicts with function __GI___clone/1(*3)
BOLT-WARNING: FDE [0x107055, 0x107066) conflicts with function __clone3/1(*3)
BOLT-WARNING: FDE [0x107066, 0x107077) conflicts with function __clone3/1(*3)
BOLT-ERROR: symbol seen in the middle of the function __BOLT_FDE_FUNCat3b19f. Skipping.
BOLT-INFO: 0 out of 3757 functions in the binary (0.0%) have non-empty execution profile
BOLT-INFO: the input contains 468 (dynamic count : 0) opportunities for macro-fusion optimization that are going to be fixed
BOLT-INSTRUMENTER: Number of indirect call site descriptors: 2833
BOLT-INSTRUMENTER: Number of indirect call target descriptors: 3733
BOLT-INSTRUMENTER: Number of function descriptors: 3691
BOLT-INSTRUMENTER: Number of branch counters: 40615
BOLT-INSTRUMENTER: Number of ST leaf node counters: 28597
BOLT-INSTRUMENTER: Number of direct call counters: 352
BOLT-INSTRUMENTER: Total number of counters: 69564
BOLT-INSTRUMENTER: Total size of counters: 556512 bytes (static alloc memory)
BOLT-INSTRUMENTER: Total size of string table emitted: 71639 bytes in file
BOLT-INSTRUMENTER: Total size of descriptors: 3586856 bytes in file
BOLT-INSTRUMENTER: Profile will be saved to file /tmp/libc.so
BOLT-INFO: 27956 instructions were shortened
BOLT-INFO: removed 551 empty blocks
BOLT-INFO: merged 3 duplicate CFG edges
BOLT-INFO: removed 1 'repz' prefixes with estimated execution count of 0 times.
BOLT-INFO: UCE removed 8425 blocks and 550438 bytes of code
BOLT-INFO: padding code to 0x800000 to accommodate hot text
BOLT-INFO: output linked against instrumentation runtime library, lib entry point is 0x8dd9d0
BOLT-INFO: clear procedure is 0x8d9350
BOLT-INFO: setting __bolt_runtime_start to 0x8dd980
BOLT-INFO: setting __bolt_runtime_fini to 0x8dd9d0
BOLT-INFO: setting __hot_start to 0x400000
BOLT-INFO: setting __hot_end to 0x702e88
BOLT-INFO: patched build-id (flipped last bit)
BOLT-ERROR: Offset overflow for dynamic relocation
@romanovj I am just an interested user with no programming skills, so I might not be of much help in debugging this further and will leave that to the experts.
But you mentioned using GCC 13.2.1, llvm-bolt 18.1.1 and glibc 2.39. You could try to get further with the azanella/clang branch mentioned above; if you are on Arch Linux, the source section in the PKGBUILD needs to be modified to: source=("git+https://sourceware.org/git/glibc.git#branch=azanella/clang"
With that tree you could even try to use Clang as the main compiler. But be aware that there are still some issues with that branch.
I would suggest using the bughunter script to try to track down the function that has this dynamic relocation (see the example in https://llvm.org/devmtg/2024-03/slides/practical-use-of-bolt.pdf). If successful, you can skip that function with `-skip-funcs=<funcname>.*`.
If bughunter in function searching mode doesn't help, you can try `-max-data-relocations`, but I'm not entirely sure whether that dynamic relocation is for code or data. Please check and let us know.
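As a rough illustration of the `-skip-funcs` suggestion, here is a small bash sketch (not part of BOLT) that collects the function names mentioned in `BOLT-WARNING` lines of a saved log and prints them as a comma-separated list of `<funcname>.*` patterns. The helper name `bolt_skip_candidates` and the log file name `bolt.log` are hypothetical, and treating warned-about functions as skip candidates is only an assumption; nothing guarantees the offending function is among them.

```shell
# Hypothetical helper: read a BOLT log on stdin and print a comma-separated
# list of "<funcname>.*" patterns for the functions BOLT warned about.
bolt_skip_candidates() {
  sed -n \
    -e 's/^BOLT-WARNING: sizes differ for function \([^ ]*\)\. FDE.*/\1/p' \
    -e 's/^BOLT-WARNING: FDE .* conflicts with function \([^ (]*\)(.*/\1/p' |
    sed 's|/[0-9]*$||' |   # drop the "/1" suffix BOLT appends to local symbols
    LC_ALL=C sort -u |     # de-duplicate with a locale-independent order
    sed 's/$/.*/' |        # turn each name into a "<funcname>.*" pattern
    paste -sd, -           # join the patterns with commas
}

# Possible usage (assumes the earlier BOLT output was saved to bolt.log):
#   llvm-bolt libc.so -o libc.so.bolt --instrument \
#     -skip-funcs="$(bolt_skip_candidates < bolt.log)"
```

If the error disappears with the whole list skipped, the list can be bisected by hand to find the single function responsible.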
@aaupov Nothing was found with bughunter.
This didn't help either: `-max-funcs=0 -max-data-relocations=0`
> I guess you use the highly experimental azanella/clang branch of glibc that works towards compatibility with clang. I tested that some weeks ago and found some bugs when using it; it isn't there just yet.
I would be interested to know what kind of bugs you have found using my branch. Are these bolt, clang, or glibc related?
> I can't tell myself whether the issue you described here is a Clang, a BOLT or a glibc problem, but maybe @zatrazz can; I pinged him for awareness, and he might have tried using BOLT on glibc himself.
I haven't tried it yet, and I am not sure how to interpret the potential issues the log is showing. The 'Offset overflow for dynamic relocation' error for __longjmp.S should not be affected by the compiler, so I am not sure what kind of error BOLT is reporting here. We did have some tests that stress the symbols and the dynamic loader to flag potential overflows on dynamic relocations.
The reporter seems to be using a quite recent glibc version; is he using the new '-z mark-plt'? I recall that we recently fixed some displacement overflow issues, albeit only for x32.
> I would be interested to know what kind of bugs you have found using my branch. Are these bolt, clang, or glibc related?
I will try to reproduce it with a recent snapshot. It could have been a mold issue actually: https://github.com/rui314/mold/issues/1213. I also use some Clear Linux and Mandriva patches on top, and one patch did not apply cleanly on your glibc-git based branch, which might have caused some differences.
Indeed, it seems to be a mold 2.4.1 specific issue; with a clang-compiled glibc, I saw:
mold: warning: /usr/lib/crt1.o: ignoring .llvm_addrsig section without sh_link; was the file processed by strip or objcopy -r?
mold: warning: /usr/lib/libc_nonshared.a(atexit.oS): ignoring .llvm_addrsig section without sh_link; was the file processed by strip or objcopy -r?
mold: warning: /usr/lib/libc_nonshared.a(pthread_atfork.oS): ignoring .llvm_addrsig section without sh_link; was the file processed by strip or objcopy -r?
mold: warning: /usr/lib/libc_nonshared.a(stack_chk_fail_local.oS): ignoring .llvm_addrsig section without sh_link; was the file processed by strip or objcopy -r?
mold: warning: /usr/lib/libc_nonshared.a(at_quick_exit.oS): ignoring .llvm_addrsig section without sh_link; was the file processed by strip or objcopy -r?
But with mold 2.30 these warnings were silenced and don't occur any longer.
The other bugs which I've noticed were build issues due to me using some fancy compiler flags with LLVM/Clang. If you want I could list some of the offending flags here (as I don't have an account for glibc bugzilla).
Right, I don't have mold in my loop; I constantly test my branch with binutils and lld. And yes, it would be helpful to know the possible flags that might interfere with the glibc build. Keep in mind that we have some strict compiler flag requirements, and we use or filter some flags depending on the TU (for instance, -fstack-protector, where some loader TUs cannot be built with it because the SSP cookie is not initialized yet).
@zatrazz Here are some examples:
`-fno-semantic-interposition` and `-Wl,-Bsymbolic-functions` lead to segfaults and runtime issues, but that's also the case when using GCC.
`-fdata-sections -ffunction-sections` and `-Wl,--gc-sections` on the linker side lead to an error during the configure stage: `configure: error: --enable-multi-arch support requires assembler and linker support`
Here is a link to the full glibc configuration that I use (with some OpenMandriva and Clear Linux patches on top): https://github.com/ms178/archpkgbuilds/blob/main/toolchain-stable/glibc/PKGBUILD.clang
The following flags...
export CC=clang
export CXX=clang++
export CC_LD=lld
export CXX_LD=lld
export AR=llvm-ar
export NM=llvm-nm
export STRIP=llvm-strip
export OBJCOPY=llvm-objcopy
export OBJDUMP=llvm-objdump
export READELF=llvm-readelf
export RANLIB=llvm-ranlib
export HOSTCC=clang
export HOSTCXX=clang++
export HOSTAR=llvm-ar
export CPPFLAGS="-D_FORTIFY_SOURCE=0"
export CFLAGS="-O3 -march=native -mtune=native -mllvm -inline-threshold=1500 -mllvm -extra-vectorizer-passes -mllvm -enable-cond-stores-vec -mllvm -slp-vectorize-hor-store -mllvm -enable-loopinterchange -mllvm -enable-loop-distribute -mllvm -enable-unroll-and-jam -mllvm -enable-loop-flatten -mllvm -unroll-runtime-multi-exit -mllvm -aggressive-ext-opt -mllvm -enable-interleaved-mem-accesses -mllvm -enable-masked-interleaved-mem-accesses -fno-math-errno -fno-trapping-math -falign-functions=32 -funroll-loops -fcf-protection=none -mharden-sls=none -fomit-frame-pointer -mprefer-vector-width=256 -mllvm -adce-remove-loops -mllvm -enable-ext-tsp-block-placement -mllvm -enable-gvn-hoist -mllvm -enable-dfa-jump-thread -Wno-error -ffp-contract=fast -fsplit-machine-functions -fgnuc-version=6.5.0 -w"
export CXXFLAGS="${CFLAGS} -Wp,-U_GLIBCXX_ASSERTIONS"
export LDFLAGS="-Wl,--lto-CGO3 -Wl,--icf=all -Wl,--lto-O3,-O3,--as-needed -fcf-protection=none -mharden-sls=none -Wl,-mllvm -Wl,-extra-vectorizer-passes -Wl,-mllvm -Wl,-enable-cond-stores-vec -Wl,-mllvm -Wl,-slp-vectorize-hor-store -Wl,-mllvm -Wl,-enable-loopinterchange -Wl,-mllvm -Wl,-enable-loop-distribute -Wl,-mllvm -Wl,-enable-unroll-and-jam -Wl,-mllvm -Wl,-enable-loop-flatten -Wl,-mllvm -Wl,-unroll-runtime-multi-exit -Wl,-mllvm -Wl,-aggressive-ext-opt -Wl,-mllvm -Wl,-enable-interleaved-mem-accesses -Wl,-mllvm -Wl,-enable-masked-interleaved-mem-accesses -march=native -maes -mbmi2 -mpclmul -fuse-ld=lld -Wl,-zmax-page-size=0x200000 -Wl,-mllvm -Wl,-adce-remove-loops -Wl,-mllvm -Wl,-enable-ext-tsp-block-placement -Wl,-mllvm -Wl,-enable-gvn-hoist -Wl,-mllvm -Wl,-enable-dfa-jump-thread -Wl,--push-state -Wl,-whole-archive -ljemalloc_pic -Wl,--pop-state -lpthread -lstdc++ -lm -ldl"
export CCLDFLAGS="$LDFLAGS"
export CXXLDFLAGS="$LDFLAGS"
export ASFLAGS="-D__AVX__=1 -D__AVX2__=1 -D__FMA__=1"
... lead to a configure warning:
checking for assembler and linker STT_GNU_IFUNC support... llvm-readelf: warning: 'conftest': unable to parse DT_JMPREL: virtual address is not in any segment: 0x0
llvm-readelf: warning: 'conftest': unable to parse DT_JMPREL: virtual address is not in any segment: 0x0
yes
On the other hand, I am able to use fairly aggressive flags for the math part (see https://github.com/ms178/archpkgbuilds/blob/main/toolchain-stable/glibc/mathlto.patch.clang) and the following set of flags in `/etc/makepkg.conf`:
export CC=clang
export CXX=clang++
export CC_LD=lld
export CXX_LD=lld
export AR=llvm-ar
export NM=llvm-nm
export STRIP=llvm-strip
export OBJCOPY=llvm-objcopy
export OBJDUMP=llvm-objdump
export READELF=llvm-readelf
export RANLIB=llvm-ranlib
export HOSTCC=clang
export HOSTCXX=clang++
export HOSTAR=llvm-ar
export CPPFLAGS="-D_FORTIFY_SOURCE=0"
export CFLAGS="-O3 -march=native -mtune=native -maes -mbmi2 -mpclmul -mllvm -inline-threshold=1500 -mllvm -extra-vectorizer-passes -mllvm -enable-cond-stores-vec -mllvm -slp-vectorize-hor-store -mllvm -enable-loopinterchange -mllvm -enable-loop-distribute -mllvm -enable-unroll-and-jam -mllvm -enable-loop-flatten -mllvm -unroll-runtime-multi-exit -mllvm -aggressive-ext-opt -mllvm -enable-interleaved-mem-accesses -mllvm -enable-masked-interleaved-mem-accesses -fno-math-errno -fno-trapping-math -falign-functions=32 -funroll-loops -fomit-frame-pointer -mprefer-vector-width=256 -mllvm -adce-remove-loops -mllvm -enable-ext-tsp-block-placement -mllvm -enable-gvn-hoist -mllvm -enable-dfa-jump-thread -fcf-protection=none -mharden-sls=none -fgnuc-version=6.5.0"
export CXXFLAGS="${CFLAGS} -Wp,-U_GLIBCXX_ASSERTIONS"
export LDFLAGS="-Wl,-O3,--as-needed -Wl,-mllvm -Wl,-extra-vectorizer-passes -Wl,-mllvm -Wl,-enable-cond-stores-vec -Wl,-mllvm -Wl,-slp-vectorize-hor-store -Wl,-mllvm -Wl,-enable-loopinterchange -Wl,-mllvm -Wl,-enable-loop-distribute -Wl,-mllvm -Wl,-enable-unroll-and-jam -Wl,-mllvm -Wl,-enable-loop-flatten -Wl,-mllvm -Wl,-unroll-runtime-multi-exit -Wl,-mllvm -Wl,-aggressive-ext-opt -Wl,-mllvm -Wl,-enable-interleaved-mem-accesses -Wl,-mllvm -Wl,-enable-masked-interleaved-mem-accesses -march=native -maes -mbmi2 -mpclmul -fuse-ld=lld -Wl,-zmax-page-size=0x200000 -Wl,-mllvm -Wl,-adce-remove-loops -Wl,-mllvm -Wl,-enable-ext-tsp-block-placement -Wl,-mllvm -Wl,-enable-gvn-hoist -Wl,-mllvm -Wl,-enable-dfa-jump-thread -Wl,--undefined-version -fcf-protection=none -mharden-sls=none"
export CCLDFLAGS="$LDFLAGS"
export CXXLDFLAGS="$LDFLAGS"
export ASFLAGS="-D__AVX__=1 -D__AVX2__=1 -D__FMA__=1"
> @zatrazz Here are some examples:
> `-fno-semantic-interposition` and `-Wl,-Bsymbolic-functions` lead to segfaults and runtime issues but that's also the case when using GCC.
Thanks, it is a really interesting testcase you have here.
There is no need to use `-fno-semantic-interposition` or `-Wl,-Bsymbolic-functions`: glibc takes care not to add intra-library PLT calls through a set of internal tricks (the hidden_proto/hidden_def macros), and it also has regression tests to check for unexpected cases. In fact, I think using them would be wrong, because it would require adding a dynamic symbol file to export the symbols that are expected to be called through the PLT (like malloc, matherr, and __tls_get_addr).
> `-fdata-sections -ffunction-sections` and `-Wl,--gc-sections` on the linker side lead to an error during the configure stage: `configure: error: --enable-multi-arch support requires assembler and linker support`
> Here is a link to the full glibc configuration that I use (with some OpenMandriva and Clear Linux patches on top): https://github.com/ms178/archpkgbuilds/blob/main/toolchain-stable/glibc/PKGBUILD.clang
I think the problem is passing such options through $CC and not through $CFLAGS. Using them in CFLAGS/CXXFLAGS, I could build with both gcc and clang without any issue (you will need to pass an optimization level though, due to the loader bootstrap limitation).
Also, '--without-cvs', '--disable-dependency-tracking', '--disable-silent-rules', '--enable-omitfp', '--enable-nss-crypt' and '--disable-sanity-checks' are outdated or nonexistent options.
> The following flags...
Some options are not really tested, but for most I wouldn't expect failures as long as the compiler does not change the ABI (such as -Wl,-slp-vectorize-hor-store). However, some of them I don't expect to be supported, not without a lot of hacks, such as LTO (https://sourceware.org/bugzilla/show_bug.cgi?id=15658), or adding a malloc implementation with a specific ABI (the jemalloc_pic) along with statically linking libc with libstdc++.
Also, the math library is built and tested with some specific math flags (-ffp-contract=fast/-fno-trapping-math are not supported and might lead to a lot of regressions in testing).
Could you check with a more restricted CFLAGS to narrow down the required support to enable BOLT? Trying to support such an extensive flag selection might require a lot of extra unrelated work.
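One way to get to a more restricted CFLAGS without retyping the whole list is to drop individual flags from the existing string. The bash sketch below does that; the helper name `strip_flags` is hypothetical, and it only handles single-token flags (two-token options such as `-mllvm -enable-gvn-hoist` would need both tokens handled together).

```shell
# Hypothetical helper: print the flag string given as $1 with every flag
# listed in the remaining arguments removed (exact, whole-word matches only).
strip_flags() {
  local flags=$1; shift
  local out=() word unwanted drop
  for word in $flags; do            # intentional word splitting on whitespace
    drop=0
    for unwanted in "$@"; do
      [[ $word == "$unwanted" ]] && drop=1
    done
    (( drop )) || out+=("$word")    # keep the word unless it was listed
  done
  printf '%s\n' "${out[*]}"
}

# Possible usage inside build(), before running configure:
#   CFLAGS=$(strip_flags "$CFLAGS" -ffp-contract=fast -fno-trapping-math)
```

Removing a few flags at a time and rebuilding makes it easier to see which one a given failure depends on.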
> ... lead to a configure warning:
> checking for assembler and linker STT_GNU_IFUNC support... llvm-readelf: warning: 'conftest': unable to parse DT_JMPREL: virtual address is not in any segment: 0x0
> llvm-readelf: warning: 'conftest': unable to parse DT_JMPREL: virtual address is not in any segment: 0x0
> yes
> On the other hand, I am able to use fairly aggressive flags for the math part (see https://github.com/ms178/archpkgbuilds/blob/main/toolchain-stable/glibc/mathlto.patch.clang) and the following set of flags in `/etc/makepkg.conf`:
Yeah, the math library is more straightforward, since its code uses fewer of glibc's internal tricks to support some glibc-specific cases (such as internal aliases for PLT avoidance, bootstrap code for the loader, etc.).
@zatrazz Thanks a lot for your insights!
> I think the problem is passing such options through $CC and not through $CFLAGS. Using them in CFLAGS/CXXFLAGS, I could build with both gcc and clang without any issue (you will need to pass an optimization level though, due to the loader bootstrap limitation).
Could you please guide me on how I could change that? makepkg might set some variables in the background that I haven't thought about yet. I've tried ignoring all of the flags in /etc/makepkg.conf and setting CFLAGS/CXXFLAGS via the PKGBUILD, but that doesn't change the outcome when using -ffunction-sections and related flags.
@romanovj Pardon me for hijacking the thread with some issues of my own. I'd be interested to replicate your issue with BOLT. Do you have a PKGBUILD or a glibc-specific step-by-step guide which I could follow? Do you gather profiles with specific workloads to ensure good profile quality?
> Do you have a PKGBUILD or a glibc-specific step-by-step guide which I could follow?
https://gitlab.archlinux.org/archlinux/packaging/packages/glibc
also minimal config:
../glibc/configure \
--prefix=/root/workdir/install \
--host=x86_64-linux-gnu \
--build=x86_64-linux-gnu \
CC="gcc -m64" \
CXX="g++ -m64" \
CFLAGS="-O2" \
CXXFLAGS="-O2"
With gcc or clang (azanella/clang)
Plus -Wl,--emit-relocs and -fno-reorder-blocks-and-partition only for GCC
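To check that `-Wl,--emit-relocs` actually reached the link of `libc.so`, one option is to look for a `.rela.text` section in the section headers. The bash sketch below does this on `readelf -S -W` output read from stdin; the helper name `has_rela_text` is hypothetical, and the check assumes an x86_64 build where the linker keeps the text relocations in a section with that name.

```shell
# Hypothetical helper: read `readelf -S -W <binary>` output on stdin and
# succeed if a .rela.text section header is listed.
has_rela_text() {
  grep -Eq '[[:space:]]\.rela\.text[[:space:]]'
}

# Possible usage from the build directory:
#   if readelf -S -W libc.so | has_rela_text; then
#     echo "relocations were kept"
#   else
#     echo "no .rela.text: --emit-relocs probably did not reach the linker"
#   fi
```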
@romanovj Thanks, but don't you need to instrument with BOLT and gather profiles first?
At least that's common practice, e.g. with LLVM/Clang: https://github.com/ms178/archpkgbuilds/blob/main/toolchain-experimental/llvm-bolt-scripts-master/build_stage3-bolt-without-sampling.bash
@ms178 can't add instrumentation
llvm-bolt libc.so -o libc.so.bolt --instrument --instrumentation-file=/tmp/libc.so --instrumentation-file-append-pid
......
BOLT-ERROR: Offset overflow for dynamic relocation
@romanovj I am afraid it is a bit more complicated than that.
Here is my second non-working attempt for a PKGBUILD (you can delete the custom patches that I apply):
# Maintainer: Marcus Seyfarth <marcus85@gmx.de>
pkgbase=glibc
pkgname=(glibc lib32-glibc)
pkgver=2.39
pkgrel=16.1
pkgdesc='GNU C Library'
arch=('x86_64')
url='https://www.gnu.org/software/libc'
license=('GPL' 'LGPL')
depends=('linux-api-headers' 'tzdata')
makedepends=('git' 'gd' 'python' 'lib32-gcc-libs')
optdepends=('perl: for mtrace'
'gd: graph image generation with memusage')
backup=(etc/gai.conf
etc/locale.gen
etc/nscd.conf)
options=('staticlibs' '!lto' 'buildflags')
install=glibc.install
source=("git+https://sourceware.org/git/glibc.git#branch=azanella/clang"
locale-gen
locale.gen.txt
lib32-glibc.conf
malloc_tune.patch
#mathlto.patch
tzselect-proper-zone-file.patch
04-mandriva-va_args.patch
05-mandriva-zstdcompressedlocals.patch
06-mandriva-nss-crash.patch
07-mandriva-nostrictaliasing.patch
nptl.patch
)
sha256sums=('SKIP'
)
prepare() {
mkdir -p glibc-build lib32-glibc-build
[[ -d glibc-$pkgver ]] && ln -s glibc-$pkgver glibc
local src
for src in "${source[@]}"; do
src="${src%%::*}"
src="${src##*/}"
[[ $src = *.patch ]] || continue
echo "Applying patch $src..."
patch --directory="glibc" --forward --strip=1 < "$src"
done
}
build() {
cd "$srcdir/glibc-build"
echo "slibdir=/usr/lib" >> configparms
echo "rtlddir=/usr/lib" >> configparms
echo "sbindir=/usr/bin" >> configparms
echo "rootsbindir=/usr/bin" >> configparms
CFLAGS=${CFLAGS/-Wp,-D_FORTIFY_SOURCE=2/}
"$srcdir/glibc/configure" \
--prefix=/usr \
--libdir=/usr/lib \
--libexecdir=/usr/lib \
--with-headers=/usr/include \
--disable-bind-now \
--without-selinux \
--disable-fortify-source \
--disable-systemtap \
--disable-cet \
--enable-kernel=6.8.1 \
--enable-multi-arch \
--disable-profile \
--disable-crypt \
--disable-werror
echo "build-programs=no" >> configparms
make -O
sed -i "/build-programs=/s#no#yes#" configparms
echo "CFLAGS += -Wp,-D_FORTIFY_SOURCE=0" >> configparms
make -O
# Instrument Glibc with BOLT
echo "Instrumenting Glibc with BOLT"
llvm-bolt --lite=false \
--instrument \
--instrumentation-file-append-pid \
--instrumentation-file="$srcdir/glibc-build/bolt-output/libc.so.fdata" \
"$srcdir/glibc-build/libc.so" \
-o "$srcdir/glibc-build/libc.so.inst"
echo "Moving instrumented Glibc binary"
mv "$srcdir/glibc-build/libc.so" "$srcdir/glibc-build/libc.so.org"
mv "$srcdir/glibc-build/libc.so.inst" "$srcdir/glibc-build/libc.so"
# Gather profiles with the Glibc test suite
make check
# Optimize Glibc with BOLT using the collected profile
echo "Optimizing Glibc with BOLT"
llvm-bolt -o "$srcdir/glibc-build/bolt-output/libc.so" \
--data "$srcdir/glibc-build/bolt-output/libc.so.fdata" \
"$srcdir/glibc-build/libc.so.org" \
-reorder-blocks=ext-tsp \
-reorder-functions=cdsort \
-split-functions \
-split-all-cold \
-split-eh \
-dyno-stats \
-icf=1 \
-lite=0
echo "Replacing original Glibc binary with optimized one"
mv "$srcdir/glibc-build/libc.so" "$srcdir/glibc-build/libc.so.orig"
mv "$srcdir/glibc-build/bolt-output/libc.so" "$srcdir/glibc-build/libc.so"
cd "$srcdir/lib32-glibc-build"
export CC="clang -m32 -mfpmath=sse -mstackrealign"
export CXX="clang++ -m32 -mfpmath=sse -mstackrealign"
echo "slibdir=/usr/lib32" >> configparms
echo "rtlddir=/usr/lib32" >> configparms
echo "sbindir=/usr/bin" >> configparms
echo "rootsbindir=/usr/bin" >> configparms
"$srcdir/glibc/configure" \
--host=i686-pc-linux-gnu \
--prefix=/usr \
--libdir=/usr/lib32 \
--libexecdir=/usr/lib32 \
--disable-cet \
--enable-kernel=6.8.1 \
--disable-bind-now \
--without-selinux \
--disable-fortify-source \
--disable-systemtap \
--disable-profile \
--disable-crypt \
--disable-sanity-checks \
--disable-werror \
"${_configure_flags[@]}"
echo "build-programs=no" >> configparms
make -O
sed -i "/build-programs=/s#no#yes#" configparms
echo "CFLAGS += -Wp,-D_FORTIFY_SOURCE=0" >> configparms
make -O
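# Instrument 32-bit Glibc with BOLT
echo "Instrumenting 32-bit Glibc with BOLT"
# The profile and the optimized binary are written here, so the directory must exist
mkdir -p "$srcdir/lib32-glibc-build/bolt-output"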
llvm-bolt --lite=false \
--instrument \
--instrumentation-file-append-pid \
--instrumentation-file="$srcdir/lib32-glibc-build/bolt-output/libc.so.fdata" \
"$srcdir/lib32-glibc-build/libc.so" \
-o "$srcdir/lib32-glibc-build/libc.so.inst"
echo "Moving instrumented 32-bit Glibc binary"
mv "$srcdir/lib32-glibc-build/libc.so" "$srcdir/lib32-glibc-build/libc.so.org"
mv "$srcdir/lib32-glibc-build/libc.so.inst" "$srcdir/lib32-glibc-build/libc.so"
# Gather profiles with the Glibc test suite
make check
# Optimize 32-bit Glibc with BOLT using the collected profile
echo "Optimizing 32-bit Glibc with BOLT"
llvm-bolt -o "$srcdir/lib32-glibc-build/bolt-output/libc.so" \
--data "$srcdir/lib32-glibc-build/bolt-output/libc.so.fdata" \
"$srcdir/lib32-glibc-build/libc.so.org" \
-reorder-blocks=ext-tsp \
-reorder-functions=cdsort \
-split-functions \
-split-all-cold \
-split-eh \
-dyno-stats \
-icf=1 \
-lite=0
echo "Replacing original 32-bit Glibc binary with optimized one"
mv "$srcdir/lib32-glibc-build/libc.so" "$srcdir/lib32-glibc-build/libc.so.orig"
mv "$srcdir/lib32-glibc-build/bolt-output/libc.so" "$srcdir/lib32-glibc-build/libc.so"
elf/ld.so --library-path "$PWD" locale/localedef -c -f ../glibc/localedata/charmaps/UTF-8 -i ../glibc/localedata/locales/C ../C.UTF-8/
}
package_glibc() {
pkgdesc='GNU C Library'
depends=('linux-api-headers>=4.10' tzdata filesystem)
optdepends=('gd: for memusagestat'
'perl: for mtrace')
install=glibc.install
backup=(etc/gai.conf
etc/locale.gen
etc/nscd.conf)
make -C glibc-build install_root="$pkgdir" install
rm -f "$pkgdir"/etc/ld.so.cache
# Shipped in tzdata
rm -f "$pkgdir"/usr/bin/{tzselect,zdump,zic}
cd glibc
install -dm755 "$pkgdir"/usr/lib/{locale,systemd/system,tmpfiles.d}
install -m644 nscd/nscd.conf "$pkgdir/etc/nscd.conf"
install -m644 nscd/nscd.service "$pkgdir/usr/lib/systemd/system"
install -m644 nscd/nscd.tmpfiles "$pkgdir/usr/lib/tmpfiles.d/nscd.conf"
install -dm755 "$pkgdir/var/db/nscd"
install -m644 posix/gai.conf "$pkgdir"/etc/gai.conf
install -m755 "$srcdir/locale-gen" "$pkgdir/usr/bin"
# Create /etc/locale.gen
install -m644 "$srcdir/locale.gen.txt" "$pkgdir/etc/locale.gen"
sed -e '1,3d' -e 's|/| |g' -e 's|\\| |g' -e 's|^|#|g' \
"$srcdir/glibc/localedata/SUPPORTED" >> "$pkgdir/etc/locale.gen"
# install C.UTF-8 so that it is always available
install -dm755 "$pkgdir/usr/lib/locale"
cp -r "$srcdir/C.UTF-8" -t "$pkgdir/usr/lib/locale"
sed -i '/#C\.UTF-8 /d' "$pkgdir/etc/locale.gen"
# Install the optimized libc.so.6 (build() leaves it at glibc-build/libc.so)
install -m755 "$srcdir/glibc-build/libc.so" "$pkgdir/usr/lib/libc.so.6"
}
package_lib32-glibc() {
pkgdesc='GNU C Library (32-bit)'
depends=("glibc=$pkgver")
options+=('!emptydirs')
cd lib32-glibc-build
make install_root="$pkgdir" install
rm -rf "$pkgdir"/{etc,sbin,usr/{bin,sbin,share},var}
# We need to keep 32 bit specific header files
find "$pkgdir/usr/include" -type f -not -name '*-32.h' -delete
# Dynamic linker
install -d "$pkgdir/usr/lib"
ln -s ../lib32/ld-linux.so.2 "$pkgdir/usr/lib/"
# Add lib32 paths to the default library search path
install -Dm644 "$srcdir/lib32-glibc.conf" "$pkgdir/etc/ld.so.conf.d/lib32-glibc.conf"
# Symlink /usr/lib32/locale to /usr/lib/locale
ln -s ../lib/locale "$pkgdir/usr/lib32/locale"
# Install the optimized 32-bit libc.so.6 (expected at lib32-glibc-build/libc.so)
install -m755 "$srcdir/lib32-glibc-build/libc.so" "$pkgdir/usr/lib32/libc.so.6"
}
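One thing I'm not sure about in the build() above: as far as I understand, with --instrumentation-file-append-pid every process that loads the instrumented libc writes its own profile with the PID added to the file name, so the single libc.so.fdata passed to --data may never exist. Below is a minimal sketch of a merge step that could run between `make check` and the optimizing llvm-bolt call. It assumes the per-PID files all start with the path given to --instrumentation-file (I have not verified the exact naming) and that merge-fdata from the BOLT tools is in PATH. The function name merge_profiles and the MERGE_FDATA override are made up for illustration.

```shell
# Hypothetical helper, not part of the PKGBUILD above.
# Merge all per-PID BOLT profiles that share a path prefix into one file.
merge_profiles() {
    local prefix=$1 out=$2 f
    # MERGE_FDATA is an illustrative override; merge-fdata ships with BOLT
    local merger=${MERGE_FDATA:-merge-fdata}

    # Collect matching regular files into the positional parameters
    set --
    for f in "$prefix"*; do
        if [ -f "$f" ] && [ "$f" != "$out" ]; then
            set -- "$@" "$f"
        fi
    done

    if [ "$#" -eq 0 ]; then
        echo "no profiles matching $prefix*" >&2
        return 1
    fi

    # merge-fdata prints the merged profile to stdout
    "$merger" "$@" > "$out"
}
```

It would be called as `merge_profiles "$srcdir/glibc-build/bolt-output/libc.so.fdata" "$srcdir/glibc-build/bolt-output/merged.fdata"`, and merged.fdata would then be the file given to --data.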
I get this output:
Instrumenting Glibc with BOLT
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: fa4cc39255767bbaf63a6a3b445dc94b43ebd447
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0xa00000, offset 0xa00000
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
BOLT-INFO: enabling -align-macro-fusion=all since no profile was specified
BOLT-ERROR: bad input binary, global symbol "sys_nerr" is not unique
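To see which global names are duplicated (sys_nerr is just the first one BOLT complains about), something like the sketch below might help. It only approximates whatever check BOLT does internally: it reads the default three-column output of `nm` on stdin, treats uppercase type letters as global symbols, drops any @version suffix and prints the names that occur more than once. The function name dup_globals is made up for illustration.

```shell
# Hypothetical helper: print global symbol names that appear more than once.
# stdin: default `nm` output, i.e. lines of "<address> <type> <name>".
dup_globals() {
    awk 'NF == 3 && $2 ~ /^[A-Z]$/ && $2 != "U" {
             name = $3
             sub(/@.*/, "", name)      # strip a symbol version suffix, if any
             count[name]++
         }
         END { for (n in count) if (count[n] > 1) print n }' | sort
}
```

Usage would be `nm "$srcdir/glibc-build/libc.so" | dup_globals`.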
First error: no DT_FINI or DT_FINI_ARRAY in the binary. Fixed by creating one.
Second error: BOLT can't disassemble the function __longjmp_chk (the asm code of this function is in sysdeps/x86_64/longjmp.S). Fixed by replacing the asm code with a nop =)
Third error: BOLT-ERROR: Offset overflow for dynamic relocation
I don't know how to fix this one.
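For anyone who wants to check the first error on their own build before running llvm-bolt, here is a rough sketch. It assumes the usual `readelf -d` layout where each dynamic entry shows its tag name in parentheses, and it succeeds only if a FINI or FINI_ARRAY entry is present. The function name has_fini is made up for illustration.

```shell
# Hypothetical helper: succeed if the dynamic section has DT_FINI or DT_FINI_ARRAY.
# stdin: output of `readelf -d <binary>`.
has_fini() {
    # Match "(FINI)" or "(FINI_ARRAY)" exactly, not "(FINI_ARRAYSZ)"
    awk '/\((FINI|FINI_ARRAY)\)/ { found = 1 } END { exit found ? 0 : 1 }'
}
```

Usage would be `readelf -d "$srcdir/glibc-build/libc.so" | has_fini || echo "no DT_FINI / DT_FINI_ARRAY"`.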