llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

llvm-bolt crashed #59415

Closed uttampawar closed 1 year ago

uttampawar commented 1 year ago

Environment:

$ ./llvm-bolt --version
LLVM (http://llvm.org/):
  LLVM version 14.0.1
  Optimized build with assertions.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: Xeon
  OS: RHEL v8.x

BOLT revision c62053979489ccb002efe411c3af059addcb5d7d
  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_32 - AArch64 (little endian ILP32)
    aarch64_be - AArch64 (big endian)
    arm64      - ARM64 (little endian)
    arm64_32   - ARM64 (little endian ILP32)
    x86        - 32-bit X86: Pentium-Pro and above
    x86-64     - 64-bit X86: EM64T and AMD64
$  ./llvm-bolt a.out -instrument -o a.out-llvm-instrumented
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: c62053979489ccb002efe411c3af059addcb5d7d
BOLT-INFO: first alloc address is 0x400000
BOLT-INFO: creating new program header table at address 0x800000, offset 0x400000
BOLT-WARNING: debug info will be stripped from the binary. Use -update-debug-sections to keep it.
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
BOLT-INFO: enabling -align-macro-fusion=all since no profile was specified
BOLT-INFO: enabling lite mode
BOLT-ERROR: function __restore_rt/1 is in conflict with FDE [4b9fdf, 4b9fe9). Skipping.
BOLT-WARNING: sizes differ for function setcontext/1. FDE : 129; symbol table : 132. Using max size.
BOLT-WARNING: sizes differ for function __setcontext/1. FDE : 129; symbol table : 132. Using max size.
BOLT-INFO: using __nanosleep_nocancel/1 as another entry to function __libc_nanosleep/1(*6)
BOLT-INFO: using __open_nocancel/1 as another entry to function __open/1(*6)
BOLT-INFO: using __close_nocancel/1 as another entry to function __libc_close/1(*6)
BOLT-INFO: using __read_nocancel/1 as another entry to function __read/1(*6)
BOLT-INFO: using __write_nocancel/1 as another entry to function __write/1(*6)
BOLT-INFO: using __fsync_nocancel/1 as another entry to function __libc_fsync/1(*4)
BOLT-INFO: using __msync_nocancel/1 as another entry to function msync/1(*4)
BOLT-INFO: using __lseek_nocancel/1 as another entry to function __llseek/1(*16)
BOLT-INFO: using __connect_nocancel/1 as another entry to function connect/1(*8)
BOLT-WARNING: .annobin_atexit.c_end/1 (0x518142) does not have any section
BOLT-WARNING: .annobin_atexit.end/1 (0x518142) does not have any section
BOLT-WARNING: _etext (0x519355) does not have any section
BOLT-WARNING: FDE [0x4bc0b1, 0x4bc0b4) conflicts with function setcontext/1(*4)
BOLT-ERROR: symbol seen in the middle of the function __BOLT_FDE_FUNCat4b9fdf. Skipping.
BOLT-WARNING: Failed to analyze 13 relocations
BOLT-WARNING: Ignored 0 functions due to cold fragments.
llvm-bolt: /home/user/llvm-project/bolt/include/bolt/Core/BinaryFunction.h:1711: void llvm::bolt::BinaryFunction::addCFIInstruction(uint64_t, llvm::MCCFIInstruction&&): Assertion `I->first == Offset && "CFI pointing to unknown instruction"' failed.
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
./llvm-bolt(+0xc0dd8f)[0x563e376b5d8f]
./llvm-bolt(+0xc0b98d)[0x563e376b398d]
/lib64/libpthread.so.0(+0x12ce0)[0x7f0405b30ce0]
/lib64/libc.so.6(gsignal+0x10f)[0x7f0404c74a9f]
/lib64/libc.so.6(abort+0x127)[0x7f0404c47e05]
/lib64/libc.so.6(+0x21cd9)[0x7f0404c47cd9]
/lib64/libc.so.6(+0x473f6)[0x7f0404c6d3f6]
./llvm-bolt(+0x1a189ef)[0x563e384c09ef]
./llvm-bolt(+0x1a18b48)[0x563e384c0b48]
./llvm-bolt(+0x1a1b368)[0x563e384c3368]
./llvm-bolt(+0x9ce71c)[0x563e3747671c]
./llvm-bolt(+0xa2daf0)[0x563e374d5af0]
./llvm-bolt(+0x338661)[0x563e36de0661]
/lib64/libc.so.6(__libc_start_main+0xf3)[0x7f0404c60cf3]
./llvm-bolt(+0x3a23ea)[0x563e36e4a3ea]
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: ./llvm-bolt a.out -instrument -o a.out-llvm-instrumented
Aborted (core dumped)

aaupov commented 1 year ago

What's the compiler used? Can you please provide a reproducer?

uttampawar commented 1 year ago

@aaupov Here are the compiler details. BTW, the LLVM build was done on Ubuntu 18.04 and then used to instrument a program on RHEL 8.x.

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.5.0-3ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

I don't have a small test case to reproduce this error at this time.

uttampawar commented 1 year ago

@aaupov Here are the llvm-bolt execution and application build environment details:

$ uname -a
Linux node008 4.18.0-372.26.1.el8_6.x86_64 #1 SMP Sat Aug 27 02:44:20 EDT

$ ldd --version
ldd (GNU libc) 2.28

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/9.2.0/lto-wrappe
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-9.2.0/configure --disable-multilib --enable-languages=c,c++
Thread model: posix
gcc version 9.2.0 (GCC)

Application build flags:
CFLAGS: "-O3 -g -fno-reorder-blocks-and-partition -std=c++11"
LDFLAGS: "-Wl,--emit-relocs,-znow"

aaupov commented 1 year ago

My guess is that some function, perhaps one written in assembly, has incorrect CFI information encoded in .eh_frame. As a workaround, you can exclude the problematic function with -skip-funcs. To find out which function it is, you can try the bughunter.sh script, or use a debug build of BOLT and print getPrintName() in the crashing frame (it should be in a BinaryFunction method).
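
One way to build a first list of candidates for -skip-funcs is to collect the function names that BOLT itself flagged with FDE-related diagnostics in the log above. The following is only a rough sketch: the helper name fde_suspects and the log file name bolt.log are hypothetical, the thread does not confirm that any of these functions is the actual culprit, and whether -skip-funcs expects the names with or without the "/1" suffix is an assumption that should be checked against the BOLT build in use.

```shell
#!/bin/sh
# fde_suspects LOGFILE
# Prints a comma-separated, de-duplicated list of function names that appear
# in BOLT-ERROR / BOLT-WARNING lines mentioning an FDE. The "/N" suffix that
# BOLT appends to the symbol name is dropped (assumption: see the note above).
fde_suspects() {
  grep -E 'BOLT-(ERROR|WARNING)' "$1" |
    grep 'FDE' |
    sed -n -E 's#.*function ([A-Za-z0-9_.]+)/[0-9]+.*#\1#p' |
    LC_ALL=C sort -u |
    paste -sd, -
}

# Hypothetical usage, assuming the BOLT output was saved to bolt.log:
#   ./llvm-bolt a.out -instrument -skip-funcs="$(fde_suspects bolt.log)" \
#       -o a.out-llvm-instrumented
```

If the crash goes away with the whole list skipped, the list can be narrowed one name at a time to find the single function responsible; if it does not, the culprit is not among the flagged functions, and bughunter.sh or a debug build remains the way to locate it.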

uttampawar commented 1 year ago

@aaupov Thanks, I'll try that out.

aaupov commented 1 year ago

The issue appears to be mitigated. Otherwise please reopen and provide extra details.