llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[BOLT] Assertion `!Frag->isSimple() && "fragment of non-simple function should also be non-simple"' failed. #76800

Open llongint opened 10 months ago

llongint commented 10 months ago

I can reproduce with the simple command below:

llvm-bolt gaussdb -o test --data=tpch.arm.bolt.fdata --update-debug-sections
BOLT-INFO: Inserted 119546 stubs in the hot area and 0 stubs in the cold area. Shared 0 times, iterated 3 times.
llvm-bolt: /test/bolt/lib/Core/BinaryFunction.cpp:4224: llvm::bolt::DebugAddressRangesVector llvm::bolt::BinaryFunction::getOutputAddressRanges() const: Assertion `!Frag->isSimple() && "fragment of non-simple function should also be non-simple"' failed.
 #0 0x0000aaaae59894bc llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /test/llvm/lib/Support/Unix/Signals.inc:723:22
 #1 0x0000aaaae598986c PrintStackTraceSignalHandler(void*) /test/llvm/lib/Support/Unix/Signals.inc:798:1
 #2 0x0000aaaae5987054 llvm::sys::RunSignalHandlers() /test/llvm/lib/Support/Signals.cpp:105:20
 #3 0x0000aaaae5988cb4 SignalHandler(int) /test/llvm/lib/Support/Unix/Signals.inc:413:1
 #4 0x0000ffffaef0d7c0 (linux-vdso.so.1+0x7c0)
 #5 0x0000ffffaebc1e80 raise (/lib64/libc.so.6+0x34e80)
 #6 0x0000ffffaebc3374 abort (/lib64/libc.so.6+0x36374)
 #7 0x0000ffffaebbadd4 (/lib64/libc.so.6+0x2ddd4)
 #8 0x0000ffffaebbae5c (/lib64/libc.so.6+0x2de5c)
 #9 0x0000aaaae64e014c llvm::bolt::BinaryFunction::getOutputAddressRanges() const /test/bolt/lib/Core/BinaryFunction.cpp:4224:5
#10 0x0000aaaae6483544 llvm::bolt::BinaryContext::translateModuleAddressRanges(std::vector<llvm::DWARFAddressRange, std::allocator<llvm::DWARFAddressRange>> const&) const /test/bolt/lib/Core/BinaryContext.cpp:2459:66
#11 0x0000aaaae5b1f5b0 llvm::bolt::DWARFRewriter::updateUnitDebugInfo(llvm::DWARFUnit&, llvm::bolt::DIEBuilder&, llvm::bolt::DebugLocWriter&, llvm::bolt::DebugRangesSectionWriter&, std::optional<unsigned long>) /test/bolt/lib/Rewrite/DWARFRewriter.cpp:863:70
#12 0x0000aaaae5b1e03c llvm::bolt::DWARFRewriter::updateDebugInfo()::'lambda1'(llvm::DWARFUnit*, llvm::bolt::DIEBuilder*)::operator()(llvm::DWARFUnit*, llvm::bolt::DIEBuilder*) const /test/bolt/lib/Rewrite/DWARFRewriter.cpp:746:75
#13 0x0000aaaae5b1e6fc llvm::bolt::DWARFRewriter::updateDebugInfo() /test/bolt/lib/Rewrite/DWARFRewriter.cpp:771:53
#14 0x0000aaaae5a64d68 llvm::bolt::RewriteInstance::updateMetadata() /test/bolt/lib/Rewrite/RewriteInstance.cpp:3421:57
#15 0x0000aaaae5a5631c llvm::bolt::RewriteInstance::run() /test/bolt/lib/Rewrite/RewriteInstance.cpp:747:32
#16 0x0000aaaae4726f58 main /test/bolt/tools/driver/llvm-bolt.cpp:243:29
#17 0x0000ffffaebadbec __libc_start_main (/lib64/libc.so.6+0x20bec)
#18 0x0000aaaae4725ecc _start (/test/buildm/bin/llvm-bolt+0x3e8ecc)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.  Program arguments: /test/buildm/bin/llvm-bolt gaussdb -o test --data=tpch.arm.bolt.fdata --update-debug-sections
[1]    37301 abort (core dumped)  ~/BiShengKernel/buildm/bin/llvm-bolt gaussdb -o test  --update-debug-sections

My binary is too large to upload :(

aaupov commented 10 months ago

Thank you for the report. How did you build the input binary? It looks like you might be using GCC with function splitting, which would cause this issue. Please try rebuilding with -fno-reorder-blocks-and-partition.

In the meantime, please share either a test case or the steps to reproduce the binary. This will help in addressing the issue.
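One way to check whether an input binary already contains compiler-split functions is to look for cold fragment symbols in its symbol table. The following is a minimal sketch, not part of BOLT: it assumes the text comes from running `nm` on the binary and that fragments are named `name.cold` or `name.cold.N`. The helper name `find_cold_fragments` and the sample addresses are hypothetical.

```python
import re
from collections import defaultdict

# Assumed fragment naming: "name.cold" or the numbered form "name.cold.N".
_COLD_RE = re.compile(r"^(?P<parent>.+?)\.cold(?:\.\d+)?$")


def find_cold_fragments(nm_output):
    """Group cold fragment symbols by their parent function name.

    `nm_output` is the text printed by `nm <binary>`: one symbol per line,
    with the symbol name as the last whitespace-separated field.
    """
    fragments = defaultdict(list)
    for line in nm_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name = fields[-1]
        match = _COLD_RE.match(name)
        if match:
            fragments[match.group("parent")].append(name)
    return dict(fragments)


if __name__ == "__main__":
    # Hypothetical nm output, for illustration only.
    sample = (
        "0000000000400536 T func\n"
        "0000000000400510 t func.cold\n"
        "0000000000400560 T main\n"
        "                 U puts\n"
    )
    for parent, frags in find_cold_fragments(sample).items():
        print(parent, "->", ", ".join(frags))
```

An empty result suggests the compiler did not split any functions. A non-empty result lists the parent functions whose fragments BOLT has to handle.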

llongint commented 10 months ago

I compiled it with clang, and the issue might be caused by the option -mllvm -enable-split-machine-functions=true. However, I think we should still fix this problem?

llongint commented 10 months ago

I'm trying to reproduce the issue as follows, but I don't yet know how to mark the function 'func' as non-simple. I'll keep trying when I have time.

// cold.c
__attribute__((noinline))
int func_cold(int num) {
  __asm__ __volatile__(
  "nop\n"
  "nop\n"
  :::);
  return (num - num / 2 + 1) * ( num + num / 7);
}

__attribute__((noinline))
int func(int argc) {
  __asm__ __volatile__(
  "nop\n"
  "nop\n"
  :::);
  if(argc < 4)
    return func_cold(argc +100);
  return 0;
}

int main(int argc, char *argv[]) {
  return func(argc);
}

gcc cold.c -S -g -O0 -Wl,-q -no-pie
sed -i "s#func_cold#func.cold#g" cold.s
gcc cold.s -g -Wl,-q -no-pie -O0
./build/bin/llvm-bolt a.out -o a.inst -instrument -instrumentation-file=a.fdata --instrumentation-wait-forks -instrumentation-sleep-time=2 -instrumentation-no-counters-clear --instrumentation-binpath=a.inst
./a.inst
sed -n "/func/p" -i a.fdata
sed "s/$/000/g" -i a.fdata
./build/bin/llvm-bolt a.out -o a.opt --data=a.fdata --update-debug-sections
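The two `sed` commands on a.fdata first keep only the profile lines that mention `func`, then append `000` to the end of every remaining line. If the last field of each line is an execution count, this multiplies it by 1000. A rough Python equivalent of that text transformation is sketched below. The function name `filter_and_scale` and the sample profile lines are hypothetical, and the lines are not claimed to match the exact output of BOLT instrumentation.

```python
def filter_and_scale(profile_lines, keyword="func", suffix="000"):
    """Mirror the two sed commands from the repro.

    1. sed -n "/func/p": keep only lines containing `keyword`.
    2. sed "s/$/000/g":  append `suffix` to the end of each kept line.
    """
    return [line + suffix for line in profile_lines if keyword in line]


if __name__ == "__main__":
    # Hypothetical profile lines; the trailing number is assumed to be a count.
    sample = [
        "1 main 4 1 func 0 0 1",
        "1 main 0 1 puts 0 0 3",
        "1 func 10 1 func.cold 0 0 5",
    ]
    for line in filter_and_scale(sample):
        print(line)
```

Note that the filter is a plain substring match, so lines mentioning `func.cold` are kept as well, just as with the `sed` pattern.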

aaupov commented 10 months ago

> I compiled it with clang, and the issue might be due to adding the option -mllvm -enable-split-machine-functions=true. However, I think we should fix this problem ?

LLVM MachineFunction splitting is equivalent to GCC's -freorder-blocks-and-partition. BOLT performs function splitting itself, using a more precise profile, so we recommend disabling it in the compiler.
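As an illustration of that recommendation, the sketch below rewrites a list of compiler arguments so that compiler-side splitting is turned off before the binary is handed to BOLT. It is a hypothetical build-script helper, not part of LLVM or BOLT. The GCC flag is the one named earlier in this thread. For clang it is assumed that the driver-level -fno-split-machine-functions is available in the toolchain in use.

```python
# Flags that turn compiler function splitting off.
# The clang spelling is an assumption about the toolchain in use.
DISABLE_SPLIT_FLAGS = {
    "gcc": ["-fno-reorder-blocks-and-partition"],
    "clang": ["-fno-split-machine-functions"],
}

# Flags that turn it on and should be dropped if present.
ENABLE_SPLIT_FLAGS = {
    "-freorder-blocks-and-partition",
    "-fsplit-machine-functions",
}


def without_compiler_splitting(compiler, args):
    """Return a copy of `args` with function splitting disabled."""
    cleaned = []
    skip_next = False
    for i, arg in enumerate(args):
        if skip_next:
            skip_next = False
            continue
        # Drop "-mllvm -enable-split-machine-functions[=...]" as a pair.
        if (
            arg == "-mllvm"
            and i + 1 < len(args)
            and args[i + 1].startswith("-enable-split-machine-functions")
        ):
            skip_next = True
            continue
        if arg in ENABLE_SPLIT_FLAGS:
            continue
        cleaned.append(arg)
    return cleaned + DISABLE_SPLIT_FLAGS[compiler]


if __name__ == "__main__":
    args = ["-O2", "-mllvm", "-enable-split-machine-functions=true", "-c", "a.c"]
    print(" ".join(without_compiler_splitting("clang", args)))
```

With splitting disabled in the compiler, the input binary should contain no `.cold` fragments.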