llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[BOLT] SIGABRT occurred on AArch64 when perf2bolt was executed #63612

Open taisei-iha opened 1 year ago

taisei-iha commented 1 year ago

perf2bolt aborts on AArch64 when it is run on a shared library.

Question

Can BOLT be used to optimize shared libraries? If so, could you investigate this problem? Also, please let me know how to avoid it.

Problem

Abort message: Assertion `I->first == Offset && "CFI pointing to unknown instruction"' failed.

The problem occurs on AArch64 but not on x86. The full output is as follows.

$ perf2bolt ~/OpenFOAM/clang-bolt/OpenFOAM-v2212/platforms/linuxARM64ClangDPInt32Opt/lib/libOpenFOAM.so -p perf.data -w profile.yaml -o perf.fdata -nl
BOLT-INFO: shared object or position-independent executable detected
PERF2BOLT: Starting data aggregation job for perf.data
PERF2BOLT: spawning perf job to read events without LBR
PERF2BOLT: spawning perf job to read mem events
PERF2BOLT: spawning perf job to read process events
PERF2BOLT: spawning perf job to read task events
BOLT-INFO: Target architecture: aarch64
BOLT-INFO: BOLT version: ae42196bc493ffe877a7e3dff8be32035dea4d07
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x1000000, offset 0x1000000
BOLT-INFO: enabling relocation mode
BOLT-INFO: disabling -align-macro-fusion on non-x86 platform
BOLT-INFO: enabling strict relocation mode for aggregation purposes
BOLT-INFO: pre-processing profile using perf data aggregator
BOLT-WARNING: build-id will not be checked because we could not read one from input binary
PERF2BOLT: waiting for perf mmap events collection to finish...
PERF2BOLT: parsing perf-script mmap events output
PERF2BOLT: waiting for perf task events collection to finish...
PERF2BOLT: parsing perf-script task events output
PERF2BOLT: input binary is associated with 1 PID(s)
PERF2BOLT: waiting for perf events collection to finish...
PERF2BOLT: parsing basic events (without LBR)...
perf2bolt: /home/users/ea01/ea0218/llvm-project-git/16.0.4/llvm-project/bolt/include/bolt/Core/BinaryFunction.h:1662: void llvm::bolt::BinaryFunction::addCFIInstruction(uint64_t, llvm::MCCFIInstruction&&): Assertion `I->first == Offset && "CFI pointing to unknown instruction"' failed.

 #0 0x00000000024fe47c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/users/ea01/ea0218/llvm-project-git/16.0.4/llvm-project/llvm/lib/Support/Unix/Signals.inc:567:22
 #1 0x00000000024fe7dc PrintStackTraceSignalHandler(void*) /home/users/ea01/ea0218/llvm-project-git/16.0.4/llvm-project/llvm/lib/Support/Unix/Signals.inc:641:1
 #2 0x00000000024fc728 llvm::sys::RunSignalHandlers() /home/users/ea01/ea0218/llvm-project-git/16.0.4/llvm-project/llvm/lib/Support/Signals.cpp:104:20
 #3 0x00000000024fde50 SignalHandler(int) /home/users/ea01/ea0218/llvm-project-git/16.0.4/llvm-project/llvm/lib/Support/Unix/Signals.inc:412:1
 #4 0x000040000006066c (linux-vdso.so.1+0x66c)
 #5 0x0000400000542c1c raise (/lib64/libc.so.6+0x32c1c)
 #6 0x00004000005307a8 abort (/lib64/libc.so.6+0x207a8)
 #7 0x000040000053c2e8 __assert_fail_base (/lib64/libc.so.6+0x2c2e8)
 #8 0x000040000053c350 __assert_perror_fail (/lib64/libc.so.6+0x2c350)
 #9 0x0000000003e5de98 llvm::bolt::BinaryFunction::addCFIInstruction(unsigned long, llvm::MCCFIInstruction&&) /home/users/ea01/ea0218/llvm-project-git/16.0.4/llvm-project/bolt/include/bolt/Core/BinaryFunction.h:1662:5
#10 0x0000000003e5c2c0 llvm::bolt::CFIReaderWriter::fillCFIInfoFor(llvm::bolt::BinaryFunction&) const::'lambda'(llvm::dwarf::CFIProgram::Instruction const&)::operator()(llvm::dwarf::CFIProgram::Instruction const&) const /home/users/ea01/ea0218/llvm-project-git/16.0.4/llvm-project/bolt/lib/Core/Exceptions.cpp:576:38
#11 0x0000000003e5ca68 llvm::bolt::CFIReaderWriter::fillCFIInfoFor(llvm::bolt::BinaryFunction&) const /home/users/ea01/ea0218/llvm-project-git/16.0.4/llvm-project/bolt/lib/Core/Exceptions.cpp:659:32
#12 0x00000000025a7b18 llvm::bolt::RewriteInstance::disassembleFunctions() /home/users/ea01/ea0218/llvm-project-git/16.0.4/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:3009:62
#13 0x000000000259c1bc llvm::bolt::RewriteInstance::run() /home/users/ea01/ea0218/llvm-project-git/16.0.4/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:751:27
#14 0x000000000040c008 main /home/users/ea01/ea0218/llvm-project-git/16.0.4/llvm-project/bolt/tools/driver/llvm-bolt.cpp:244:29
#15 0x0000400000530be4 __libc_start_main (/lib64/libc.so.6+0x20be4)
#16 0x000000000040b07c _start (/home/users/ea01/ea0218/llvm-project-git/16.0.4/build/bin/llvm-bolt+0x40b07c)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: perf2bolt /home/users/ea01/ea0218/OpenFOAM/clang-bolt/OpenFOAM-v2212/platforms/linuxARM64ClangDPInt32Opt/lib/libOpenFOAM.so -p perf.data -w profile.yaml -o perf.fdata -nl
Aborted (core dumped)

Environment

Steps to reproduce

The problem occurs when libOpenFOAM.so is built with relocation metadata and then passed to perf2bolt.

  1. Prepare an environment where OpenFOAM can be built (setting environment variables, sourcing etc/bashrc, etc.).
  2. Add "-Wl,--emit-relocs" to include relocation metadata. In OpenFOAM, set it as follows in src/OpenFOAM/Make/options:
    c++OPT = -Wl,--emit-relocs
    cOPT = -Wl,--emit-relocs
  3. Run Allwmake to build the shared libraries.
    # cd src
    # ./Allwmake
  4. Run perf to collect a profile.
    # perf record -e cycles:u numactl -C12 platforms/linuxARM64ClangDPInt32Opt/bin/simpleFoam
  5. Run perf2bolt.
    # perf2bolt platforms/linuxARM64ClangDPInt32Opt/lib/libOpenFOAM.so -p perf.data -w profile.yaml -o perf.fdata -nl

Analysis

The third assert in the code below fires because I->first and Offset do not match.

void addCFIInstruction(uint64_t Offset, MCCFIInstruction &&Inst) {
    assert(!Instructions.empty());

    // Fix CFI instructions skipping NOPs. We need to fix this because changing
    // CFI state after a NOP, besides being wrong and inaccurate,  makes it
    // harder for us to recover this information, since we can create empty BBs
    // with NOPs and then reorder it away.
    // We fix this by moving the CFI instruction just before any NOPs.
    auto I = Instructions.lower_bound(Offset);
    if (Offset == getSize()) {
      assert(I == Instructions.end() && "unexpected iterator value");
      // Sometimes compiler issues restore_state after all instructions
      // in the function (even after nop).
      --I;
      Offset = I->first;
    }
    assert(I->first == Offset && "CFI pointing to unknown instruction");
    if (I == Instructions.begin()) {
      CIEFrameInstructions.emplace_back(std::forward<MCCFIInstruction>(Inst));
      return;
    }

The values of I->first and Offset at the point of failure are as follows.

Process 589 resuming
perf2bolt: /home/users/ea01/ea0218/llvm-project-git/16.0.4/llvm-project/bolt/include/bolt/Core/BinaryFunction.h:1662: void llvm::bolt::BinaryFunction::addCFIInstruction(uint64_t, llvm::MCCFIInstruction&&): Assertion `I->first == Offset && "CFI pointing to unknown instruction"' failed.
Process 589 stopped
* thread #1, name = 'perf2bolt', stop reason = hit program assert
    frame #4: 0x0000000003e5de98 perf2bolt`llvm::bolt::BinaryFunction::addCFIInstruction(this=0x0000000013177a68, Offset=616, Inst=0x0000ffffffffc520) at BinaryFunction.h:1662:5
   1659       --I;
   1660       Offset = I->first;
   1661     }
-> 1662     assert(I->first == Offset && "CFI pointing to unknown instruction");
   1663     if (I == Instructions.begin()) {
   1664       CIEFrameInstructions.emplace_back(std::forward<MCCFIInstruction>(Inst));
   1665       return;
(lldb) p I->first
(const unsigned int) $14 = 154
(lldb) p Offset
(uint64_t) $15 = 616
(lldb) p I
(std::_Rb_tree_iterator<std::pair<const unsigned int, llvm::MCInst> >) $16 = {
  first = 154
  second = {
    Opcode = 954709168
    Flags = 0
    Loc = (Ptr = "")
    Operands = {
      llvm::SmallVectorImpl<llvm::MCOperand> = {
        llvm::SmallVectorTemplateBase<llvm::MCOperand> = {
          llvm::SmallVectorTemplateCommon<llvm::MCOperand> = {
            llvm::SmallVectorBase<unsigned int> = (BeginX = 0x0000000000000000, Size = 0, Capacity = 0)
          }
        }
      }
      llvm::SmallVectorStorage<llvm::MCOperand, 10> = (InlineElts = "\0\0\0\0\0\0\0\0\xe8}\U00000017\U00000013\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\U00000018~\U00000017\U00000013\0\0\0\0\0\0\0\0\0\0\0\0(~\U00000017\U00000013\0\0\0\0\0\0\0\0\0\0\0\08~\U00000017\U00000013\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xc0\xb3\xe78\0\0\0\0в\xe78\0\0\0\0\xe0\xb9\xe78\0\0\0\0\t\0\0\0\0\0\0\0\xf0\xab\xe78")
    }
  }
}
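
To make the failing condition easier to reason about, here is a minimal sketch of the lookup that addCFIInstruction performs. It is not BOLT code: InstructionMap, CFILookup, and classifyCFIOffset are hypothetical names, and the map uses uint64_t keys and strings as stand-ins for the real offset-to-MCInst map. It only illustrates how std::map::lower_bound behaves for the offsets that can reach the assert.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the offset -> instruction map in BinaryFunction.
using InstructionMap = std::map<uint64_t, std::string>;

// Possible outcomes of looking up a CFI offset (names are illustrative).
enum class CFILookup {
  ExactMatch,            // an instruction starts exactly at Offset
  EndOfFunction,         // Offset == function size: handled by the `--I` branch
  NoInstructionAtOffset, // lower_bound found a later instruction: assert fires
  PastLastInstruction    // lower_bound returned end() and Offset != size
};

CFILookup classifyCFIOffset(const InstructionMap &Instructions,
                            uint64_t Offset, uint64_t FunctionSize) {
  assert(!Instructions.empty());
  // lower_bound returns the first element whose key is >= Offset, or end().
  auto I = Instructions.lower_bound(Offset);
  if (I == Instructions.end()) {
    // Only Offset == FunctionSize is covered by the original code. For any
    // other offset, reading I->first would dereference end(), which is
    // undefined behavior and can yield arbitrary values.
    return Offset == FunctionSize ? CFILookup::EndOfFunction
                                  : CFILookup::PastLastInstruction;
  }
  return I->first == Offset ? CFILookup::ExactMatch
                            : CFILookup::NoInstructionAtOffset;
}
```

For example, with instructions at offsets 0, 4, and 8 and a function size of 12, offset 4 is an exact match, offset 6 has no instruction, offset 12 is the end-of-function case, and offset 616 is past the last instruction. The lldb dump above does not say which case applies here. The empty operand vector and the implausible Opcode value might suggest that I is the end iterator, but that is only a guess and would need to be confirmed by comparing I with Instructions.end() in the debugger.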

Reproduction on x86

The problem did not occur on x86. I tried to reproduce it with llvmorg-16.0.4.

$ perf2bolt --version
LLVM (http://llvm.org/):
  LLVM version 16.0.4
  Optimized build.
BOLT revision ae42196bc493ffe877a7e3dff8be32035dea4d07

Run perf2bolt on the shared library.

$ perf2bolt ~/OpenFOAM/OpenFOAM-v2212/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM.so -p perf.data -w profile.yaml -o perf.fdata -nl
BOLT-INFO: shared object or position-independent executable detected
PERF2BOLT: Starting data aggregation job for perf.data
PERF2BOLT: spawning perf job to read events without LBR
PERF2BOLT: spawning perf job to read mem events
PERF2BOLT: spawning perf job to read process events
PERF2BOLT: spawning perf job to read task events
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: ae42196bc493ffe877a7e3dff8be32035dea4d07
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x1200000, offset 0x1200000
BOLT-INFO: enabling relocation mode
BOLT-INFO: enabling strict relocation mode for aggregation purposes
BOLT-WARNING: split function detected on input : _ZN4FoamL5cwd_PEv.cold/1. The support is limited in relocation mode
BOLT-INFO: pre-processing profile using perf data aggregator
BOLT-INFO: binary build-id is:     9e3e4b259c3bee8f25c4368ef82f10101c07a844
PERF2BOLT: spawning perf job to read buildid list
PERF2BOLT: matched build-id and file name
PERF2BOLT: waiting for perf mmap events collection to finish...
PERF2BOLT: parsing perf-script mmap events output
PERF2BOLT: waiting for perf task events collection to finish...
PERF2BOLT: parsing perf-script task events output
PERF2BOLT: input binary is associated with 1 PID(s)
PERF2BOLT: waiting for perf events collection to finish...
PERF2BOLT: parsing basic events (without LBR)...
BOLT-INFO: forcing -jump-tables=move as PIC jump table was detected in function _ZN4Foam5token5resetEv
PERF2BOLT: processing basic events (without LBR)...
PERF2BOLT: read 124913 samples
PERF2BOLT: out of range samples recorded in unknown regions: 47122 (37.7%)
PERF2BOLT: wrote 4308 objects and 0 memory objects to perf.fdata

Run llvm-bolt to generate an optimized shared library.

$ llvm-bolt ~/OpenFOAM/OpenFOAM-v2212/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM.so -o ~/OpenFOAM/OpenFOAM-v2212/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM_bolt.so -data=perf.fdata cycles:u
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: ae42196bc493ffe877a7e3dff8be32035dea4d07
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x1200000, offset 0x1200000
BOLT-INFO: enabling relocation mode
BOLT-INFO: enabling lite mode
BOLT-WARNING: split function detected on input : _ZN4FoamL5cwd_PEv.cold/1. The support is limited in relocation mode
BOLT-WARNING: disabling lite mode (-lite) when split functions are present
BOLT-INFO: pre-processing profile using branch profile reader
BOLT-INFO: forcing -jump-tables=move as PIC jump table was detected in function _ZN4Foam5token5resetEv
BOLT-INFO: operating with basic samples profiling data (no LBR).
BOLT-INFO: normalizing samples by instruction count.
BOLT-INFO: 893 out of 36683 functions in the binary (2.4%) have non-empty execution profile
BOLT-INFO: 5 functions with profile could not be optimized
BOLT-INFO: the input contains 557 (dynamic count : 67785) opportunities for macro-fusion optimization. Will fix instances on a hot path.
BOLT-INFO: 20047 instructions were shortened
BOLT-INFO: removed 1385 empty blocks
BOLT-INFO: merged 1 duplicate CFG edge
BOLT-INFO: UCE removed 0 blocks and 0 bytes of code.
BOLT-INFO: SCTC: patched 3 tail calls (3 forward) tail calls (0 backward) from a total of 3 while removing 0 double jumps and removing 3 basic blocks totalling 15 bytes of code. CTCs total execution count is 4750 and the number of times CTCs are taken is 0.
BOLT-INFO: patched build-id (flipped last bit)

Sections with ".bolt" in their names are created (sections 14, 16-19, and 40 below).

$ readelf -S ~/OpenFOAM/OpenFOAM-v2212/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM_bolt.so
There are 41 section headers, starting at offset 0x2367600:

Section header:
  [No] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .note.gnu.pr[...] NOTE             00000000000002a8  000002a8
       0000000000000020  0000000000000000   A       0     0     8
  [ 2] .note.gnu.bu[...] NOTE             00000000000002c8  000002c8
       0000000000000024  0000000000000000   A       0     0     4
  [ 3] .gnu.hash         GNU_HASH         00000000000002f0  000002f0
       0000000000062550  0000000000000000   A       4     0     8
  [ 4] .dynsym           DYNSYM           0000000000062840  00062840
       0000000000130e18  0000000000000018   A       5     1     8
  [ 5] .dynstr           STRTAB           0000000000193658  00193658
       00000000003aed50  0000000000000000   A       0     0     1
  [ 6] .gnu.version      VERSYM           00000000005423a8  005423a8
       0000000000019682  0000000000000002   A       4     0     2
  [ 7] .gnu.version_r    VERNEED          000000000055ba30  0055ba30
       00000000000001f0  0000000000000000   A       5     4     8
  [ 8] .rela.dyn         RELA             000000000055bc20  0055bc20
       000000000009ff90  0000000000000018   A       4     0     8
  [ 9] .rela.plt         RELA             00000000005fbbb0  005fbbb0
       00000000000a07a0  0000000000000018  AI       4    25     8
  [10] .init             PROGBITS         000000000069d000  0069d000
       000000000000001b  0000000000000000  AX       0     0     4
  [11] .plt              PROGBITS         000000000069d020  0069d020
       000000000006afd0  0000000000000010  AX       0     0     16
  [12] .plt.got          PROGBITS         0000000000707ff0  00707ff0
       00000000000024f0  0000000000000010  AX       0     0     16
  [13] .plt.sec          PROGBITS         000000000070a4e0  0070a4e0
       000000000006afc0  0000000000000010  AX       0     0     16
  [14] .bolt.org.text    PROGBITS         00000000007754a0  007754a0
       00000000005cdfe9  0000000000000000  AX       0     0     16
  [15] .fini             PROGBITS         0000000000d4348c  00d4348c
       000000000000000d  0000000000000000  AX       0     0     4
  [16] .bolt.org.rodata  PROGBITS         0000000000d44000  00d44000
       00000000001656e5  0000000000000000   A       0     0     32
  [17] .bolt.org.eh[...] PROGBITS         0000000000ea96e8  00ea96e8
       0000000000047a44  0000000000000000   A       0     0     4
  [18] .bolt.org.eh[...] PROGBITS         0000000000ef1130  00ef1130
       000000000013019c  0000000000000000   A       0     0     8
  [19] .bolt.org.gc[...] PROGBITS         00000000010212cc  010212cc
       0000000000034c99  0000000000000000   A       0     0     4
  [20] .init_array       INIT_ARRAY       0000000001057cf8  01056cf8
       00000000000012a0  0000000000000008  WA       0     0     8
  [21] .fini_array       FINI_ARRAY       0000000001058f98  01057f98
       0000000000000008  0000000000000008  WA       0     0     8
  [22] .data.rel.ro      PROGBITS         0000000001058fa0  01057fa0
       000000000002b788  0000000000000000  WA       0     0     32
  [23] .dynamic          DYNAMIC          0000000001084728  01083728
       0000000000000210  0000000000000010  WA       5     0     8
  [24] .got              PROGBITS         0000000001084938  01083938
       000000000000c6c8  0000000000000008  WA       0     0     8
  [25] .got.plt          PROGBITS         0000000001091000  01090000
       00000000000357f8  0000000000000008  WA       0     0     8
  [26] .data             PROGBITS         00000000010c6800  010c5800
       00000000000005f8  0000000000000000  WA       0     0     32
  [27] .tm_clone_table   PROGBITS         00000000010c6df8  010c5df8
       0000000000000000  0000000000000000  WA       0     0     8
  [28] .bss              NOBITS           00000000010c6e00  010c5df8
       00000000000145c8  0000000000000000  WA       0     0     32
  [29] .text             PROGBITS         0000000001400000  01400000
       000000000002f4a4  0000000000000000  AX       0     0     2097152
  [30] .text.cold        PROGBITS         000000000142f4c0  0142f4c0
       00000000005e8ad7  0000000000000000  AX       0     0     64
  [31] .eh_frame         PROGBITS         0000000001a17f98  01a17f98
       000000000025f35c  0000000000000000   A       0     0     8
  [32] .gcc_except_table PROGBITS         0000000001c772f4  01c772f4
       0000000000061230  0000000000000000   A       0     0     4
  [33] .rodata           PROGBITS         0000000001cd8524  01cd8524
       0000000000000110  0000000000000000   A       0     0     4
  [34] .rodata.cold      PROGBITS         0000000001cd8634  01cd8634
       0000000000001034  0000000000000000   A       0     0     4
  [35] .eh_frame_hdr     PROGBITS         0000000001cd9668  01cd9668
       000000000008f464  0000000000000000   A       0     0     1
  [36] .comment          PROGBITS         0000000000000000  01d68acc
       0000000000000058  0000000000000001  MS       0     0     1
  [37] .symtab           SYMTAB           0000000000000000  01d68b28
       0000000000240fa8  0000000000000018          38   46439     8
  [38] .strtab           STRTAB           0000000000000000  01fa9ad0
       00000000003bd7d3  0000000000000000           0     0     1
  [39] .shstrtab         STRTAB           0000000000000000  023672a3
       00000000000001e8  0000000000000000           0     0     1
  [40] .note.bolt_info   NOTE             0000000000000000  0236748b
       0000000000000154  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  R (retain), D (mbind), l (large), p (processor specific)

llvmbot commented 1 year ago

@llvm/issue-subscribers-bolt

yota9 commented 1 year ago

Hello! Could you please provide pre-built binaries and profile data to reproduce the problem? Thank you!