perf2bolt aborts on AArch64 when run on a shared library.
Question
Can BOLT be used to optimize shared libraries? If so, could you investigate this problem? Also, please let me know how to avoid it.
Analysis
The third assert below fails because I->first and Offset do not match.
void addCFIInstruction(uint64_t Offset, MCCFIInstruction &&Inst) {
  assert(!Instructions.empty());

  // Fix CFI instructions skipping NOPs. We need to fix this because changing
  // CFI state after a NOP, besides being wrong and inaccurate, makes it
  // harder for us to recover this information, since we can create empty BBs
  // with NOPs and then reorder it away.
  // We fix this by moving the CFI instruction just before any NOPs.
  auto I = Instructions.lower_bound(Offset);
  if (Offset == getSize()) {
    assert(I == Instructions.end() && "unexpected iterator value");
    // Sometimes compiler issues restore_state after all instructions
    // in the function (even after nop).
    --I;
    Offset = I->first;
  }

  assert(I->first == Offset && "CFI pointing to unknown instruction");
  if (I == Instructions.begin()) {
    CIEFrameInstructions.emplace_back(std::forward<MCCFIInstruction>(Inst));
    return;
  }
The values of I->first and Offset at the time of the abort are as follows.
Process 589 resuming
perf2bolt: /home/users/ea01/ea0218/llvm-project-git/16.0.4/llvm-project/bolt/include/bolt/Core/BinaryFunction.h:1662: void llvm::bolt::BinaryFunction::addCFIInstruction(uint64_t, llvm::MCCFIInstruction&&): Assertion `I->first == Offset && "CFI pointing to unknown instruction"' failed.
Process 589 stopped
* thread #1, name = 'perf2bolt', stop reason = hit program assert
frame #4: 0x0000000003e5de98 perf2bolt`llvm::bolt::BinaryFunction::addCFIInstruction(this=0x0000000013177a68, Offset=616, Inst=0x0000ffffffffc520) at BinaryFunction.h:1662:5
1659 --I;
1660 Offset = I->first;
1661 }
-> 1662 assert(I->first == Offset && "CFI pointing to unknown instruction");
1663 if (I == Instructions.begin()) {
1664 CIEFrameInstructions.emplace_back(std::forward<MCCFIInstruction>(Inst));
1665 return;
(lldb) p I->first
(const unsigned int) $14 = 154
(lldb) p Offset
(uint64_t) $15 = 616
(lldb) p I
(std::_Rb_tree_iterator<std::pair<const unsigned int, llvm::MCInst> >) $16 = {
first = 154
second = {
Opcode = 954709168
Flags = 0
Loc = (Ptr = "")
Operands = {
llvm::SmallVectorImpl<llvm::MCOperand> = {
llvm::SmallVectorTemplateBase<llvm::MCOperand> = {
llvm::SmallVectorTemplateCommon<llvm::MCOperand> = {
llvm::SmallVectorBase<unsigned int> = (BeginX = 0x0000000000000000, Size = 0, Capacity = 0)
}
}
}
llvm::SmallVectorStorage<llvm::MCOperand, 10> = (InlineElts = "\0\0\0\0\0\0\0\0\xe8}\U00000017\U00000013\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\U00000018~\U00000017\U00000013\0\0\0\0\0\0\0\0\0\0\0\0(~\U00000017\U00000013\0\0\0\0\0\0\0\0\0\0\0\08~\U00000017\U00000013\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xc0\xb3\xe78\0\0\0\0в\xe78\0\0\0\0\xe0\xb9\xe78\0\0\0\0\t\0\0\0\0\0\0\0\xf0\xab\xe78")
}
}
}
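A note on reading the values above. std::map::lower_bound returns the first element whose key is not less than the argument, so a valid element returned for Offset=616 cannot have first=154. One possible reading (an assumption, not confirmed here) is that I is Instructions.end(), meaning offset 616 lies past the last disassembled instruction without being equal to getSize(); the printed first=154 and Opcode=954709168 would then just be whatever sits in the container's sentinel node, since dereferencing end() is undefined behavior. The following is a minimal standalone sketch of the same lookup that separates these cases. InstrMap, CFILookup, and classifyCFIOffset are hypothetical names for illustration and are not part of BOLT.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the offset -> instruction map in BinaryFunction.
using InstrMap = std::map<uint32_t, std::string>;

enum class CFILookup {
  ExactMatch,        // an instruction starts exactly at Offset
  EndOfFunction,     // Offset == function size (the special case in the quoted code)
  MidInstruction,    // Offset falls between two known instruction starts
  PastLastInstruction // Offset is beyond the last instruction but != function size
};

// Classifies a CFI offset the way the quoted lookup would see it,
// without ever dereferencing end().
CFILookup classifyCFIOffset(const InstrMap &Instructions, uint64_t Offset,
                            uint64_t FunctionSize) {
  assert(!Instructions.empty());
  if (Offset == FunctionSize)
    return CFILookup::EndOfFunction;

  // First element whose key is >= Offset, or end() if there is none.
  auto I = Instructions.lower_bound(static_cast<uint32_t>(Offset));
  if (I == Instructions.end())
    return CFILookup::PastLastInstruction;
  if (I->first == Offset)
    return CFILookup::ExactMatch;
  return CFILookup::MidInstruction;
}
```

With a map holding instructions at offsets 0, 4, and 8 and a function size of 1000, an offset of 616 is classified as PastLastInstruction, which is the case where the quoted assert would read I->first through an end() iterator.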
Reproduction on x86
The abort did not occur on x86. I checked this with llvmorg-16.0.4.
Run perf2bolt on the shared library.
$ perf2bolt ~/OpenFOAM/OpenFOAM-v2212/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM.so -p perf.data -w profile.yaml -o perf.fdata -nl
BOLT-INFO: shared object or position-independent executable detected
PERF2BOLT: Starting data aggregation job for perf.data
PERF2BOLT: spawning perf job to read events without LBR
PERF2BOLT: spawning perf job to read mem events
PERF2BOLT: spawning perf job to read process events
PERF2BOLT: spawning perf job to read task events
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: ae42196bc493ffe877a7e3dff8be32035dea4d07
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x1200000, offset 0x1200000
BOLT-INFO: enabling relocation mode
BOLT-INFO: enabling strict relocation mode for aggregation purposes
BOLT-WARNING: split function detected on input : _ZN4FoamL5cwd_PEv.cold/1. The support is limited in relocation modeBOLT-INFO: pre-processing profile using perf data aggregator
BOLT-INFO: binary build-id is: 9e3e4b259c3bee8f25c4368ef82f10101c07a844
PERF2BOLT: spawning perf job to read buildid list
PERF2BOLT: matched build-id and file name
PERF2BOLT: waiting for perf mmap events collection to finish...
PERF2BOLT: parsing perf-script mmap events output
PERF2BOLT: waiting for perf task events collection to finish...
PERF2BOLT: parsing perf-script task events output
PERF2BOLT: input binary is associated with 1 PID(s)
PERF2BOLT: waiting for perf events collection to finish...
PERF2BOLT: parsing basic events (without LBR)...
BOLT-INFO: forcing -jump-tables=move as PIC jump table was detected in function _ZN4Foam5token5resetEv
PERF2BOLT: processing basic events (without LBR)...
PERF2BOLT: read 124913 samples
PERF2BOLT: out of range samples recorded in unknown regions: 47122 (37.7%)
PERF2BOLT: wrote 4308 objects and 0 memory objects to perf.fdata
Run llvm-bolt to generate an optimized shared library.
$ llvm-bolt ~/OpenFOAM/OpenFOAM-v2212/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM.so -o ~/OpenFOAM/OpenFOAM-v2212/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM_bolt.so -data=perf.fdata cycles:u
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: ae42196bc493ffe877a7e3dff8be32035dea4d07
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x1200000, offset 0x1200000
BOLT-INFO: enabling relocation mode
BOLT-INFO: enabling lite mode
BOLT-WARNING: split function detected on input : _ZN4FoamL5cwd_PEv.cold/1. The support is limited in relocation mode
BOLT-WARNING: disabling lite mode (-lite) when split functions are present
BOLT-INFO: pre-processing profile using branch profile reader
BOLT-INFO: forcing -jump-tables=move as PIC jump table was detected in function _ZN4Foam5token5resetEv
BOLT-INFO: operating with basic samples profiling data (no LBR).
BOLT-INFO: normalizing samples by instruction count.
BOLT-INFO: 893 out of 36683 functions in the binary (2.4%) have non-empty execution profile
BOLT-INFO: 5 functions with profile could not be optimized
BOLT-INFO: the input contains 557 (dynamic count : 67785) opportunities for macro-fusion optimization. Will fix instances on a hot path.
BOLT-INFO: 20047 instructions were shortened
BOLT-INFO: removed 1385 empty blocks
BOLT-INFO: merged 1 duplicate CFG edge
BOLT-INFO: UCE removed 0 blocks and 0 bytes of code.
BOLT-INFO: SCTC: patched 3 tail calls (3 forward) tail calls (0 backward) from a total of 3 while removing 0 double jumps and removing 3 basic blocks totalling 15 bytes of code. CTCs total execution count is 4750 and the number of times CTCs are taken is 0.
BOLT-INFO: patched build-id (flipped last bit)
Sections with ".bolt" in their names are created (sections 14, 16-19, and 40 below).
$ readelf -S ~/OpenFOAM/OpenFOAM-v2212/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM_bolt.so
There are 41 section headers, starting at offset 0x2367600:
Section header:
[No] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .note.gnu.pr[...] NOTE 00000000000002a8 000002a8
0000000000000020 0000000000000000 A 0 0 8
[ 2] .note.gnu.bu[...] NOTE 00000000000002c8 000002c8
0000000000000024 0000000000000000 A 0 0 4
[ 3] .gnu.hash GNU_HASH 00000000000002f0 000002f0
0000000000062550 0000000000000000 A 4 0 8
[ 4] .dynsym DYNSYM 0000000000062840 00062840
0000000000130e18 0000000000000018 A 5 1 8
[ 5] .dynstr STRTAB 0000000000193658 00193658
00000000003aed50 0000000000000000 A 0 0 1
[ 6] .gnu.version VERSYM 00000000005423a8 005423a8
0000000000019682 0000000000000002 A 4 0 2
[ 7] .gnu.version_r VERNEED 000000000055ba30 0055ba30
00000000000001f0 0000000000000000 A 5 4 8
[ 8] .rela.dyn RELA 000000000055bc20 0055bc20
000000000009ff90 0000000000000018 A 4 0 8
[ 9] .rela.plt RELA 00000000005fbbb0 005fbbb0
00000000000a07a0 0000000000000018 AI 4 25 8
[10] .init PROGBITS 000000000069d000 0069d000
000000000000001b 0000000000000000 AX 0 0 4
[11] .plt PROGBITS 000000000069d020 0069d020
000000000006afd0 0000000000000010 AX 0 0 16
[12] .plt.got PROGBITS 0000000000707ff0 00707ff0
00000000000024f0 0000000000000010 AX 0 0 16
[13] .plt.sec PROGBITS 000000000070a4e0 0070a4e0
000000000006afc0 0000000000000010 AX 0 0 16
[14] .bolt.org.text PROGBITS 00000000007754a0 007754a0
00000000005cdfe9 0000000000000000 AX 0 0 16
[15] .fini PROGBITS 0000000000d4348c 00d4348c
000000000000000d 0000000000000000 AX 0 0 4
[16] .bolt.org.rodata PROGBITS 0000000000d44000 00d44000
00000000001656e5 0000000000000000 A 0 0 32
[17] .bolt.org.eh[...] PROGBITS 0000000000ea96e8 00ea96e8
0000000000047a44 0000000000000000 A 0 0 4
[18] .bolt.org.eh[...] PROGBITS 0000000000ef1130 00ef1130
000000000013019c 0000000000000000 A 0 0 8
[19] .bolt.org.gc[...] PROGBITS 00000000010212cc 010212cc
0000000000034c99 0000000000000000 A 0 0 4
[20] .init_array INIT_ARRAY 0000000001057cf8 01056cf8
00000000000012a0 0000000000000008 WA 0 0 8
[21] .fini_array FINI_ARRAY 0000000001058f98 01057f98
0000000000000008 0000000000000008 WA 0 0 8
[22] .data.rel.ro PROGBITS 0000000001058fa0 01057fa0
000000000002b788 0000000000000000 WA 0 0 32
[23] .dynamic DYNAMIC 0000000001084728 01083728
0000000000000210 0000000000000010 WA 5 0 8
[24] .got PROGBITS 0000000001084938 01083938
000000000000c6c8 0000000000000008 WA 0 0 8
[25] .got.plt PROGBITS 0000000001091000 01090000
00000000000357f8 0000000000000008 WA 0 0 8
[26] .data PROGBITS 00000000010c6800 010c5800
00000000000005f8 0000000000000000 WA 0 0 32
[27] .tm_clone_table PROGBITS 00000000010c6df8 010c5df8
0000000000000000 0000000000000000 WA 0 0 8
[28] .bss NOBITS 00000000010c6e00 010c5df8
00000000000145c8 0000000000000000 WA 0 0 32
[29] .text PROGBITS 0000000001400000 01400000
000000000002f4a4 0000000000000000 AX 0 0 2097152
[30] .text.cold PROGBITS 000000000142f4c0 0142f4c0
00000000005e8ad7 0000000000000000 AX 0 0 64
[31] .eh_frame PROGBITS 0000000001a17f98 01a17f98
000000000025f35c 0000000000000000 A 0 0 8
[32] .gcc_except_table PROGBITS 0000000001c772f4 01c772f4
0000000000061230 0000000000000000 A 0 0 4
[33] .rodata PROGBITS 0000000001cd8524 01cd8524
0000000000000110 0000000000000000 A 0 0 4
[34] .rodata.cold PROGBITS 0000000001cd8634 01cd8634
0000000000001034 0000000000000000 A 0 0 4
[35] .eh_frame_hdr PROGBITS 0000000001cd9668 01cd9668
000000000008f464 0000000000000000 A 0 0 1
[36] .comment PROGBITS 0000000000000000 01d68acc
0000000000000058 0000000000000001 MS 0 0 1
[37] .symtab SYMTAB 0000000000000000 01d68b28
0000000000240fa8 0000000000000018 38 46439 8
[38] .strtab STRTAB 0000000000000000 01fa9ad0
00000000003bd7d3 0000000000000000 0 0 1
[39] .shstrtab STRTAB 0000000000000000 023672a3
00000000000001e8 0000000000000000 0 0 1
[40] .note.bolt_info NOTE 0000000000000000 0236748b
0000000000000154 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
R (retain), D (mbind), l (large), p (processor specific)
Problem
Abort message: Assertion `I->first == Offset && "CFI pointing to unknown instruction"' failed.
This abort occurs on AArch64 but not on x86. The full message is shown in the lldb output above.
Environment
LLVM: llvmorg-16.0.4
The options when making perf2bolt:
Steps to reproduce
The abort occurs when libOpenFOAM.so is built with relocation metadata and perf2bolt is then run on it.
# perf record -e cycles:u numactl -C12 platforms/linuxARM64ClangDPInt32Opt/bin/simpleFoam
# perf2bolt platforms/linuxARM64ClangDPInt32Opt/lib/libOpenFOAM.so -p perf.data -w profile.yaml -o perf.fdata -nl