llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[BOLT] About optionally emitting the .bolt.org.text segment #85796

Open YuanSha0 opened 8 months ago

YuanSha0 commented 8 months ago

I have completed some work on code size optimization, but I noticed that BOLT emits a section named .bolt.org.text, which accounts for nearly half of the code size in the executable. This cancels out my optimizations: the optimized executable is even larger than the unoptimized binary. I found no straightforward way to remove .bolt.org.text using BOLT flags. Which optimizations depend on it? Can I sacrifice some (or all) performance optimizations to remove this section? Would that make actual code size reduction possible with BOLT? Thank you.

YuanSha0 commented 8 months ago

Could someone explain how .bolt.org.text is generated? Perhaps I could develop a code-size optimization option for BOLT (similar to clang -Os) to support current and future code-size optimizations for everyone.

yota9 commented 8 months ago

Hi. .bolt.org.text is the original text section, which is not touched by BOLT and currently cannot be stripped. There is an old patch on Phabricator that eliminates it, but for now it is considered too complicated and not stable enough to be in BOLT's trunk.

YuanSha0 commented 8 months ago

@yota9
Thank you for the information, but I still have some questions. I noticed that there is a newly generated .text section in the binary. Does this mean BOLT retains both the pre-optimized and post-optimized text sections simultaneously? If so, why can't the original text section be discarded? Even if BOLT is run without any optimization, will both .bolt.org.text and .text coexist? Does this mean BOLT currently cannot reduce code size at all, since it always produces a longer text section than the original one?

yota9 commented 8 months ago

@killerloura Yes, usually both texts are preserved. As for size optimisation there are 2 possibilities:

  1. -use-old-text option, which tries to fit the new text into the old section. In my experience it won't work most of the time, however, and you need to play around with extra options to remove the code alignment.
  2. -lite mode, which uses the profile to emit only the optimized functions into the new text section; non-optimized functions stay in the old text section.
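A back-of-the-envelope model of the two options, assuming (as described above) that the old text is always kept in full and ignoring alignment padding. This is a simplification, not BOLT's actual size accounting, and the function names and sizes are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class FunctionInfo:
    name: str
    size: int          # bytes of code for this function
    has_profile: bool  # True if the profile contains samples for it


def emitted_text_size(functions, lite):
    """Rough model of total text bytes after BOLT rewrites a binary.

    Assumptions (simplifications, not BOLT's real accounting):
    - the original .text is kept in full as .bolt.org.text;
    - every rewritten function lands in the new .text at its old size;
    - alignment padding and functions BOLT cannot process are ignored.
    """
    old_text = sum(f.size for f in functions)
    if lite:
        # Lite mode: only functions with profile data are rewritten.
        new_text = sum(f.size for f in functions if f.has_profile)
    else:
        # Non-lite mode: all functions are rewritten into the new .text.
        new_text = old_text
    return old_text + new_text


funcs = [
    FunctionInfo("hot_loop", 4096, True),
    FunctionInfo("init", 16384, False),
    FunctionInfo("error_path", 8192, False),
]
print(emitted_text_size(funcs, lite=False))  # 57344
print(emitted_text_size(funcs, lite=True))   # 32768
```

Under this model, lite mode limits the growth to the size of the profiled code, but the total never drops below the original text size as long as the old section is kept.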

Removing the old text is "extra work" for BOLT that is not implemented. It is also considered "less secure", as we might unintentionally miss some pointers/jump tables/etc. and jump to an old text address at runtime (although I've seen such behaviour maybe only once and fixed the bug, but still).

YuanSha0 commented 8 months ago

@yota9 Thank you again for your response. I intend to conduct further research on this issue.

sdt16 commented 7 months ago

@yota9

> currently could not be stripped.

Hi, do you mean this section is unsafe to strip, even manually? Could the issue with stripping it be related to #56738? Ack on the concern it's less safe, but what if we accept that risk?

I'm running into this issue: it causes the BOLTed binary to not fit in the fixed memory space allocated for the executable, and unfortunately lite mode isn't implemented for AArch64, so I'm a bit stuck.

yota9 commented 7 months ago

> Hi, do you mean this section is unsafe to strip, even manually? Could the issue with stripping it be related to https://github.com/llvm/llvm-project/issues/56738? Ack on the concern it's less safe, but what if we accept that risk?

The problem with stripping is the new location of the PHDR, AFAIR. There is a workaround option, -use-gnu-stack, but I've never tried to combine it with stripping the main text; you may try it.

As for lite and AArch64: I'm not a user of lite mode, but I'm surprised to hear that. What problems do you encounter with it?

As for an "official" method, I believe there is currently no way to drop the old text section other than the methods I've already mentioned.

YuanSha0 commented 7 months ago

@yota9 Hi, when I enable the -use-old-text option, BOLT aligns the code to a 2MB boundary. Although my optimizations significantly reduce code size, the process generates many new functions, so the computed size of the new text always ends up larger than the old size. Would reducing the alignment (even down to 4 bytes) also lead to instability issues?

yota9 commented 7 months ago

@YuanSha0 Hi! I would not expect instability; everything should work just fine. By default BOLT uses large alignment for huge pages support, but I think we should disable this by default for -use-old-text.
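To make the alignment arithmetic concrete, here is a simplified model of why a 2MB alignment can stop new code from fitting into the old .text. It assumes the new code is placed at the first aligned address inside the old section and must end before the old section does; BOLT's real placement logic may differ, and all addresses and sizes are hypothetical.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def fits_in_old_text(old_start, old_size, new_size, alignment):
    """Simplified model: placed at the first aligned address inside the old
    .text, does the new code still end before the old .text ends?"""
    new_start = align_up(old_start, alignment)
    return new_start + new_size <= old_start + old_size


MiB = 1 << 20
old_start, old_size = 0x401000, 3 * MiB  # hypothetical original .text
new_size = int(2.5 * MiB)                # hypothetical optimized code size

print(fits_in_old_text(old_start, old_size, new_size, 2 * MiB))  # False
print(fits_in_old_text(old_start, old_size, new_size, 64))       # True
```

Here the 2MB boundary skips almost 2MB at the start of the old section, so code that is smaller than the original still does not fit; with a small alignment the same code fits.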

sdt16 commented 7 months ago

@yota9 lite mode with aarch64:

llvm-bolt input.elf -o output.elf -data=perf.fdata -data2=perf2.fdata -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions=1 -split-all-cold -split-eh -dyno-stats --use-old-text --lite=1 --no-huge-pages
BOLT-INFO: Target architecture: aarch64
BOLT-INFO: BOLT version: a9356a515b5a1a3637eaf5820fc0d2c0dad21a64
BOLT-INFO: first alloc address is 0x4000000
BOLT-INFO: creating new program header table at address 0x4830000, offset 0x830000
BOLT-WARNING: debug info will be stripped from the binary. Use -update-debug-sections to keep it.
BOLT-INFO: enabling relocation mode
BOLT-INFO: static input executable detected
BOLT-INFO: disabling -align-macro-fusion on non-x86 platform
BOLT-WARNING: cannot combine -lite with -use-old-text. Disabling -use-old-text.
BOLT-INFO: enabling lite mode
BOLT-INFO: pre-processing profile using branch profile reader
not implemented
UNREACHABLE executed at /home/bolt/llvm-project/bolt/include/bolt/Core/MCPlusBuilder.h:1609!
 #0 0x0000559fb8740c1f llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/local/bin/llvm-bolt+0xd66c1f)
 #1 0x0000559fb873e634 SignalHandler(int) Signals.cpp:0:0
 #2 0x00007fc5ec58c3c0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x153c0)
 #3 0x00007fc5ec07918b raise (/lib/x86_64-linux-gnu/libc.so.6+0x4618b)
 #4 0x00007fc5ec058859 abort (/lib/x86_64-linux-gnu/libc.so.6+0x25859)
 #5 0x0000559fb86ccfae (/usr/local/bin/llvm-bolt+0xcf2fae)
 #6 0x0000559fb97c22b2 (/usr/local/bin/llvm-bolt+0x1de82b2)
 #7 0x0000559fb9831e03 llvm::bolt::BinaryFunction::scanExternalRefs() (/usr/local/bin/llvm-bolt+0x1e57e03)
 #8 0x0000559fb87aef07 llvm::bolt::RewriteInstance::disassembleFunctions() (/usr/local/bin/llvm-bolt+0xdd4f07)
 #9 0x0000559fb87f67f1 llvm::bolt::RewriteInstance::run() (/usr/local/bin/llvm-bolt+0xe1c7f1)
#10 0x0000559fb7d075ce main (/usr/local/bin/llvm-bolt+0x32d5ce)
#11 0x00007fc5ec05a0b3 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b3)
#12 0x0000559fb7d8a14e _start (/usr/local/bin/llvm-bolt+0x3b014e)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.

I can submit a separate issue if you think it'd be valuable.

yota9 commented 7 months ago

@sdt16 What version of BOLT are you using? Could you please tell me what function is located at /home/bolt/llvm-project/bolt/include/bolt/Core/MCPlusBuilder.h:1609? Also, JFYI, using both --use-old-text and --lite=1 doesn't make sense; it results in using lite=1 only.

YuanSha0 commented 7 months ago

@yota9 I have found a stable method to eliminate the .bolt.org.text section, but I noticed that code size still regresses. The reason is that BOLT retains two .eh_frame sections simultaneously, and I haven't found a pass or option to eliminate the extra one. Why are both kept, and is it possible to eliminate one of them? Thank you!

yota9 commented 7 months ago

@YuanSha0 I suspect the reason is the same as with .bolt.org.text: BOLT just doesn't support removing old sections currently, so you would need to do the same thing you already did for .bolt.org.text. I might be wrong, though; if so, the Meta folks are welcome to answer this question. To be honest, I'm not deeply into DWARF topics :)

YuanSha0 commented 7 months ago

@yota9 I processed the old .eh_frame section the same way the -use-old-text option handles the old .text section. Luckily, the program appears to be functioning properly. Thank you!

YuanSha0 commented 7 months ago

@yota9 I have completed a substantial portion of my code-size optimization work, but I have noticed that BOLT seems to offer particularly limited support for jump tables. In the analyzeMemoryAt function, I found that BOLT currently supports neither the analysis of jump tables within the .text section nor jump tables on the AArch64 architecture. Is this a limitation of BOLT's current functionality, or does such support exist outside the main branch?

yota9 commented 7 months ago

@YuanSha0 AFAICT, AArch64 jump table support is currently very limited. I tried to study this question a few years ago and, AFAIR, the main problem is that we can't tell the jump table size, since there are no markers of it in the binary.

YuanSha0 commented 7 months ago

@yota9 Even if a comprehensive analysis of the jump table is not possible, can we obtain partial information about it? Specifically, could we identify which functions contain jump tables and label all instructions belonging to each jump table? Disabling the handling of jump-table instructions during optimization could be a viable temporary workaround.

yota9 commented 7 months ago

@YuanSha0 Theoretically yes, although I'm not sure how well it is implemented now. Look at BinaryFunction (BF) methods such as hasJumpTables, and iterate over the instructions with getJumpTable, since they're marked with annotations. Hope it helps; I haven't looked at this topic for a long time :)
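A minimal sketch of the workaround discussed above: find functions that reference jump tables, list the annotated instructions, and exclude those functions from a size pass. It is written in Python with hypothetical stand-in classes, not BOLT's C++ API; in BOLT the equivalent queries would go through the BinaryFunction methods named above (hasJumpTables, getJumpTable), whose exact signatures should be checked in the BOLT sources. The instructions are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instr:
    text: str
    # Hypothetical annotation: address of the jump table this instruction uses.
    jump_table: Optional[int] = None


@dataclass
class Function:
    name: str
    instrs: List[Instr] = field(default_factory=list)

    def has_jump_tables(self):
        # Stand-in for BinaryFunction::hasJumpTables().
        return any(i.jump_table is not None for i in self.instrs)


def split_for_size_opt(functions):
    """Separate functions a size pass may rewrite from those it should skip.

    Conservative policy: any function with a jump table is left untouched,
    since its jump-table bounds may not be recoverable on AArch64.
    """
    safe, skipped = [], []
    for f in functions:
        (skipped if f.has_jump_tables() else safe).append(f.name)
    return safe, skipped


def jump_table_instrs(function):
    """List the instructions annotated as using a jump table."""
    return [i.text for i in function.instrs if i.jump_table is not None]


dispatch = Function("dispatch", [Instr("add x1, x0, x1"), Instr("br x1", jump_table=0x5000)])
leaf = Function("leaf", [Instr("add x0, x0, #1"), Instr("ret")])
print(split_for_size_opt([dispatch, leaf]))  # (['leaf'], ['dispatch'])
print(jump_table_instrs(dispatch))           # ['br x1']
```

Skipping whole functions is the most conservative choice; a finer-grained pass could keep the function but leave only the annotated instructions and their target blocks unchanged.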

YuanSha0 commented 7 months ago

@yota9 Thank you again!

YuanSha0 commented 3 months ago

@yota9 Hello, I have a question. I’ve noticed that Bolt now supports Linux kernel optimization. After optimizing the Linux kernel, will it still retain the original text segment? Thank you!

yota9 commented 3 months ago

Hello @YuanSha0. Honestly, I don't know; you'll have to ask @maksfb.

YuanSha0 commented 3 months ago

@yota9 Thank you.

YuanSha0 commented 3 months ago

Hello, I have a question. I’ve noticed that Bolt now supports Linux kernel optimization. After optimizing the Linux kernel, will it still retain the original text segment? Thank you! @maksfb

maksfb commented 3 months ago

> Hello, I have a question. I’ve noticed that Bolt now supports Linux kernel optimization. After optimizing the Linux kernel, will it still retain the original text segment? Thank you! @maksfb

Yes, as of right now, the original .text section and the containing segment will be used for optimization.

Emegua commented 2 months ago

@YuanSha0 are you working on a publicly available branch? Would love to collaborate on code-size optimization.

YuanSha0 commented 2 months ago

> @YuanSha0 are you working on a publicly available branch? Would love to collaborate on code-size optimization.

Hi @Emegua,

Apologies for the delayed response. I've been caught up with some other work. Currently, the project I'm working on is in a private phase and not publicly available, so I'm unable to collaborate on it at this time. However, I appreciate your interest and will keep you in mind for potential future collaborations once the project is at a stage where it can be shared.

Thanks for understanding!