Open vient opened 2 months ago
@MaskRay, you have recent commits in evaluateAsRelocatable. Do you have any idea what change in LLVM 19 could cause such a regression?
The top functions are in the machine code (MC) layer, which became a lot slower in LLVM 19; in LLVM 18 there are no MC functions near the top.
I don't know why perf does not show inlined functions; here are the hottest instructions of the first three functions:
llvm::ELFObjectWriter::isSymbolRefDifferenceFullyResolvedImpl(llvm::MCAssembler const&, llvm::MCSymbol const&, llvm::MCFragment const&, bool, bool) const at llvm/lib/MC/ELFObjectWriter.cpp:1447:29
(inlined by) llvm::MCObjectWriter::isSymbolRefDifferenceFullyResolved(llvm::MCAssembler const&, llvm::MCSymbolRefExpr const*, llvm::MCSymbolRefExpr const*, bool) const at llvm/lib/MC/MCObjectWriter.cpp:45:10
(inlined by) AttemptToFoldSymbolOffsetDifference(llvm::MCAssembler const*, llvm::DenseMap<llvm::MCSection const*, unsigned long, llvm::DenseMapInfo<llvm::MCSection const*, void>, llvm::detail::DenseMapPair<llvm::MCSection const*, unsigned long>> const*, bool, llvm::MCSymbolRefExpr const*&, llvm::MCSymbolRefExpr const*&, long&) at llvm/lib/MC/MCExpr.cpp:601:25
(inlined by) evaluateSymbolicAdd(llvm::MCAssembler const*, llvm::DenseMap<llvm::MCSection const*, unsigned long, llvm::DenseMapInfo<llvm::MCSection const*, void>, llvm::detail::DenseMapPair<llvm::MCSection const*, unsigned long>> const*, bool, llvm::MCValue const&, llvm::MCValue const&, llvm::MCValue&) at llvm/lib/MC/MCExpr.cpp:768:5
(inlined by) llvm::MCExpr::evaluateAsRelocatableImpl(llvm::MCValue&, llvm::MCAssembler const*, llvm::MCFixup const*, llvm::DenseMap<llvm::MCSection const*, unsigned long, llvm::DenseMapInfo<llvm::MCSection const*, void>, llvm::detail::DenseMapPair<llvm::MCSection const*, unsigned long>> const*, bool) const at llvm/lib/MC/MCExpr.cpp:950:16
llvm::MCExpr::evaluateAsRelocatableImpl(llvm::MCValue&, llvm::MCAssembler const*, llvm::MCFixup const*, llvm::DenseMap<llvm::MCSection const*, unsigned long, llvm::DenseMapInfo<llvm::MCSection const*, void>, llvm::detail::DenseMapPair<llvm::MCSection const*, unsigned long>> const*, bool) const at llvm/lib/MC/MCExpr.cpp:819:3
evaluateSymbolicAdd(llvm::MCAssembler const*, llvm::DenseMap<llvm::MCSection const*, unsigned long, llvm::DenseMapInfo<llvm::MCSection const*, void>, llvm::detail::DenseMapPair<llvm::MCSection const*, unsigned long>> const*, bool, llvm::MCValue const&, llvm::MCValue const&, llvm::MCValue&) at llvm/lib/MC/MCExpr.cpp:755:7
(inlined by) llvm::MCExpr::evaluateAsRelocatableImpl(llvm::MCValue&, llvm::MCAssembler const*, llvm::MCFixup const*, llvm::DenseMap<llvm::MCSection const*, unsigned long, llvm::DenseMapInfo<llvm::MCSection const*, void>, llvm::detail::DenseMapPair<llvm::MCSection const*, unsigned long>> const*, bool) const at llvm/lib/MC/MCExpr.cpp:950:16
llvm::MCAssembler::relaxFragment(llvm::MCFragment&) at llvm/lib/MC/MCAssembler.cpp:1285:3
(inlined by) llvm::MCAssembler::layoutOnce() at llvm/lib/MC/MCAssembler.cpp:1315:11
(inlined by) llvm::MCAssembler::layout() at llvm/lib/MC/MCAssembler.cpp:941:10
llvm::MCAssembler::relaxBoundaryAlign(llvm::MCBoundaryAlignFragment&) at llvm/lib/MC/MCAssembler.cpp:1189:8
(inlined by) llvm::MCAssembler::relaxFragment(llvm::MCFragment&) at llvm/lib/MC/MCAssembler.cpp:1299:12
(inlined by) llvm::MCAssembler::layoutOnce() at llvm/lib/MC/MCAssembler.cpp:1315:11
(inlined by) llvm::MCAssembler::layout() at llvm/lib/MC/MCAssembler.cpp:941:10
llvm::SmallVectorBase<unsigned long>::size() const at llvm/include/llvm/ADT/SmallVector.h:92:32
(inlined by) llvm::MCAssembler::computeFragmentSize(llvm::MCFragment const&) const at llvm/lib/MC/MCAssembler.cpp:0:0
(inlined by) llvm::MCAssembler::relaxBoundaryAlign(llvm::MCBoundaryAlignFragment&) at llvm/lib/MC/MCAssembler.cpp:1195:20
(inlined by) llvm::MCAssembler::relaxFragment(llvm::MCFragment&) at llvm/lib/MC/MCAssembler.cpp:1299:12
(inlined by) llvm::MCAssembler::layoutOnce() at llvm/lib/MC/MCAssembler.cpp:1315:11
(inlined by) llvm::MCAssembler::layout() at llvm/lib/MC/MCAssembler.cpp:941:10
llvm::MCAssembler::computeFragmentSize(llvm::MCFragment const&) const at llvm/lib/MC/MCAssembler.cpp:251:3
(inlined by) llvm::MCAssembler::ensureValid(llvm::MCSection&) const at llvm/lib/MC/MCAssembler.cpp:447:15
llvm::MCAssembler::isBundlingEnabled() const at llvm/include/llvm/MC/MCAssembler.h:208:59
(inlined by) llvm::MCAssembler::ensureValid(llvm::MCSection&) const at llvm/lib/MC/MCAssembler.cpp:443:9
llvm::MCBoundaryAlignFragment::getSize() const at llvm/include/llvm/MC/MCFragment.h:580:37
(inlined by) llvm::MCAssembler::computeFragmentSize(llvm::MCFragment const&) const at llvm/lib/MC/MCAssembler.cpp:281:45
(inlined by) llvm::MCAssembler::ensureValid(llvm::MCSection&) const at llvm/lib/MC/MCAssembler.cpp:447:15
I don't know how I missed this post: https://maskray.me/blog/2024-06-30-integrated-assembler-improvements-in-llvm-19 @aengelke, do you know whether this slowdown is expected? From the post I understood that the code paths mentioned above were supposed to become faster in LLVM 19.
Which architecture? Is this NaCl? (NaCl regressions might be caused by #94950, where I removed MCCompactEncodedInstFragment.) If it's not NaCl, this looks like a regression. MaskRay has been working on layout.
x86_64, not NaCl. I think I'm onto something: the difference went away when I removed these options:
-Wall
-Wextra
-Werror
-pedantic
-Wold-style-cast
-fvisibility=hidden
-fvisibility-inlines-hidden
-Wconversion
-Wsign-conversion
-Wunreachable-code
-Wno-missing-braces
-Wframe-larger-than=2500000
-ffile-prefix-map=/home/rlozko/git/twix=.
-fveclib=libmvec
-fdiagnostics-absolute-paths
-Wno-error=deprecated-declarations
-mbranches-within-32B-boundaries
-Wno-gnu-zero-variadic-macro-arguments
-Wno-enum-constexpr-conversion
-Wno-deprecated-declarations
-fcolor-diagnostics
I'll post later which options exactly affect this; the process is slow, as each run takes 20-40 minutes :)
Got it: the slowdown goes away when -mbranches-within-32B-boundaries is removed. In my case, removing it makes linking more than 2x faster. I can't find any recent commits related to this flag, but it sounds directly related to code layout.
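Since each run takes 20-40 minutes, bisecting the option list is much cheaper than removing options one at a time. A minimal C++ sketch, assuming exactly one option is responsible and that options do not interact; `findCulpritFlag` and the `isSlow` callback (standing in for "rebuild with only these options and time it") are hypothetical names, not part of any tool.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: bisect a list of compiler flags to find the one that
// causes a slowdown. `isSlow` stands in for "rebuild with only these flags
// and time it". Assumes exactly one flag is responsible and that flags do
// not interact with each other.
std::string findCulpritFlag(
    const std::vector<std::string>& flags,
    const std::function<bool(const std::vector<std::string>&)>& isSlow,
    int& runs) {
  assert(!flags.empty());
  std::size_t lo = 0, hi = flags.size();  // invariant: culprit is in [lo, hi)
  runs = 0;
  while (hi - lo > 1) {
    std::size_t mid = lo + (hi - lo) / 2;
    // Build with only the lower half of the remaining candidates.
    std::vector<std::string> lowerHalf(flags.begin() + lo, flags.begin() + mid);
    ++runs;  // each probe is one (slow) build
    if (isSlow(lowerHalf))
      hi = mid;  // culprit is in the lower half
    else
      lo = mid;  // otherwise it must be in the upper half
  }
  return flags[lo];
}
```

With the 21 options listed above, this needs at most 5 builds instead of up to 21. If several options interact, a delta-debugging approach would be needed instead.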
Thanks for investigating! This makes some sense: with this option, every instruction gets its own separate fragment so that relaxations can be applied later. That code path isn't optimized, as the option is rarely used. I'm not sure what's causing the regression compared to LLVM 18, though.
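To make the per-instruction fragments more concrete: the padding emitted before a branch depends on the branch's current offset, which depends on every fragment laid out before it. Below is a simplified C++ model of the JCC-erratum rule this option targets (a branch must neither cross nor end on a 32-byte boundary). `needsPadding` and `paddingFor` are hypothetical names, and the sketch ignores details such as which branch kinds are aligned, so it is not LLVM's actual relaxBoundaryAlign logic.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of the JCC-erratum rule (not LLVM's code): a branch whose
// bytes occupy [offset, offset + size) must neither cross a 32-byte boundary
// nor end exactly on one.
bool needsPadding(std::uint64_t offset, std::uint64_t size,
                  std::uint64_t boundary = 32) {
  assert((boundary & (boundary - 1)) == 0 && "boundary must be a power of two");
  assert(size > 0 && size < boundary);
  bool crosses = offset / boundary != (offset + size - 1) / boundary;
  bool endsOnBoundary = (offset + size) % boundary == 0;
  return crosses || endsOnBoundary;
}

// Bytes of padding (e.g. NOPs) to emit before the branch so that it starts
// at the next boundary; 0 when the branch is already safe where it is.
std::uint64_t paddingFor(std::uint64_t offset, std::uint64_t size,
                         std::uint64_t boundary = 32) {
  if (!needsPadding(offset, size, boundary)) return 0;
  return boundary - offset % boundary;
}
```

For example, a 2-byte branch at offset 30 would end exactly at 32, so it gets 2 bytes of padding, while the same branch at offset 29 needs none. Because the answer changes whenever anything earlier in the section grows or shrinks, each such branch needs a fragment whose size is recomputed during layout.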
Don't know how I missed this post maskray.me/blog/2024-06-30-integrated-assembler-improvements-in-llvm-19
The way we relax MCFragments might be related. It's possible that uncommon configurations like -mbranches-within-32B-boundaries have regressed while the normal code paths got faster. Complex expression evaluation, primarily used by the Linux kernel, constrains the relaxation schemes we can apply (#100283). I believe it's challenging to make every use case fast. The current approach, which optimizes the normal code path and penalizes the uncommon -mbranches-within-32B-boundaries, is likely the better trade-off.
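Relaxation of this kind can be pictured as a fixed-point loop: sizes depend on offsets and offsets depend on sizes, so layout repeats until nothing changes, and each pass visits every fragment. With one fragment per instruction, the work per pass grows with the instruction count. This is a hypothetical C++ toy model, not LLVM's algorithm: `Frag`, `layout`, and `padBefore` are illustrative names, jumps use the x86 `jmp rel8`/`jmp rel32` sizes (2 and 5 bytes), and boundary padding is folded into the jump fragment instead of a separate fragment. The trace above shows a similar layout() / layoutOnce() / relaxFragment() nesting.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model of iterative relaxation; not LLVM's implementation.
struct Frag {
  std::uint64_t size;    // current encoded size in bytes (jumps: 2 or 5)
  int target = -1;       // for a jump: index of the target fragment, else -1
  bool aligned = false;  // pad before this fragment per the 32-byte rule
};

struct LayoutStats {
  int passes = 0;
  std::uint64_t fragmentVisits = 0;
};

// Padding that keeps [off, off + size) from crossing or ending on a 32-byte
// boundary (size must be between 1 and 31).
static std::uint64_t padBefore(std::uint64_t off, std::uint64_t size) {
  const std::uint64_t b = 32;
  bool crosses = off / b != (off + size - 1) / b;
  bool endsOnBoundary = (off + size) % b == 0;
  return (crosses || endsOnBoundary) ? b - off % b : 0;
}

LayoutStats layout(std::vector<Frag>& frags) {
  LayoutStats stats;
  std::vector<std::uint64_t> start(frags.size());
  bool changed = true;
  while (changed) {
    changed = false;
    ++stats.passes;
    // Step 1: assign offsets using the current sizes; padding depends on
    // the offset reached so far, so it is recomputed on every pass.
    std::uint64_t off = 0;
    for (std::size_t i = 0; i < frags.size(); ++i) {
      ++stats.fragmentVisits;
      if (frags[i].aligned) off += padBefore(off, frags[i].size);
      start[i] = off;
      off += frags[i].size;
    }
    // Step 2: grow short jumps whose displacement no longer fits in rel8.
    for (std::size_t i = 0; i < frags.size(); ++i) {
      Frag& f = frags[i];
      if (f.target < 0 || f.size == 5) continue;
      std::int64_t disp =
          static_cast<std::int64_t>(start[static_cast<std::size_t>(f.target)]) -
          static_cast<std::int64_t>(start[i] + f.size);
      if (disp < -128 || disp > 127) {
        f.size = 5;  // relax jmp rel8 (2 bytes) to jmp rel32 (5 bytes)
        changed = true;
      }
    }
  }
  return stats;
}
```

Each outer pass visits every fragment once in step 1, so the total work is roughly passes × fragments, which the `fragmentVisits` counter makes visible. Jumps only ever grow here, which guarantees the loop terminates.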
We use this option because some of our hosts are Skylake-based, and some workloads are affected by the JCC erratum (I don't know why the other workloads are not). As a workaround, I've put -mbranches-within-32B-boundaries under if(ARCH MATCHES "^(skylake|cascadelake)"). Strangely, it turned out that the same workloads that benefit from this option on Skylake (~5% improvement) are negatively affected by it on other platforms (~2% slowdown).
Overall, I can't say this issue affects us seriously. If I understand correctly that you consider this a WONTFIX, the issue can be closed.
I'm building the same code with Clang 18 and Clang 19 and noticed that some targets' build times are disproportionately affected by switching to the new compiler: in general Clang 19 is 5-10% slower, but the LTO build of one particular target slowed down 2.5x.
I tried --time-trace but don't know what to make of it, other than that OptModule got some long tails in Clang 19. The first worker under the main thread builds the same module in both images, so the two can be compared directly: OptModule time increased from 1m20s to 5m24s, about 4x.
perf trace and manual breaking in gdb show that a lot of time is spent around … and also around llvm::MCExpr::evaluateAsRelocatableImpl. My current build is stripped, though; I'll come back with trace results from a build with debug symbols later.