llvm / llvm-project


PGO clang crashes with BOLT instrumentation #55004

Closed nathanchance closed 1 year ago

nathanchance commented 2 years ago

I am attempting to wire up BOLT support into our toolchain build script. However, when clang is compiled with profile-guided optimization, it crashes after it has been instrumented with BOLT. I noticed this when building scripts/dtc/srcpos.c in the Linux kernel, which I reduced below. I saw #53994, but the crash is different, so I figured I would report it and let someone else mark it as a duplicate if appropriate.

I was able to reproduce at bff8356b1969d2edd02e22c73d1c3d386f862937 with the following steps on two different x86_64 machines. If assertions are enabled (-DLLVM_ENABLE_ASSERTIONS=ON on all stages), there is no crash.

  1. Compile stage 1
$ cmake \
-B build/stage1 \
-G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=/usr/bin/clang \
-DCMAKE_CXX_COMPILER=/usr/bin/clang++ \
-DLLVM_ENABLE_PROJECTS="bolt;clang;compiler-rt;lld" \
-DLLVM_ENABLE_TERMINFO=OFF \
-DLLVM_TARGETS_TO_BUILD=host \
-DLLVM_USE_LINKER=/usr/bin/ld.lld \
-S llvm

$ ninja -C build/stage1
  2. Compile stage 2
$ cmake \
-B build/stage2 \
-G Ninja \
-DCLANG_TABLEGEN=$PWD/build/stage1/bin/clang-tblgen \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=$PWD/build/stage1/bin/clang \
-DCMAKE_CXX_COMPILER=$PWD/build/stage1/bin/clang++ \
-DLLVM_BUILD_INSTRUMENTED=IR \
-DLLVM_BUILD_RUNTIME=OFF \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_ENABLE_TERMINFO=OFF \
-DLLVM_LINK_LLVM_DYLIB=ON \
-DLLVM_TABLEGEN=$PWD/build/stage1/bin/llvm-tblgen \
-DLLVM_TARGETS_TO_BUILD=host \
-DLLVM_USE_LINKER=$PWD/build/stage1/bin/ld.lld \
-DLLVM_VP_COUNTERS_PER_SITE=6 \
-S llvm

$ ninja -C build/stage2
  3. Run stage 2 against LLVM to generate profiles
$ cmake \
-B build/pgo \
-G Ninja \
-DCLANG_TABLEGEN=$PWD/build/stage1/bin/clang-tblgen \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=$PWD/build/stage2/bin/clang \
-DCMAKE_CXX_COMPILER=$PWD/build/stage2/bin/clang++ \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_ENABLE_TERMINFO=OFF \
-DLLVM_TABLEGEN=$PWD/build/stage1/bin/llvm-tblgen \
-DLLVM_TARGETS_TO_BUILD=host \
-DLLVM_USE_LINKER=$PWD/build/stage2/bin/ld.lld \
-S llvm

$ ninja -C build/pgo check-{clang,lld,llvm,llvm-unit}
  4. Build .prof file from raw profiles
$ build/stage1/bin/llvm-profdata merge \
-output=$PWD/build/profdata.prof \
build/stage2/profiles/*.profraw
  5. Build final compiler (stage 3)
$ cmake \
-B build/stage3 \
-G Ninja \
-DCLANG_TABLEGEN=$PWD/build/stage1/bin/clang-tblgen \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=$PWD/build/stage1/bin/clang \
-DCMAKE_CXX_COMPILER=$PWD/build/stage1/bin/clang++ \
-DCMAKE_EXE_LINKER_FLAGS=-Wl,-q \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_ENABLE_TERMINFO=OFF \
-DLLVM_PROFDATA_FILE=$PWD/build/profdata.prof \
-DLLVM_TABLEGEN=$PWD/build/stage1/bin/llvm-tblgen \
-DLLVM_TARGETS_TO_BUILD=host \
-DLLVM_USE_LINKER=$PWD/build/stage1/bin/ld.lld \
-S llvm

$ ninja -C build/stage3
  6. Instrument clang binary with llvm-bolt
$ build/stage1/bin/llvm-bolt \
--instrument \
--instrumentation-file=$PWD/build/clang.fdata \
-o build/stage3/bin/clang.inst \
build/stage3/bin/clang-15
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: bff8356b1969d2edd02e22c73d1c3d386f862937
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x5400000, offset 0x5400000
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
BOLT-INFO: enabling -align-macro-fusion=all since no profile was specified
BOLT-INFO: enabling lite mode
BOLT-WARNING: Failed to analyze 1194 relocations
BOLT-WARNING: 2 collisions detected while hashing binary objects. Use -v=1 to see the list.
BOLT-INSTRUMENTER: Number of indirect call site descriptors: 39000
BOLT-INSTRUMENTER: Number of indirect call target descriptors: 135258
BOLT-INSTRUMENTER: Number of function descriptors: 135258
BOLT-INSTRUMENTER: Number of branch counters: 1270794
BOLT-INSTRUMENTER: Number of ST leaf node counters: 641245
BOLT-INSTRUMENTER: Number of direct call counters: 0
BOLT-INSTRUMENTER: Total number of counters: 1912039
BOLT-INSTRUMENTER: Total size of counters: 15296312 bytes (static alloc memory)
BOLT-INSTRUMENTER: Total size of string table emitted: 17096121 bytes in file
BOLT-INSTRUMENTER: Total size of descriptors: 136151340 bytes in file
BOLT-INSTRUMENTER: Profile will be saved to file /home/nathan/cbl/src/llvm-project/build/clang.fdata
BOLT-INFO: 0 out of 136234 functions in the binary (0.0%) have non-empty execution profile
BOLT-INFO: the input contains 16376 (dynamic count : 0) opportunities for macro-fusion optimization that are going to be fixed
BOLT-INFO: 4660727 instructions were shortened
BOLT-INFO: removed 126 empty blocks
BOLT-INFO: UCE removed 1077 blocks and 73981 bytes of code.
BOLT-INFO: SCTC: patched 0 tail calls (0 forward) tail calls (0 backward) from a total of 0 while removing 0 double jumps and removing 0 basic blocks totalling 0 bytes of code. CTCs total execution count is 0 and the number of times CTCs are taken is 0.
BOLT-INFO: output linked against instrumentation runtime library, lib entry point is 0xe84aaf0
BOLT-INFO: clear procedure is 0xe8463e0
  7. Compile the test file. The BOLT-instrumented clang crashes, but the original does not.

srcpos.i:

# 1 "" 3
int initial_pathlen_i, initial_pathlen_j;
initial_pathlen() {
  int diff = initial_pathlen;
  char *res = initial_pathlen_i = 0;
  for (; initial_pathlen_i != diff; initial_pathlen_i++)
    res[initial_pathlen_j++] = res[initial_pathlen_j++] = '/';
}
$ build/stage3/bin/clang-15 -O2 -fomit-frame-pointer -std=gnu89 -c -o /dev/null srcpos.i

$ build/stage3/bin/clang.inst -O2 -fomit-frame-pointer -std=gnu89 -c -o /dev/null srcpos.i
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: build/stage3/bin/clang.inst -O2 -fomit-frame-pointer -std=gnu89 -c -o /dev/null srcpos.i
1.      <eof> parser at end of file
2.      Optimizer
 #0 0x0000560576935564 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build/stage3/bin/clang.inst+0x9535564)
 #1 0x0000560576934f01 llvm::sys::CleanupOnSignal(unsigned long) (build/stage3/bin/clang.inst+0x9534f01)
 #2 0x0000560576895f22 (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) CrashRecoveryContext.cpp:0:0
 #3 0x000056057689624c CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #4 0x00007f7246997560 __restore_rt libc_sigaction.c:0:0
 #5 0x00005605747036e7 unsigned int std::__find_if<llvm::CostTblEntryT<unsigned int> const*, __gnu_cxx::__ops::_Iter_pred<llvm::CostTblEntryT<unsigned int> const* llvm::CostTableLookup<unsigned int>(llvm::ArrayRef<llvm::CostTblEntryT<unsigned int> >, int, llvm::MVT)::'lambda'(llvm::CostTblEntryT<unsigned int> const&)> >(unsigned int, unsigned int, __gnu_cxx::__ops::_Iter_pred<llvm::CostTblEntryT<unsigned int> const* llvm::CostTableLookup<unsigned int>(llvm::ArrayRef<llvm::CostTblEntryT<unsigned int> >, int, llvm::MVT)::'lambda'(llvm::CostTblEntryT<unsigned int> const&)>, std::random_access_iterator_tag) X86TargetTransformInfo.cpp:0:0
 #6 0x00005605756cb6ac llvm::X86TTIImpl::getInterleavedMemoryOpCost(unsigned int, llvm::Type*, unsigned int, llvm::ArrayRef<unsigned int>, llvm::Align, unsigned int, llvm::TargetTransformInfo::TargetCostKind, bool, bool) X86TargetTransformInfo.cpp:0:0
 #7 0x0000560574435595 llvm::LoopVectorizationCostModel::getInterleaveGroupCost(llvm::Instruction*, llvm::ElementCount) (build/stage3/bin/clang.inst+0x7035595)
 #8 0x0000560574424bda llvm::LoopVectorizationCostModel::setCostBasedWideningDecision(llvm::ElementCount) (build/stage3/bin/clang.inst+0x7024bda)
 #9 0x00005605744233a2 llvm::LoopVectorizationCostModel::collectUniformsAndScalars(llvm::ElementCount) LoopVectorize.cpp:0:0
#10 0x00005605743b61ee llvm::LoopVectorizationPlanner::plan(llvm::ElementCount, unsigned int) (build/stage3/bin/clang.inst+0x6fb61ee)
#11 0x0000560574394b31 llvm::LoopVectorizePass::processLoop(llvm::Loop*) (build/stage3/bin/clang.inst+0x6f94b31)
#12 0x0000560574393b5d llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) (build/stage3/bin/clang.inst+0x6f93b5d)
#13 0x00005605743933f4 llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (build/stage3/bin/clang.inst+0x6f933f4)
#14 0x00005605743930cd llvm::detail::PassModel<llvm::Function, llvm::LoopVectorizePass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilder.cpp:0:0
#15 0x0000560573161327 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (build/stage3/bin/clang.inst+0x5d61327)
#16 0x0000560573160e47 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) BackendUtil.cpp:0:0
#17 0x00005605753b6d25 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build/stage3/bin/clang.inst+0x7fb6d25)
#18 0x00005605753b6937 llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) BackendUtil.cpp:0:0
#19 0x00005605750a924f llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build/stage3/bin/clang.inst+0x7ca924f)
#20 0x00005605750a05c1 (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile> >&) BackendUtil.cpp:0:0
#21 0x000056057508f6a2 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (build/stage3/bin/clang.inst+0x7c8f6a2)
#22 0x000056057508a752 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) CodeGenAction.cpp:0:0
#23 0x00005605752da080 clang::ParseAST(clang::Sema&, bool, bool) (build/stage3/bin/clang.inst+0x7eda080)
#24 0x0000560573041d85 clang::FrontendAction::Execute() (build/stage3/bin/clang.inst+0x5c41d85)
#25 0x0000560573041233 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (build/stage3/bin/clang.inst+0x5c41233)
#26 0x000056057303d806 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (build/stage3/bin/clang.inst+0x5c3d806)
#27 0x000056057303b840 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (build/stage3/bin/clang.inst+0x5c3b840)
#28 0x000056057303a6c5 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) driver.cpp:0:0
#29 0x00005605770989d8 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, bool*) const::$_1>(long) Job.cpp:0:0
#30 0x0000560572a465db llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (build/stage3/bin/clang.inst+0x56465db)
#31 0x0000560572a45f51 clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, bool*) const (build/stage3/bin/clang.inst+0x5645f51)
#32 0x0000560572fc01c3 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&) const (build/stage3/bin/clang.inst+0x5bc01c3)
#33 0x0000560572fbf65f clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*> >&) const (build/stage3/bin/clang.inst+0x5bbf65f)
#34 0x0000560572fbec16 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*> >&) (build/stage3/bin/clang.inst+0x5bbec16)
#35 0x0000560572faeffc main (build/stage3/bin/clang.inst+0x5baeffc)
#36 0x00007f7246982310 __libc_start_call_main libc-start.c:0:0
#37 0x00007f72469823c1 __libc_start_main@GLIBC_2.2.5 (/usr/lib/libc.so.6+0x2d3c1)
#38 0x0000560575a49cf1 _start (build/stage3/bin/clang.inst+0x8649cf1)
clang: error: clang frontend command failed with exit code 139 (use -v to see invocation)
clang version 15.0.0 (https://github.com/llvm/llvm-project bff8356b1969d2edd02e22c73d1c3d386f862937)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/nathan/cbl/src/llvm-project/build/stage3/bin
clang: note: diagnostic msg: Error generating preprocessed source(s) - no preprocessable inputs.

llvmbot commented 2 years ago

@llvm/issue-subscribers-bolt

maksfb commented 2 years ago

Thanks for sending the detailed repro. On the surface, it does look similar to the other instrumentation bug (https://github.com/llvm/llvm-project/issues/53994).

aaupov commented 2 years ago

@nathanchance Can you please clarify which OS and which versions of clang/lld were used here? We couldn't reproduce the crash, but that might be due to some system differences.

nathanchance commented 2 years ago

@aaupov I believe I would have reproduced this on Arch Linux, as that is my primary distribution, which currently has the following versions:

$ /usr/bin/clang --version
clang version 13.0.1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

$ /usr/bin/ld.lld --version
LLD 13.0.1 (compatible with GNU linkers)

aaupov commented 2 years ago

@nathanchance Thanks! Was able to repro the bug now with Ubuntu 20.04 and clang/lld 13.

Kobzol commented 2 years ago

Hi @aaupov, we're trying to use BOLT for optimizing the Rust compiler and we're hitting similar issues when trying to BOLTify LLVM. I wonder if there are any ongoing activities/investigation regarding the PGO clang/llvm crash? Thanks!

maksfb commented 2 years ago

Will put up the fix soon. Let’s see if it solves your problem too.

Kobzol commented 2 years ago

Any updates? I tried it with 15.0.0-rc1, but unfortunately the instrumented libLLVM.so still segfaults for us.

rafaelauler commented 2 years ago

I was able to repro and find the root cause. The symbol that marks the end of a table in .rodata is colocated with the start of a jump table that belongs to another function, and BOLT moves that jump table. As a result, the end-of-table symbol moves with it. The new location is a few MB away, which greatly inflates the size of the table as the application perceives it. The application (clang) then crashes while scanning the table: because it has the wrong end-of-table address, the scanning loop runs out of bounds until it reaches an unmapped address and segfaults.

We're working on a fix.
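
To make the failure mode above concrete, here is a minimal C++ sketch of a table scan bounded by an end pointer, similar in shape to the `std::__find_if` frame inside `CostTableLookup` in the backtrace. It is not LLVM's actual code: `Entry`, `kTable`, `kTableSize`, and `lookup` are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical cost-table entry; the field names are illustrative only.
struct Entry {
  int opcode;
  unsigned cost;
};

// A small read-only table, standing in for a cost table placed in .rodata.
static const Entry kTable[] = {{1, 10}, {2, 20}, {3, 30}};
static const std::size_t kTableSize = sizeof(kTable) / sizeof(kTable[0]);

// Linear scan over [begin, end). The loop trusts `end` completely.
// If a binary rewriter relocated the symbol behind `end` to an address
// megabytes past the real table, a lookup that finds no match would keep
// reading beyond the table until it touched unmapped memory.
const Entry *lookup(const Entry *begin, const Entry *end, int opcode) {
  assert(begin <= end && "range must be well-formed");
  const Entry *it = std::find_if(
      begin, end, [opcode](const Entry &e) { return e.opcode == opcode; });
  return it == end ? nullptr : it;
}
```

With the correct bound, `lookup(kTable, kTable + kTableSize, 99)` stops at the end of the table and returns `nullptr`. The crash described above corresponds to the same scan running with an `end` that no longer points just past the last entry, which nothing inside the loop can detect.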

avikivity commented 1 year ago

Please backport to 15.0.x.

aaupov commented 1 year ago

/cherry-pick 4f158995b9cddae392bfb5989af8c83101ae0789

llvmbot commented 1 year ago

/cherry-pick 4f158995b9cddae392bfb5989af8c83101ae0789

Error: Command failed due to missing milestone.

nickdesaulniers commented 1 year ago

> Please backport to 15.0.x.

Is there a plan for a 15.0.7 release? Otherwise, I think it's too late. 15.0.6 may have been the final 15.0.X release.

avikivity commented 1 year ago

I don't understand. Will there be no clang maintenance releases until 16.0.0? That's almost a year away.

nathanchance commented 1 year ago

> That's almost a year away.

I do not think that is far away. The release documentation states release/16.x should be cut January 24th and the final release should be six weeks after that. Even accounting for an extra month and a half of delays for some reason, that is still just four months away.

You could always ask your LLVM distributor to cherry pick this patch if you are not building it yourself.

avikivity commented 1 year ago

Ah, I assumed a yearly cycle; perhaps I confused it with gcc.

Still, it would be good to keep at least the latest release supported rather than leaving it dead for 4.5 months.

Of course I can ask Fedora to backport the patch, but it's much nicer if the experts decide which patch merits backports, and the entire community benefits.