Open vient opened 4 months ago
Interesting. Does this also reproduce when using a different stage1 compiler (e.g. the previous LLVM release)?
As a first step, we need to clarify if this is caused by a miscompile by the stage1 compiler or if there's some kind of bug in the code that's crashing.
Tried to just use clang-17 in the last cmake step, got "unsupported instrumentation profile format version", fair.
Rebuilding instrumented clang-18 using clang-17 now. Funny thing is that I got a segmentation fault in lld-17, which I traced to the option --icf=all. --icf=safe also crashes, so I'm continuing without it; I think it should not matter here.
Edit: OK, I'll first reduce the original reproducer; right now it has a ton of options.
I've reduced the original repro to "gather some profiles", and then:
cmake -G Ninja /root/llvm-build/llvm-project/llvm \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=clang-18 \
-DCMAKE_CXX_COMPILER=clang++-18 \
-DCMAKE_C_FLAGS="-fstrict-vtable-pointers -O0 -Wno-backend-plugin -g" \
-DCMAKE_CXX_FLAGS="-fstrict-vtable-pointers -O0 -Wno-backend-plugin -g" \
-DCMAKE_INSTALL_PREFIX="/root/llvm-build/stage2-prof-use/install" \
-DLLVM_PROFDATA_FILE="/root/llvm-build/stage2-prof-gen/profiles/clang.profdata" \
-DLLVM_USE_LINKER=lld \
-DLLVM_ENABLE_PROJECTS='clang' \
-DLLVM_ENABLE_RUNTIMES='compiler-rt;libcxxabi;libcxx;libunwind' \
-DLLVM_ENABLE_ZLIB=FORCE_ON \
-DLLVM_ENABLE_ZSTD=FORCE_ON \
-DLLVM_ENABLE_BINDINGS=OFF \
-DLLVM_INCLUDE_BENCHMARKS=OFF \
-DLLVM_INCLUDE_EXAMPLES=OFF \
-DLLVM_INCLUDE_TESTS=OFF \
-DLLVM_TARGETS_TO_BUILD=X86
ninja install
clang-18 is taken from LLVM apt repo for Ubuntu 20.04
$ clang-18 --version
Ubuntu clang version 18.1.4 (++20240416114258+e3c832b37b0a-1~exp1~20240416234314.99)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Removing -fstrict-vtable-pointers fixes the issue.
Strangely, my stage1 is compiled with -fstrict-vtable-pointers too, but it works fine. It seems there is some incompatibility between it and PGO? Also, why does it trigger if -O0 is set?
Until now I used a profile created from 11k .profraw files, so I decided to try to reduce the number of raw files. During this process I found a single .profraw which, if included in the profile, makes the PGO Clang crash.
First I bisected the number of profraw files, taking a prefix from the sorted list. That process gave me the number 1191: Clang compiled with a profile created from the first 1190 profraw files does not crash. Then I created a profile from that single profraw number 1191, and it crashed. 1 bad profraw out of 1200 seems pretty rare, though I did not check the remaining 9k profiles.
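For anyone who wants to repeat the prefix bisection, here is a minimal sketch of the search loop. It assumes that once a prefix of the sorted list is bad, every longer prefix is bad too. The names first_bad_prefix and demo_check are made up for illustration; the real check (merge the prefix, rebuild Clang, try the crashing compile) is only described in a comment and is not what I ran verbatim.

```shell
# Sketch: find the smallest prefix length k (1..n) for which `check k` fails,
# assuming a bad prefix stays bad when more files are appended.
# `check` is a hypothetical callback: it returns 0 if a compiler built from
# the first k profraw files works, non-zero if it crashes.
first_bad_prefix() {
  local check lo hi mid
  check=$1
  lo=1
  hi=$2
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi) / 2 ))
    if "$check" "$mid"; then
      lo=$(( mid + 1 ))   # prefix of length mid is good: culprit comes later
    else
      hi=$mid             # prefix of length mid is bad: culprit is at or before mid
    fi
  done
  echo "$lo"
}

# Stand-in check for demonstration only: pretends file number 1191 is the bad one.
# A real check would merge the first k files of the sorted list, e.g.
#   llvm-profdata-18 merge -output=prefix.profdata <first k profraw files>
# then rebuild Clang against prefix.profdata and run the crashing compile.
demo_check() { [ "$1" -lt 1191 ]; }

first_bad_prefix demo_check 11000   # prints 1191
```

Each real check is a full Clang rebuild, so the logarithmic number of steps matters: about 14 rebuilds cover 11k files.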
Attaching both the profraw file and the profdata generated from it: 91041_profdata.tar.gz. Everything is done using llvm-18 from the official repo. With this, the reproducer looks like this:
ROOT=/root/llvm-build
mkdir -p "${ROOT}" && cd "${ROOT}"
git clone -b llvmorg-18.1.5 --depth=1 https://github.com/llvm/llvm-project.git
mkdir -p build && cd build
# llvm-profdata-18 merge -output=bad.profdata default_4023465489017712588_0.profraw
cmake -G Ninja "${ROOT}/llvm-project/llvm" \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=clang-18 \
-DCMAKE_CXX_COMPILER=clang++-18 \
-DCMAKE_C_FLAGS="-fstrict-vtable-pointers -O0 -Wno-backend-plugin -g" \
-DCMAKE_CXX_FLAGS="-fstrict-vtable-pointers -O0 -Wno-backend-plugin -g" \
-DCMAKE_INSTALL_PREFIX="${PWD}/install" \
-DLLVM_PROFDATA_FILE="bad.profdata" \
-DLLVM_USE_LINKER=lld \
-DLLVM_ENABLE_PROJECTS='clang' \
-DLLVM_ENABLE_RUNTIMES='compiler-rt;libcxxabi;libcxx;libunwind' \
-DLLVM_ENABLE_ZLIB=FORCE_ON \
-DLLVM_ENABLE_ZSTD=FORCE_ON \
-DLLVM_ENABLE_BINDINGS=OFF \
-DLLVM_INCLUDE_BENCHMARKS=OFF \
-DLLVM_INCLUDE_EXAMPLES=OFF \
-DLLVM_INCLUDE_TESTS=OFF \
-DLLVM_TARGETS_TO_BUILD=X86
ninja install # clang segfaults while compiling runtimes
I noticed two kinds of stacktraces, they are pretty similar but still:
#6 0x0000561a8a8defd4 llvm::PHINode::getBasicBlockIndex(llvm::BasicBlock const*) const /root/llvm-build/llvm-project/llvm/include/llvm/IR/Instructions.h:2898:28
#7 0x0000561a8a8defd4 llvm::PHINode::getIncomingValueForBlock(llvm::BasicBlock const*) const /root/llvm-build/llvm-project/llvm/include/llvm/IR/Instructions.h:2904:15
#8 0x0000561a8a8defd4 llvm::LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(llvm::VFRange&) /root/llvm-build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8762:18
#9 0x0000561a8a8b9ec1 std::__uniq_ptr_impl<llvm::VPlan, std::default_delete<llvm::VPlan>>::_M_ptr() const /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/unique_ptr.h:193:51
#10 0x0000561a8a8b9ec1 std::unique_ptr<llvm::VPlan, std::default_delete<llvm::VPlan>>::get() const /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/unique_ptr.h:464:21
#11 0x0000561a8a8b9ec1 std::unique_ptr<llvm::VPlan, std::default_delete<llvm::VPlan>>::operator bool() const /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/unique_ptr.h:481:16
#12 0x0000561a8a8b9ec1 llvm::LoopVectorizationPlanner::buildVPlansWithVPRecipes(llvm::ElementCount, llvm::ElementCount) /root/llvm-build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8600:14
#13 0x0000561a8a8b960d llvm::LoopVectorizationPlanner::plan(llvm::ElementCount, unsigned int) /root/llvm-build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7419:3
#14 0x0000561a8a8b0b58 llvm::LoopVectorizePass::processLoop(llvm::Loop*) /root/llvm-build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10004:12
#6 0x00005567bb2d7fed llvm::Use::get() const /root/llvm-build/llvm-project/llvm/include/llvm/IR/Use.h:66:31
#7 0x00005567bb2d7fed llvm::PHINode::getOperand(unsigned int) const /root/llvm-build/llvm-project/llvm/include/llvm/IR/Instructions.h:2956:1
#8 0x00005567bb2d7fed llvm::PHINode::getIncomingValue(unsigned int) const /root/llvm-build/llvm-project/llvm/include/llvm/IR/Instructions.h:2804:12
#9 0x00005567bb2d7fed llvm::PHINode::getIncomingValueForBlock(llvm::BasicBlock const*) const /root/llvm-build/llvm-project/llvm/include/llvm/IR/Instructions.h:2906:12
#10 0x00005567bb2d7fed llvm::LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(llvm::VFRange&) /root/llvm-build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8762:18
#11 0x00005567bb2b2ec1 std::__uniq_ptr_impl<llvm::VPlan, std::default_delete<llvm::VPlan>>::_M_ptr() const /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/unique_ptr.h:193:51
#12 0x00005567bb2b2ec1 std::unique_ptr<llvm::VPlan, std::default_delete<llvm::VPlan>>::get() const /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/unique_ptr.h:464:21
#13 0x00005567bb2b2ec1 std::unique_ptr<llvm::VPlan, std::default_delete<llvm::VPlan>>::operator bool() const /usr/bin/../lib/gcc/x86_64-linux-gnu/14.0.0/../../../../include/c++/14.0.0/bits/unique_ptr.h:481:16
#14 0x00005567bb2b2ec1 llvm::LoopVectorizationPlanner::buildVPlansWithVPRecipes(llvm::ElementCount, llvm::ElementCount) /root/llvm-build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8600:14
#15 0x00005567bb2b260d llvm::LoopVectorizationPlanner::plan(llvm::ElementCount, unsigned int) /root/llvm-build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7419:3
#16 0x00005567bb2a9b58 llvm::LoopVectorizePass::processLoop(llvm::Loop*) /root/llvm-build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10004:12
These stack traces should be more accurate than those in the initial post, since I don't use any optimizations now.
I also found another profraw file with exactly the same size which does not trigger the crash, although I now understand that profraw size does not depend on counter values. Can there be any differences between profraw files besides counter values?
Please let me know if I can do anything else here. Right now I'm out of ideas as someone who is not an LLVM developer.
+1, encountering the same issue with LLVM 17.0.2 on Windows, using ClangCL as stage1, shipped with MSVC 2022
fyi @fhahn and CC @akyrtzi
In our test case, the crash was in llvm_blake3_compress_xof_sse41. Disabling it with -DLLVM_DISABLE_ASSEMBLY_FILES=ON fixes the issue under PGO. We also previously encountered a similar crash when cross-compiling a Windows toolchain on a Linux host, without PGO.
I suspect that part of the code is very broken (maybe also https://github.com/llvm/llvm-project/issues/81967).
I use tag llvmorg-18.1.5. A build without PGO works fine; then I gathered some IR profiles and tried to rebuild LLVM. It crashed while building the runtimes with what looked like the same error. I took one example source file and ran it through cvise.
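For reference, cvise needs an interestingness test that exits 0 only while the reduced file still triggers the crash. Below is a rough sketch of such a check, not the exact script used here. The helper name is_interesting, the PGO_CLANG variable, the -O2 flag and the marker string are all assumptions for illustration.

```shell
# Sketch: succeed (exit 0) only if the given command fails AND its combined
# output mentions the marker string, so unrelated compile errors introduced
# by the reduction do not count as "still crashing".
is_interesting() {
  local marker out
  marker=$1
  shift
  if out=$("$@" 2>&1); then
    return 1    # command succeeded: no crash, not interesting
  fi
  printf '%s\n' "$out" | grep -qF -- "$marker"
}

# Hypothetical use inside the script handed to cvise (paths and flags assumed):
#   is_interesting "tryToBuildVPlanWithVPRecipes" \
#     "$PGO_CLANG" -O2 -c locale-8f8074.cpp -o /dev/null
```

Matching on a frame from the stack trace, rather than on any non-zero exit, keeps the reduction from drifting to a different failure.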
Crash:
locale-8f8074.cpp
cmake command used to build crashing compiler
The crash does not reproduce when Clang is built without PGO. I can provide any additional files if needed.
I'll try to get rid of as many cmake arguments as possible next.