llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Regression since 0efe111365 when building opencv with clang-cl 17.0.6 on Windows: clang-cl just hangs #69428

Closed emmenlau closed 7 months ago

emmenlau commented 1 year ago

I have reported a build issue of opencv in https://github.com/opencv/opencv/issues/24390. If needed I can clone the information here. Sadly, I am unable to provide a minimal reproducer, and building opencv is slightly more involved.

It would be great if somebody could still look at this, as opencv is quite a relevant library...

DimitryAndric commented 1 year ago

One of the things you can try is identifying one or more instances of clang-cl that hang, and attempt to make it crash, so it produces test case files (preprocessed .cpp and .sh). On Linux you would simply send a SIGABRT to the process, and this would cause the clang-cl driver to create test case files, but on Windows I am unsure.

emmenlau commented 1 year ago

Thanks a lot @DimitryAndric for this suggestion! If you or someone knows how I can create the test case files (on Windows), I would be happy to do so!

Neumann-A commented 1 year ago

I observe the same issue with 17.0.4. Trying to install opencv4 via vcpkg using my clang-cl toolchain just hangs forever.

emmenlau commented 12 months ago

Did anyone try with 17.0.5 yet?

Neumann-A commented 11 months ago

Did anyone try with 17.0.5 yet?

Same issue.

Neumann-A commented 11 months ago

@emmenlau Did you check your build logs carefully? While bisecting, I noticed an ICE near the start of the logs that I had missed at first. I don't know why ninja did not abort the build in this case and left it hanging.

emmenlau commented 11 months ago

Hi @Neumann-A , I checked my logs back then quite carefully: I compared them in a diff view against a working build using clang 16.x. As far as I could see there was no ICE in my case with clang 17.0.3. But it's interesting that you could move one step ahead! One thing I found (but cannot try myself): there is an option where clang would print the full diagnostics even without a crash! Maybe this can help developers isolate the problem?

Here is the link: https://clang.llvm.org/docs/UsersManual.html#options-to-control-clang-crash-diagnostics

From that page, I quote:

Clang is also capable of generating preprocessed source file(s) and associated run script(s) even without a crash. This is specially useful when trying to generate a reproducer for warnings or errors while using modules.

I guess with such a reproducer, the developers could help resolve the issue...

emmenlau commented 10 months ago

Did anyone try clang-cl 17.0.6 yet?

Neumann-A commented 10 months ago

Did anyone try clang-cl 17.0.6 yet?

Same issue. Master from end of November also same issue. I don't think it will be fixed for 18.

https://github.com/backengineering/llvm-msvc doesn't seem to have the issue

emmenlau commented 10 months ago

Thanks @Neumann-A ! I'll try to run the build today with -gen-reproducer in the hope that devs will consider fixing this issue.

Neumann-A commented 10 months ago

I mean I even bisected the issue. The ICE/hang happens since https://github.com/llvm/llvm-project/commit/0efe111365ae176671e01252d24028047d807a84. Reverting it fixes it.

emmenlau commented 10 months ago

Oh my, that is very relevant! Thanks for sharing!!!

emmenlau commented 9 months ago

Dear LLVM devs, could someone kindly look into this compiler hang? It blocks building OpenCV, which is a widely used library in the image analysis community...

DimitryAndric commented 9 months ago

Somebody still needs to provide a .sh and .cpp file from one of those hanging builds. This is essential for reproducing the problem, and attempting to fix it.

DimitryAndric commented 9 months ago

Also, ping @phoebewang @efriedma-quic @tentzen who originated https://reviews.llvm.org/D102817 for commit 0efe111365ae176671e01252d24028047d807a84.

Neumann-A commented 9 months ago

Somebody still needs to provide a .sh and .cpp file from one of those hanging builds. This is essential for reproducing the problem, and attempting to fix it.

#73538 and #73536 have reproducers and are related

DimitryAndric commented 9 months ago

I managed to configure and build opencv on Windows against Visual Studio and the official LLVM 17.0.6 package, and could intermittently reproduce the hangs: that is, some clang-cl instances hung but not consistently when you repeated the exact same command line.

After retrieving the exact command line for a hanging instance, I could generate a preprocessed test case, where the hang occurs when compiling the intermediate .bc to .asm:

"C:\Program Files\LLVM\bin\clang-cl.exe" -cc1 -triple x86_64-pc-windows-msvc19.38.33134 -S -save-temps=cwd -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name execution_engine.cpp -mrelocation-model pic -pic-level 2 -mframe-pointer=none -relaxed-aliasing -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -funwind-tables=2 -target-cpu x86-64 -target-feature +sse -target-feature +sse2 -target-feature +sse3 -mllvm -x86-asm-syntax=intel -tune-cpu generic -D_MT -D_DLL --dependent-lib=msvcrt --dependent-lib=oldnames --show-includes -sys-header-deps -stack-protector 2 -fexceptions -fasync-exceptions -fms-volatile -fdiagnostics-format msvc -v -ffunction-sections "-fcoverage-compilation-dir=C:\Users\Dim\Source\opencv\build" -resource-dir "C:\PROGRA~1\LLVM\lib\clang\17" -O2 -WCL4 -W -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winconsistent-missing-override -Wno-delete-non-virtual-dtor -Wno-unnamed-type-template-args -Wno-comment -Wno-deprecated-enum-enum-conversion -Wno-deprecated-anon-enum-enum-conversion -Wno-long-long "-fdebug-compilation-dir=C:\Users\Dim\Source\opencv\build" -ferror-limit 19 -fmessage-length=178 -fno-use-cxa-atexit -fms-extensions -fms-compatibility -fms-compatibility-version=19.38.33134 -fdelayed-template-parsing -finline-functions -fcolor-diagnostics -vectorize-loops -vectorize-slp -faddrsig -o execution_engine.asm -x ir execution_engine.bc

I transported this test case to Linux where I have more tools to do reduction, and I ended up with the following reduced test case:

// clang-cl -cc1 -triple x86_64-pc-windows-msvc19.38.33134 -S -disable-llvm-verifier -fexceptions -fasync-exceptions -O2 execution_engine-min.cpp
template <bool, class _Ty1, class> using conditional_t = _Ty1;
template <class _Ty1, class _Ty2>
constexpr bool is_same_v = __is_same(_Ty1, _Ty2);
struct _Alloc_construct_ptr {
  ~_Alloc_construct_ptr();
};
template <class _Alnode> struct _List_node_emplace_op2 : _Alloc_construct_ptr {
  _List_node_emplace_op2(_Alnode);
  ~_List_node_emplace_op2() { ; }
};
int _List;
struct {
  template <class... _Valtys>
  conditional_t<is_same_v<int, int>, int, int> emplace(_Valtys... _Vals) {
    _List_node_emplace_op2(_List, _Vals...);
  }
} m_executableDependencies;
void ExecutionEngineaddExecutableDependency() {
  m_executableDependencies.emplace();
}

This reliably triggers an assertion failure in WinEHPrepare.cpp (if the LLVM build in question is compiled with assertions, which the release builds are not), after an initial "A single unwind edge may only enter one EH pad" error:

A single unwind edge may only enter one EH pad
  invoke void @llvm.seh.scope.end()
          to label %"??1?$_List_node_emplace_op2@H@@QEAA@XZ.exit.i" unwind label %ehcleanup.i.i
Assertion failed: (!verifyFunction(F, &dbgs())), function prepareExplicitEH, file /share/dim/src/llvm/llvm-project/llvm/lib/CodeGen/WinEHPrepare.cpp, line 1210.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /home/dim/obj/llvmorg-18-init-18014-g490a09a02e81-freebsd15-amd64-ninja-clang-rel-1/bin/clang-cl -cc1 -triple x86_64-pc-windows-msvc19.38.33134 -S -disable-llvm-verifier -fexceptions -fasync-exceptions -O2 execution_engine-min.cpp
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module 'execution_engine-min.cpp'.
4.      Running pass 'Windows exception handling preparation' on function '@"?ExecutionEngineaddExecutableDependency@@YAXXZ"'
 #0 0x00000000042d05c8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/dim/obj/llvmorg-18-init-18014-g490a09a02e81-freebsd15-amd64-ninja-clang-rel-1/bin/clang-cl+0x42d05c8)
 #1 0x00000000042ce129 llvm::sys::RunSignalHandlers() (/home/dim/obj/llvmorg-18-init-18014-g490a09a02e81-freebsd15-amd64-ninja-clang-rel-1/bin/clang-cl+0x42ce129)
 #2 0x00000000042d0dc8 SignalHandler(int) Signals.cpp:0:0
 #3 0x00000008297c6490 handle_signal /share/dim/src/freebsd/llvm-18-update/lib/libthr/thread/thr_sig.c:0:3
 #4 0x00000008297c5a4b thr_sighandler /share/dim/src/freebsd/llvm-18-update/lib/libthr/thread/thr_sig.c:245:1
 #5 0x00000008290772d3 ([vdso]+0x2d3)
 #6 0x000000082e398e1a _thr_kill /usr/obj/share/dim/src/freebsd/llvm-18-update/amd64.amd64/lib/libc/thr_kill.S:4:0
 #7 0x000000082e312a94 __raise /share/dim/src/freebsd/llvm-18-update/lib/libc/gen/raise.c:0:10
 #8 0x000000082e3c5799 abort /share/dim/src/freebsd/llvm-18-update/lib/libc/stdlib/abort.c:67:17
 #9 0x000000082e2f5d81 (/lib/libc.so.7+0x99d81)
#10 0x0000000003c3cc1a (anonymous namespace)::WinEHPrepareImpl::prepareExplicitEH(llvm::Function&) WinEHPrepare.cpp:0:0
#11 0x0000000003c389b1 (anonymous namespace)::WinEHPrepare::runOnFunction(llvm::Function&) WinEHPrepare.cpp:0:0
#12 0x0000000003defcb1 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/dim/obj/llvmorg-18-init-18014-g490a09a02e81-freebsd15-amd64-ninja-clang-rel-1/bin/clang-cl+0x3defcb1)
#13 0x0000000003df82a4 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/dim/obj/llvmorg-18-init-18014-g490a09a02e81-freebsd15-amd64-ninja-clang-rel-1/bin/clang-cl+0x3df82a4)
#14 0x0000000003df086e llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/dim/obj/llvmorg-18-init-18014-g490a09a02e81-freebsd15-amd64-ninja-clang-rel-1/bin/clang-cl+0x3df086e)
#15 0x0000000004a43178 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/home/dim/obj/llvmorg-18-init-18014-g490a09a02e81-freebsd15-amd64-ninja-clang-rel-1/bin/clang-cl+0x4a43178)
#16 0x0000000004a62d19 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/home/dim/obj/llvmorg-18-init-18014-g490a09a02e81-freebsd15-amd64-ninja-clang-rel-1/bin/clang-cl+0x4a62d19)
#17 0x000000000671a8c6 clang::ParseAST(clang::Sema&, bool, bool) (/home/dim/obj/llvmorg-18-init-18014-g490a09a02e81-freebsd15-amd64-ninja-clang-rel-1/bin/clang-cl+0x671a8c6)
#18 0x0000000004e61883 clang::FrontendAction::Execute() (/home/dim/obj/llvmorg-18-init-18014-g490a09a02e81-freebsd15-amd64-ninja-clang-rel-1/bin/clang-cl+0x4e61883)
#19 0x0000000004dd63cd clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/dim/obj/llvmorg-18-init-18014-g490a09a02e81-freebsd15-amd64-ninja-clang-rel-1/bin/clang-cl+0x4dd63cd)
#20 0x0000000004f39305 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/dim/obj/llvmorg-18-init-18014-g490a09a02e81-freebsd15-amd64-ninja-clang-rel-1/bin/clang-cl+0x4f39305)
#21 0x0000000002722edc cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/dim/obj/llvmorg-18-init-18014-g490a09a02e81-freebsd15-amd64-ninja-clang-rel-1/bin/clang-cl+0x2722edc)
#22 0x000000000271fcd2 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#23 0x000000000271eb3d clang_main(int, char**, llvm::ToolContext const&) (/home/dim/obj/llvmorg-18-init-18014-g490a09a02e81-freebsd15-amd64-ninja-clang-rel-1/bin/clang-cl+0x271eb3d)
#24 0x000000000272ea14 main (/home/dim/obj/llvmorg-18-init-18014-g490a09a02e81-freebsd15-amd64-ninja-clang-rel-1/bin/clang-cl+0x272ea14)
#25 0x000000082e2e734a __libc_start1 /share/dim/src/freebsd/llvm-18-update/lib/libc/csu/libc_start1.c:157:2

This is the same error and assertion reported in #73536 and #73538.

phoebewang commented 9 months ago

cc @robertcox-github

emmenlau commented 6 months ago

Is a1f4ac7 in ClangCl 18.1.4? Because I still can not build OpenCV, it still hangs :-( :-(

phoebewang commented 6 months ago

Is a1f4ac7 in ClangCl 18.1.4? Because I still can not build OpenCV, it still hangs :-( :-(

No, it's not getting cherry-picked to the 18 release.

emmenlau commented 6 months ago

Thanks a lot for the feedback @phoebewang ! And there are probably strong reasons against picking it for the 18.x series, yes? It would be really great to have OpenCV build working again... :-(

Nick-Kooij commented 5 months ago

The recently released Microsoft Visual Studio 2022 17.9.10 now ships with headers that only support LLVM 17 and up, compounding the issue.

Developers impacted by this issue who use clang-cl on Windows with the latest Microsoft IDE/headers will soon be stuck:

EugeneZelenko commented 5 months ago

Thanks a lot for the feedback @phoebewang ! And there are probably strong reasons against picking it for the 18.x series, yes? It would be really great to have OpenCV build working again... :-(

Backports to 18.1.x were stopped recently, so it'll be necessary to wait for 19.

efriedma-quic commented 5 months ago

This specifically only impacts code built with /EHa; does OpenCV really need to be built with that flag?

phoebewang commented 5 months ago

This specifically only impacts code built with /EHa; does OpenCV really need to be built with that flag?

Right. The /EHa was a dud in LLVM before 17. Removing it should have no side effect.

emmenlau commented 5 months ago

@efriedma-quic , thanks a billion for this insight! No, I do not require /EHa and can happily build without. This reduces the severity of this issue a whole lot!

This does not work for me. I've replaced /EHa with /EHsc and the compiler (clang-cl 18.1.6 from 5 days ago) still hangs (currently 15 minutes on a single source file, and counting).

Did you mean to disable all exception handling? Or are there exception handling models that are expected to work?

phoebewang commented 5 months ago

/EHsc would be an independent issue. How about just removing /EHa?

emmenlau commented 5 months ago

After removing /EHa the build complains that exceptions are not enabled (unless I'm just unable to configure OpenCV correctly and made a mistake, haha). Is it possible that removing /EHa altogether disables exceptions?

phoebewang commented 5 months ago

Not sure. What I know is that /EHa doesn't really enable asynchronous exceptions before LLVM 17. Maybe it enabled them partially, or maybe the build script is just checking for the /EH strings?

emmenlau commented 5 months ago

I've spent some time reading up on this, and I'm under the impression that one of the /EHx options needs to be added, otherwise exceptions are turned off by the compiler. See for example here and here.

This explains why removing /EHa disabled exceptions in opencv altogether, thereby breaking the build.

/EHsc would be an independent issue.

Can you elaborate about an independent issue? So does the current fix in clang 19 not address the issue of a compiler hang with /EHsc? Should I report it as a new issue?

phoebewang commented 5 months ago

This explains why removing /EHa disabled exceptions in opencv altogether, thereby breaking the build.

I'm not an expert on the Clang driver. I just took a quick look; maybe you can try -Xclang -fcxx-exceptions -Xclang -fexceptions to enable exceptions without a /EH*?

Can you elaborate about an independent issue? So does the current fix in clang 19 not address the issue of a compiler hang with /EHsc? Should I report it as a new issue?

I didn't see it elsewhere. My justification is: 1) /EHsc would not generate such llvm.seh.* intrinsics; 2) the fixed issue is a crash that occurs only if the compiler is built with assertions on. A hang may or may not be related to it. Did you check whether the /EHsc option works with trunk code?