microsoft / DirectXShaderCompiler

This repo hosts the source for the DirectX Shader Compiler which is based on LLVM/Clang.

`libdxcompiler.so` locks up when used in many threads at once #4792

Open MarijnS95 opened 2 years ago

MarijnS95 commented 2 years ago

We're parallelizing our shader (asset) builds and hit a snag where, on Linux, this process locks up fairly often. With some debugging in gdb we've pinpointed the issue to `CALL_ONCE_INITIALIZATION`, which spins on a `static volatile`:

https://github.com/microsoft/DirectXShaderCompiler/blob/6dd31be007cf376d0679b642f3d393ec2125e629/include/llvm/PassSupport.h#L36-L54
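
For reference, the macro at that revision boils down to the following spin-wait on a `static volatile` flag (paraphrased from the linked header with the TSan annotations elided, so details may differ slightly):

```cpp
// Paraphrase of CALL_ONCE_INITIALIZATION from the linked PassSupport.h.
// The first thread to CAS the flag from 0 to 1 runs the initializer and
// then publishes 2; every other thread spins, re-reading the volatile
// flag around a MemoryFence() until it observes 2.
#define CALL_ONCE_INITIALIZATION(function)                             \
  static volatile sys::cas_flag initialized = 0;                       \
  sys::cas_flag old_val = sys::CompareAndSwap(&initialized, 1, 0);     \
  if (old_val == 0) {                                                  \
    function(Registry);      /* winner runs the initialization */      \
    sys::MemoryFence();                                                \
    initialized = 2;         /* ...and publishes completion */         \
  } else {                                                             \
    sys::cas_flag tmp = initialized;                                   \
    sys::MemoryFence();                                                \
    while (tmp != 2) {       /* losing threads busy-wait here */       \
      tmp = initialized;                                               \
      sys::MemoryFence();                                              \
    }                                                                  \
  }
```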

All threads are stuck either on the `MemoryFence()`, or in `initializeLoopSimplifyPass` directly above it:

```
... non-worker threads
  23   Thread 0x7fb450bff6c0 (LWP 128163) "ci-asset-builde" 0x00007fb31eb16893 in llvm::sys::MemoryFence() () from libdxcompiler.so
  24   Thread 0x7fb4509fe6c0 (LWP 128164) "ci-asset-builde" 0x00007fb31eb16893 in llvm::sys::MemoryFence() () from libdxcompiler.so
  25   Thread 0x7fb4507fd6c0 (LWP 128165) "ci-asset-builde" 0x00007fb31eb16893 in llvm::sys::MemoryFence() () from libdxcompiler.so
  26   Thread 0x7fb4505fc6c0 (LWP 128166) "ci-asset-builde" 0x00007fb31eb16893 in llvm::sys::MemoryFence() () from libdxcompiler.so
  27   Thread 0x7fb4503fb6c0 (LWP 128167) "ci-asset-builde" 0x00007fb31eb16893 in llvm::sys::MemoryFence() () from libdxcompiler.so
  28   Thread 0x7fb433fff6c0 (LWP 128168) "ci-asset-builde" 0x00007fb31eb3b4db in llvm::initializeLoopSimplifyPass(llvm::PassRegistry&) () from libdxcompiler.so
  29   Thread 0x7fb433dfe6c0 (LWP 128169) "ci-asset-builde" 0x00007fb31eb3b4d6 in llvm::initializeLoopSimplifyPass(llvm::PassRegistry&) () from libdxcompiler.so
  30   Thread 0x7fb433bfd6c0 (LWP 128170) "ci-asset-builde" 0x00007fb31eb3b4db in llvm::initializeLoopSimplifyPass(llvm::PassRegistry&) () from libdxcompiler.so
  31   Thread 0x7fb4339fc6c0 (LWP 128171) "ci-asset-builde" 0x00007fb31eb16893 in llvm::sys::MemoryFence() () from libdxcompiler.so
... and many more threads
(gdb) thread 23
[Switching to thread 23 (Thread 0x7fb450bff6c0 (LWP 128163))]
#0  0x00007fb31eb16893 in llvm::sys::MemoryFence() () from libdxcompiler.so
(gdb) bt
#0  0x00007fb31eb16893 in llvm::sys::MemoryFence() () from libdxcompiler.so
#1  0x00007fb31eb3b4db in llvm::initializeLoopSimplifyPass(llvm::PassRegistry&) () from libdxcompiler.so
#2  0x00007fb31eb3bbaa in llvm::Pass* llvm::callDefaultCtor<LoopSimplify>() () from libdxcompiler.so
#3  0x00007fb31f5dc884 in llvm::PMTopLevelManager::schedulePass(llvm::Pass*) () from libdxcompiler.so
#4  0x00007fb31f4b44d5 in addHLSLPasses(bool, unsigned int, bool, bool, bool, hlsl::HLSLExtensionsCodegenHelper*, llvm::legacy::PassManagerBase&) () from libdxcompiler.so
#5  0x00007fb31f4b3958 in llvm::PassManagerBuilder::populateModulePassManager(llvm::legacy::PassManagerBase&) () from libdxcompiler.so
#6  0x00007fb31eb9295b in clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::raw_pwrite_stream*) ()
   from libdxcompiler.so
#7  0x00007fb31eb80189 in clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) () from libdxcompiler.so
#8  0x00007fb31f3e942c in clang::ParseAST(clang::Sema&, bool, bool) () from libdxcompiler.so
#9  0x00007fb31ecf0ba8 in clang::FrontendAction::Execute() () from libdxcompiler.so
#10 0x00007fb31e727649 in DxcCompiler::Compile(DxcBuffer const*, wchar_t const**, unsigned int, IDxcIncludeHandler*, _GUID const&, void**) () from libdxcompiler.so
#11 0x00007fb31e721272 in hlsl::DxcCompilerAdapter::WrapCompile(bool, IDxcBlob*, wchar_t const*, wchar_t const*, wchar_t const*, wchar_t const**, unsigned int, DxcDefine const*, unsigned int, IDxcIncludeHandler*, IDxcOperationResult**, wchar_t**, IDxcBlob**) ()
   from libdxcompiler.so
#12 0x00007fb31e72219f in hlsl::DxcCompilerAdapter::CompileWithDebug(IDxcBlob*, wchar_t const*, wchar_t const*, wchar_t const*, wchar_t const**, unsigned int, DxcDefine const*, unsigned int, IDxcIncludeHandler*, IDxcOperationResult**, wchar_t**, IDxcBlob**) ()
   from libdxcompiler.so
#13 0x00007fb31e722ec8 in hlsl::DxcCompilerAdapter::Compile(IDxcBlob*, wchar_t const*, wchar_t const*, wchar_t const*, wchar_t const**, unsigned int, DxcDefine const*, unsigned int, IDxcIncludeHandler*, IDxcOperationResult**) ()
   from libdxcompiler.so
```
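
For context, our usage boils down to many threads each driving their own compiler instance, along the lines of this hypothetical stripped-down repro (not our actual build code; the shader, thread count, and build flags are made up):

```cpp
// Hypothetical minimal repro: 64 threads each create a compiler instance
// and compile a trivial shader through libdxcompiler.so.
// Build (roughly): clang++ -std=c++17 repro.cpp -I<dxc-include> -ldxcompiler -pthread
#include <thread>
#include <vector>

#include "dxc/dxcapi.h" // pulls in the WinAdapter shims on non-Windows

static const char kSource[] = "[numthreads(1, 1, 1)] void main() {}";

int main() {
  std::vector<std::thread> workers;
  for (int i = 0; i < 64; ++i) {
    workers.emplace_back([] {
      IDxcCompiler3 *compiler = nullptr;
      if (FAILED(DxcCreateInstance(CLSID_DxcCompiler, IID_PPV_ARGS(&compiler))))
        return;

      DxcBuffer source = {kSource, sizeof(kSource) - 1, DXC_CP_UTF8};
      LPCWSTR args[] = {L"-T", L"cs_6_0", L"-E", L"main"};
      const UINT32 argCount = sizeof(args) / sizeof(args[0]);

      // With enough threads running concurrently, some never return from
      // Compile() and end up parked in pass initialization as shown above.
      IDxcResult *result = nullptr;
      compiler->Compile(&source, args, argCount, /*pIncludeHandler=*/nullptr,
                        IID_PPV_ARGS(&result));
      if (result)
        result->Release();
      compiler->Release();
    });
  }
  for (std::thread &t : workers)
    t.join();
}
```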

Without diving into the code to judge whether this implementation is actually free of race conditions, or whether it fails to guarantee forward progress, I forward-ported some of the upstream LLVM changes to at least use `std::call_once` instead of this custom implementation, in https://github.com/MarijnS95/DirectXShaderCompiler/compare/import-llvm_once-upstream-changes (sketched after the trace below). Unfortunately this also locks up:

```
...
  23   Thread 0x7f636bfff6c0 (LWP 146834) "ci-asset-builde" 0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
  24   Thread 0x7f636bdfe6c0 (LWP 146835) "ci-asset-builde" 0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
  25   Thread 0x7f636bbfd6c0 (LWP 146836) "ci-asset-builde" 0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
  26   Thread 0x7f636b9fc6c0 (LWP 146837) "ci-asset-builde" 0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
  27   Thread 0x7f636b7fb6c0 (LWP 146838) "ci-asset-builde" 0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
  28   Thread 0x7f636b5fa6c0 (LWP 146839) "ci-asset-builde" 0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
  29   Thread 0x7f636b3f96c0 (LWP 146840) "ci-asset-builde" 0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
  30   Thread 0x7f636b1f86c0 (LWP 146841) "ci-asset-builde" 0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
  31   Thread 0x7f636aff76c0 (LWP 146842) "ci-asset-builde" 0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
...
(gdb) thread 23
[Switching to thread 23 (Thread 0x7f636bfff6c0 (LWP 146834))]
#0  0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
#1  0x00007f63a47069a3 in __gthread_once (__once=0x7f63a5821800 <InitializeLoopSimplifyPassFlag>, __func=0x80)
    at /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/12.2.0/../../../../include/c++/12.2.0/x86_64-pc-linux-gnu/bits/gthr-default.h:700
#2  std::call_once<void* (&)(llvm::PassRegistry&), std::reference_wrapper<llvm::PassRegistry> > (__once=..., __f=<optimized out>,
    __args=...) at /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/12.2.0/../../../../include/c++/12.2.0/mutex:859
#3  llvm::call_once<void* (&)(llvm::PassRegistry&), std::reference_wrapper<llvm::PassRegistry> > (flag=..., F=<optimized out>,
    ArgList=...) at include/llvm/Support/Threading.h:92
#4  llvm::initializeLoopSimplifyPass (Registry=...)
    at lib/Transforms/Utils/LoopSimplify.cpp:751
... same as above
```

(Note that this happened to be a RelWithDebInfo build, versus the Release build above; that does not change the outcome.)
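
As the backtrace shows, the forward-ported code follows upstream's shape, with `llvm::call_once` forwarding to `std::call_once` on this platform; roughly (names per upstream LLVM, the exact code is in the linked branch):

```cpp
// Rough shape of the forward-ported initialization. The generated
// initializeLoopSimplifyPassOnce() helper performs the actual pass
// registration; call_once guards it with a per-pass once_flag.
static llvm::once_flag InitializeLoopSimplifyPassFlag;

void llvm::initializeLoopSimplifyPass(PassRegistry &Registry) {
  // llvm::call_once is a thin wrapper over std::call_once here, which
  // lands in glibc's __gthread_once -- where all the threads are parked.
  llvm::call_once(InitializeLoopSimplifyPassFlag,
                  initializeLoopSimplifyPassOnce, std::ref(Registry));
}
```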


Fortunately we also stumbled upon https://reviews.llvm.org/D19271: this patch has never landed upstream, but is much simpler in design, replacing the entire call_once logic with a plain static initializer through a lambda expression. We are now running with this and have not yet observed any erratic behaviour.
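
The resulting pattern is roughly the following (a paraphrase, not the patch's exact code), leaning on C++11's guaranteed thread-safe initialization of function-local statics:

```cpp
// Paraphrase of the D19271 approach: the once-flag machinery is replaced
// by a function-local static whose initializer runs the registration
// exactly once. Since C++11 the compiler emits the guard logic
// (__cxa_guard_acquire/release on this ABI) that makes the first call
// safe under concurrency.
void llvm::initializeLoopSimplifyPass(PassRegistry &Registry) {
  static const bool Initialized = [&] {
    initializeLoopSimplifyPassOnce(Registry); // the actual registration
    return true;
  }();
  (void)Initialized;
}
```

Semantically this is very close to `std::call_once`, but the fast path is a compiler-emitted guard check rather than a trip through the `__gthread_once` machinery the threads above were parked in.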

Additional context

DXC was tested on the latest commit as of yesterday: 24ca1f498

MarijnS95 commented 2 years ago

For completeness, here's a little internal testing matrix, where I tried to reach 100 runs of our build system, running 64 jobs in parallel on a ThreadRipper 3970X:

| DXC built on | Release | RelWithDebInfo | std::call_once + RelWithDebInfo | static bool initializer + Release |
|---|---|---|---|---|
| Size | ±27MB | ±330MB | ±330MB | ±27MB |
| DXC CI | Unavailable | Unavailable | | Appveyor build from #4818 [^1] |
| Our CI | ✔️ | | | ✔️ |
| Marijn workstation | ✔️ [^2] | ❌ Locked up on run 18! | ❌ Locked up on run 17! | ✔️ |

On runs that failed, the lockup typically happened well before reaching 10 iterations.

[^1]: Unable to build many of our shaders due to an unrelated SPIR-V codegen issue.

[^2]: My workstation builds DXC against a too-new GLIBC for the shader compiler to be usable on our Continuous Integration, and I don't like pushing custom-built assets there.