Open MarijnS95 opened 2 years ago
For completeness, here's a little internal testing matrix, where I tried to reach 100 runs of our build system, running 64 jobs in parallel on a Threadripper 3970X:
DXC built on | Release | RelWithDebInfo | `std::call_once` + RelWithDebInfo | static bool initializer + Release |
---|---|---|---|---|
Size | ±27MB | ±330MB | ±330MB | ±27MB |
DXC CI | ❌ | Unavailable | Unavailable | ⛔ Appveyor build from #4818 [^1] |
Our CI | ❌ | ✔️ | ✔️ | |
Marijn workstation | ✔️ [^2] | ❌ Locked up on run 18! | ❌ Locked up on run 17! | ✔️ |
In all failing configurations, the lock-up typically happens well before run 10.
[^1]: Unable to build many of our shaders due to an unrelated SPIR-V codegen issue.
[^2]: My workstation builds DXC against a GLIBC that is too new for the resulting shader compiler to be used on our Continuous Integration, and I don't like pushing custom-built assets there.
We're parallelizing our shader (asset) builds and hit a snag where, on Linux, this process locks up fairly often. With some debugging in `gdb` we've pinpointed the issue to reside inside `CALL_ONCE_INITIALIZATION`, on a `static volatile`:

https://github.com/microsoft/DirectXShaderCompiler/blob/6dd31be007cf376d0679b642f3d393ec2125e629/include/llvm/PassSupport.h#L36-L54
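
For readers unfamiliar with the pattern, a hand-rolled once-initialization built on a compare-and-swap and a spin-wait generally has the shape below. This is a minimal sketch using `std::atomic`, not the DXC macro itself; the names `initialize_once_payload`, `state`, `call_once_with_cas` and `run_cas_demo` are hypothetical, and the linked header remains the authoritative source for what DXC actually does.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for a pass initializer; it only counts its invocations.
static std::atomic<int> init_calls{0};
static void initialize_once_payload() { init_calls.fetch_add(1); }

// 0 = not started, 1 = initialization in progress, 2 = initialization done.
static std::atomic<int> state{0};

static void call_once_with_cas() {
  int expected = 0;
  if (state.compare_exchange_strong(expected, 1)) {
    // Exactly one thread wins the 0 -> 1 transition and runs the initializer.
    initialize_once_payload();
    // Publish completion so that waiting threads can proceed.
    state.store(2, std::memory_order_release);
  } else {
    // Every other thread spins until the winner has published state == 2.
    while (state.load(std::memory_order_acquire) != 2) {
      std::this_thread::yield();
    }
  }
}

// Calls call_once_with_cas from thread_count threads and returns how many
// times the initializer actually ran.
static int run_cas_demo(int thread_count) {
  std::vector<std::thread> threads;
  for (int i = 0; i < thread_count; ++i) {
    threads.emplace_back(call_once_with_cas);
  }
  for (auto &t : threads) {
    t.join();
  }
  return init_calls.load();
}
```

Calling `run_cas_demo(8)` from `main` runs the initializer once. In a scheme like this, the waiting threads only make progress once the winning thread reaches the final store, so any thread parked in the wait loop depends entirely on the winner completing.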
All threads are stuck either on the `MemoryFence()` or on `initializeLoopSimplifyPass` directly above that.

Without diving into the code to determine whether it is free of race conditions or guarantees forward progress, I forward-ported some of the upstream LLVM changes to at least use `std::call_once` instead of this custom implementation in https://github.com/MarijnS95/DirectXShaderCompiler/compare/import-llvm_once-upstream-changes. Unfortunately this also locks up.

(Note that this happened to be a `RelWithDebInfo` build versus `Release` above, but that does not change the outcome.)

Fortunately we also stumbled upon https://reviews.llvm.org/D19271: this patch has not yet been applied, but is much simpler in design, replacing the entire `call_once` logic with a simple `static` initializer through a lambda expression. We are now running with this and have not yet observed any erratic behaviour.

Additional context
DXC was tested on the latest commit as of yesterday: 24ca1f498
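
To make the two alternatives discussed in this issue concrete, here is a minimal sketch of a `std::call_once`-based initializer next to a function-local `static` initialized through a lambda. It illustrates the general techniques only; it is not the code from the forward-port branch or from D19271. All names (`init_with_call_once`, `init_with_static_lambda`, `run_in_threads` and the counters) are hypothetical.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical counters standing in for the real initialization work.
static std::atomic<int> call_once_runs{0};
static std::atomic<int> static_init_runs{0};

// Variant A: std::call_once guarded by a std::once_flag.
static void init_with_call_once() {
  static std::once_flag flag;
  std::call_once(flag, [] { call_once_runs.fetch_add(1); });
}

// Variant B: a function-local static whose initializer is a lambda call.
// Since C++11 the language guarantees that this initialization runs exactly
// once, even when several threads enter the function concurrently.
static void init_with_static_lambda() {
  static bool initialized = [] {
    static_init_runs.fetch_add(1);
    return true;
  }();
  (void)initialized;  // Silence unused-variable warnings.
}

// Calls fn from thread_count threads and waits for all of them to finish.
static void run_in_threads(void (*fn)(), int thread_count) {
  std::vector<std::thread> threads;
  for (int i = 0; i < thread_count; ++i) {
    threads.emplace_back(fn);
  }
  for (auto &t : threads) {
    t.join();
  }
}
```

Calling `run_in_threads(init_with_static_lambda, 8)` leaves `static_init_runs` at 1, and the same holds for the `std::call_once` variant. With the `static` form, synchronization is delegated to the compiler's thread-safe static initialization instead of a library or hand-written primitive.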