microsoft / DirectXShaderCompiler

This repo hosts the source for the DirectX Shader Compiler which is based on LLVM/Clang.

Improve DXC build time? Maybe using PCH (3m40 for current 12 core CPU) #5042

Closed: devshgraphicsprogramming closed this issue 1 year ago

devshgraphicsprogramming commented 1 year ago

I have a Vulkan+CUDA (and, until recently, OpenGL, OpenGL ES, and OpenCL) framework whose clean build, including all of its dependencies (SIMDJSON, OpenEXR, Expat XML Parser, zlib, libjpeg, oneAPI, etc.), used to finish several times faster than a build of DXC alone.

All while it literally abuses templates everywhere.

I understand that DXC is basically LLVM, and LLVM takes up to an hour on my machine with the new releases like 16.0.

But my Ryzen 3900X sits at barely 18% utilization when building DXC.

Can we enable PCH or something in the DXC build system to build the DLL faster? Any other pointers about what we can do?

Keenuts commented 1 year ago

Hi!

Can you provide more information about your setup? Are you building from a hard drive, an SSD, or a RAM disk? You mention a DLL, so I assume you build on Windows? Using CMake + Visual Studio? For reference, a clean build of DXC on Linux with CMake + Ninja takes 55 seconds on my machine (Threadripper PRO 3995WX, 128 threads, plenty of RAM, building from an NVMe SSD).

devshgraphicsprogramming commented 1 year ago

3m40s

Debug build, Windows, CMake + Visual Studio

CPU: Ryzen 9 3900X, 12 cores / 24 threads

RAM: 32 GB 2400 MHz DDR4

SSD: ADATA XPG SX8200 Pro 1TB M.2

Keenuts commented 1 year ago

3m40 for a clean build doesn't seem unreasonable given that setup. (You mentioned that LLVM takes up to an hour, so I assumed DXC took similarly long on your end too, which would indeed be too much.)

When you say it stays at 18%, is that the Windows average CPU usage you're reporting? What about per-thread usage? Because from your numbers, it looks on par with CMake + Ninja (which does take advantage of all cores): 128 threads, 55 s -> in theory 7040 s single-threaded; 24 threads, 220 s -> in theory 5280 s single-threaded. One step that is single-threaded on my end, and slow, is linking, which takes 30 s with ld, or 10 s with mold.

I'm not very familiar with the Windows/VS build system, so I'm not sure about the implications of using precompiled headers, which seem to be Visual Studio specific. If you have an example PR, we could see whether there is a clear improvement (and that would save bot time, so nice) 😊

devshgraphicsprogramming commented 1 year ago

Well, 3m40 is AFTER we made it use all the cores, so CPU usage is no longer at 18%.

Btw, one does not simply add_subdirectory(dxc) into their build system; we've had to sanitize it A LOT and literally separate it from our main CMake by doing a local FetchContent: https://github.com/Devsh-Graphics-Programming/Nabla/blob/master/3rdparty/dxc/CMakeLists.txt
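For the curious, a minimal sketch of what an isolating FetchContent setup can look like. The real CMakeLists linked above is considerably more involved, and the tag and option shown here are illustrative, not what Nabla actually pins:

```cmake
include(FetchContent)

# Fetch DXC into its own source/build tree instead of add_subdirectory()-ing
# a submodule directly, keeping its cache variables and targets isolated
# from the parent project.
FetchContent_Declare(
  dxc
  GIT_REPOSITORY https://github.com/microsoft/DirectXShaderCompiler.git
  GIT_TAG        main # pin a specific release tag or commit in practice
)

# Turn off pieces of the LLVM-based build we don't need (illustrative;
# check DXC's own CMake cache for the options that actually apply).
set(LLVM_INCLUDE_TESTS OFF CACHE BOOL "" FORCE)

FetchContent_MakeAvailable(dxc)
```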

@AnastaZIuk is our build system guy, so I'll hand the conversation off to him.

> I'm not very familiar with the Windows/VS build system, so I'm not sure about the implications of using precompiled headers, which seem to be Visual Studio specific. If you have an example PR, we could see whether there is a clear improvement (and that would save bot time, so nice)

CMake now has a PCH command that works across compilers: https://cmake.org/cmake/help/latest/command/target_precompile_headers.html
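As a rough sketch of what that could look like for DXC's DLL target (the header list here is purely illustrative; picking headers well would need profiling):

```cmake
# Requires CMake 3.16+; works with MSVC, Clang, and GCC.
# Precompile a handful of heavyweight, rarely-changing headers.
# PRIVATE scope keeps the PCH from propagating to consumers of the target.
target_precompile_headers(dxcompiler PRIVATE
  <llvm/ADT/ArrayRef.h>
  <llvm/IR/Module.h>
  <string>
  <vector>
)
```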

W.r.t. linking, maybe we could use incremental linking on non-Release builds?
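For illustration, a sketch of gating the MSVC /INCREMENTAL linker flag to non-Release configurations with a generator expression. Note that CMake's default MSVC Debug flags may already pass it, so this mostly makes the intent explicit:

```cmake
# Requires CMake 3.13+ for target_link_options().
if(MSVC)
  target_link_options(dxcompiler PRIVATE
    $<$<NOT:$<CONFIG:Release>>:/INCREMENTAL>
  )
endif()
```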

llvm-beanz commented 1 year ago

We will accept patches to address build performance if they don't add significant maintenance overhead. I don't think we want to adopt PCH because they tend to cause more problems than they solve.

We're not going to put resources toward this problem. I build DXC on my laptop in under 5 minutes from clean, and most incremental builds are under a minute. I use sccache so that I can switch branches quickly, and in general I don't struggle with slow builds. Most of our CI (excluding AppVeyor, which we're trying to EOL) is sufficiently speedy that build times aren't a real bottleneck for contributions.
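For anyone wanting to reproduce that workflow: compiler caches like sccache hook into CMake through the standard compiler-launcher variables (sccache itself must be installed and on PATH; nothing here is DXC-specific):

```cmake
# Wrap every compiler invocation with sccache so rebuilds after a branch
# switch hit the cache. Set these before project(), or pass them at
# configure time instead:
#   -DCMAKE_C_COMPILER_LAUNCHER=sccache -DCMAKE_CXX_COMPILER_LAUNCHER=sccache
set(CMAKE_C_COMPILER_LAUNCHER sccache)
set(CMAKE_CXX_COMPILER_LAUNCHER sccache)
```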