llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.37k stars 12.15k forks source link

cl.exe very slow on AST\Interp\Interp.cpp #102513

Open vitalybuka opened 3 months ago

vitalybuka commented 3 months ago

https://lab.llvm.org/buildbot/#/builders/sanitizer-windows

command timed out: 1200 seconds without output running ['python', '../llvm-zorg/zorg/buildbot/builders/annotated/sanitizer-windows.py', '--jobs=16'], attempting to kill

From the bot:


 C:\PROGRA~2\MIB055~1\2019\PROFES~1\VC\Tools\MSVC\1429~1.301\bin\Hostx64\x64\cl.exe  /nologo /TP
 -DGTEST_HAS_RTTI=0 -DUNICODE -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS
 -D_CRT_SECURE_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS -D_GLIBCXX_ASSERTIONS -D_HAS_EXCEPTIONS=0
 -D_SCL_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_WARNINGS -D_UNICODE -D__STDC_CONSTANT_MACROS
 -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
 -IC:\b\slave\sanitizer-windows\build\stage1\tools\clang\lib\AST
 -IC:\b\slave\sanitizer-windows\llvm-project\clang\lib\AST
 -IC:\b\slave\sanitizer-windows\llvm-project\clang\include
 -IC:\b\slave\sanitizer-windows\build\stage1\tools\clang\include
 -IC:\b\slave\sanitizer-windows\build\stage1\include
 -IC:\b\slave\sanitizer-windows\llvm-project\llvm\include /DWIN32 /D_WINDOWS   /Zc:inline
 /Zc:preprocessor /Zc:__cplusplus /Zi /Oi /bigobj /permissive- /W4 -wd4141 -wd4146 -wd4244 -wd4267
 -wd4291 -wd4351 -wd4456 -wd4457 -wd4458 -wd4459 -wd4503 -wd4624 -wd4722 -wd4100 -wd4127 -wd4512
 -wd4505 -wd4610 -wd4510 -wd4702 -wd4245 -wd4706 -wd4310 -wd4701 -wd4703 -wd4389 -wd4611 -wd4805
 -wd4204 -wd4577 -wd4091 -wd4592 -wd4319 -wd4709 -wd5105 -wd4324 -w14062 -we4238 /Gw /O2 /Ob2
 -std:c++17 -MD  /EHs-c- /GR- -UNDEBUG /showIncludes
 /Fotools\clang\lib\AST\CMakeFiles\obj.clangAST.dir\Interp\Interp.cpp.obj
 /Fdtools\clang\lib\AST\CMakeFiles\obj.clangAST.dir\ /FS -c
 C:\b\slave\sanitizer-windows\llvm-project\clang\lib\AST\Interp\Interp.cpp

It's probably always slow, but if a lot of compilation, command working in background taking >1200 sec, but ninja output is not stale

however if build is small incremental, this compiler invocation is bottle neck and we hit silent 1200

vitalybuka commented 3 months ago

@rnk

rnk commented 3 months ago

I fixed my MSVC build and reproduced this locally. This massive loop over a switch seems like an obvious compile time sink: https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/Interp/Interp.cpp#L958

rnk commented 3 months ago

This requires domain expertise. I ran with various MSVC flags, and the main Interp function takes 240s to compile locally for me. Presumably it takes much longer on GCP VMs (1200s)

@tbaederr , please take a look and see if you can optimize the compile time for this file. I assume it is slow with clang as well. Anything that makes it fast (moving non-dependent code out of templates, like error diagnostics) for non-MSVC compilers will probably help MSVC.

tbaederr commented 3 months ago

Is there anything specific I should keep an eye on? That switch will presumably only grow in size.... Most of the templated functions called in the switch are rather small, would a noinline attribute help for them?

llvmbot commented 3 months ago

@llvm/issue-subscribers-clang-frontend

Author: Vitaly Buka (vitalybuka)

https://lab.llvm.org/buildbot/#/builders/sanitizer-windows ``` command timed out: 1200 seconds without output running ['python', '../llvm-zorg/zorg/buildbot/builders/annotated/sanitizer-windows.py', '--jobs=16'], attempting to kill ``` From the bot: ``` C:\PROGRA~2\MIB055~1\2019\PROFES~1\VC\Tools\MSVC\1429~1.301\bin\Hostx64\x64\cl.exe /nologo /TP -DGTEST_HAS_RTTI=0 -DUNICODE -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS -D_CRT_SECURE_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS -D_GLIBCXX_ASSERTIONS -D_HAS_EXCEPTIONS=0 -D_SCL_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_WARNINGS -D_UNICODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -IC:\b\slave\sanitizer-windows\build\stage1\tools\clang\lib\AST -IC:\b\slave\sanitizer-windows\llvm-project\clang\lib\AST -IC:\b\slave\sanitizer-windows\llvm-project\clang\include -IC:\b\slave\sanitizer-windows\build\stage1\tools\clang\include -IC:\b\slave\sanitizer-windows\build\stage1\include -IC:\b\slave\sanitizer-windows\llvm-project\llvm\include /DWIN32 /D_WINDOWS /Zc:inline /Zc:preprocessor /Zc:__cplusplus /Zi /Oi /bigobj /permissive- /W4 -wd4141 -wd4146 -wd4244 -wd4267 -wd4291 -wd4351 -wd4456 -wd4457 -wd4458 -wd4459 -wd4503 -wd4624 -wd4722 -wd4100 -wd4127 -wd4512 -wd4505 -wd4610 -wd4510 -wd4702 -wd4245 -wd4706 -wd4310 -wd4701 -wd4703 -wd4389 -wd4611 -wd4805 -wd4204 -wd4577 -wd4091 -wd4592 -wd4319 -wd4709 -wd5105 -wd4324 -w14062 -we4238 /Gw /O2 /Ob2 -std:c++17 -MD /EHs-c- /GR- -UNDEBUG /showIncludes /Fotools\clang\lib\AST\CMakeFiles\obj.clangAST.dir\Interp\Interp.cpp.obj /Fdtools\clang\lib\AST\CMakeFiles\obj.clangAST.dir\ /FS -c C:\b\slave\sanitizer-windows\llvm-project\clang\lib\AST\Interp\Interp.cpp ``` It's probably always slow, but if a lot of compilation, command working in background taking >1200 sec, but ninja output is not stale however if build is small incremental, this compiler invocation is bottle neck and we hit silent 1200
rnk commented 3 months ago

I would look at the clang -ftime-trace profile and try to optimize that rather than messing around with MSVC. Look for expensive templates, consider how to reengineer the code to do more specialization through inlining rather than stamping out templates through instantiation. For example, you have all these helpers templated on bitwidth, etc. Do those really need to be templates? Can they take non-type template parameters as regular arguments? That will surely compile faster, and if you mark them always_inline, should result in identical performance, with lower compile time and less debug info.

philnik777 commented 3 months ago

FWIW I took a look at -ftime-trace and on my system the whole thing takes ~20s, though I haven't made sure the exact same arguments are used. 15 of those 20s were spent in the optimizer, and ~2.5s were parsing headers. Just ~1.5s was instantiating functions. Based on this I'm not sure there is much to improve other than to split things up into multiple files. This also seems to be a case where MSVC is incredibly slow given that it took 240s for you with MSVC (although I of course don't know the hardware you're running, 12x as long seems a bit excessive).

rnk commented 3 months ago

20s seems like something worth improving. Those 15s in the optimizer are actionable: those come from excessive templating.

It's not my code, but if you would prefer to ignore this at the cost of some ifdefs and pragmas, you can do:

#ifdef _MSC_VER
#pragma optimize("", off)
#endif
...Interp
#ifdef _MSC_VER
#pragma optimize("", on)
#endif
bd1976bris commented 2 months ago

Noting the analysis here: https://developercommunity.visualstudio.com/t/ON2-in-SparseBitVectorBase-when-com/10657991

bd1976bris commented 1 month ago

Pull request for work-around: https://github.com/llvm/llvm-project/pull/110986.