llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.16k stars 12.03k forks source link

[clang-tidy] Performance regression when enabling c++20 #62337

Open tiagomacarios opened 1 year ago

tiagomacarios commented 1 year ago

We are seeing some large performance regressions in Clang-Tidy when comparing C++17 and C++20. Some translation units go from 8 seconds to 6 minutes.

Profiling is pointing to clang-tidy.exe!clang::tooling::ExpandModularHeadersPPCallbacks::parseToLocation as the culprit. Which was added with https://reviews.llvm.org/D59528.

Is there a way to make that machinery an opt-in/out?

@alexfh @gribozavr

llvmbot commented 1 year ago

@llvm/issue-subscribers-clang-tidy

PiotrZSL commented 1 year ago

Could you paste here output from --enable-check-profile with both C++17 and C++20.

tiagomacarios commented 1 year ago

@PiotrZSL looking at https://reviews.llvm.org/D5911 I don't see how --enable-check-profile would help. The regression is before the checks even run - it happens while parsing the AST. Example:

Function Name
 + clang-tidy (PID: 63836)
| + RtlUserThreadStart
|| + BaseThreadInitThunk
||| + __scrt_common_main_seh
|||| + clang::tidy::clangTidyMain
||||| + clang::tidy::runClangTidy
|||||| + clang::tooling::ClangTool::run
||||||| + clang::tooling::ToolInvocation::run
|||||||| + clang::tooling::ToolInvocation::runInvocation
||||||||| + `clang::tidy::runClangTidy'::`2'::ActionFactory::runInvocation
|||||||||| + clang::tooling::FrontendActionFactory::runInvocation
||||||||||| + clang::CompilerInstance::ExecuteAction
|||||||||||| + clang::FrontendAction::Execute
||||||||||||| + clang::ASTFrontendAction::ExecuteAction
|||||||||||||| + clang::ParseAST
||||||||||||||| - clang::Parser::ParseTopLevelDecl
||||||||||||||| + clang::Preprocessor::Lex
|||||||||||||||| + clang::Lexer::LexTokenInternal
||||||||||||||||| + clang::Preprocessor::HandleDirective
|||||||||||||||||| + clang::Preprocessor::HandlePragmaDirective
||||||||||||||||||| + clang::tooling::ExpandModularHeadersPPCallbacks::parseToLocation

Anyway here are numbers for a random translation unit with one of our custom checkers C++17 - total runtime 20 seconds ===-------------------------------------------------------------------------=== clang-tidy checks profiling ===-------------------------------------------------------------------------=== Total Execution Time: 5.9375 seconds (5.7330 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 5.8906 (100.0%) 0.0469 (100.0%) 5.9375 (100.0%) 5.7330 (100.0%) office-simplify-logical-expressions 5.8906 (100.0%) 0.0469 (100.0%) 5.9375 (100.0%) 5.7330 (100.0%) Total

C++20 - total runtime 95 seconds ===-------------------------------------------------------------------------=== clang-tidy checks profiling ===-------------------------------------------------------------------------=== Total Execution Time: 6.3438 seconds (6.3660 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 6.1875 (100.0%) 0.1562 (100.0%) 6.3438 (100.0%) 6.3660 (100.0%) office-simplify-logical-expressions 6.1875 (100.0%) 0.1562 (100.0%) 6.3438 (100.0%) 6.3660 (100.0%) Total

PiotrZSL commented 1 year ago

Do we know if problem is with getLoadedSLocEntry, recordFileContent, or Lex ?

Issues that I see:

Do you have some reproduction example ? Something that could be profiled. Other issue that can be here, is that simply with C++20 you using some modules (or more heavy headers) and that trigger this part of code.

tiagomacarios commented 1 year ago

We don't use modules, but we do have some "heavy" headers.

For one on our TUs I see that parseToLocation is almost 50% of the compile time:

image

Most of the calls are while trying to handle a pragma:

image

A lot of cycles are being spent on the last loop in that function:

image

Is there a way to provide an opt-out of these callbacks? This is holding us from moving to c++20.

PiotrZSL commented 1 year ago

Looks like this this is enabled when modules are enabled (when it should also check if modules are in use, somehow). If you disable modules with -fno-modules, then this shouldn't be executed.

tiagomacarios commented 1 year ago

-fno-modules is a no-op if c++20 is enabled https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/Clang.cpp#L3670-L3689

We use precompiled headers. If I am understanding it right - the new logic in tidy will re-parse all the headers again. Perhaps we should add more logic to treat "old" precomp and modules differently. Nevertheless the new logic will incur significant slow down to projects with large header inclusion trees.

alexolog commented 1 year ago

We do not use modules or precompiled headers, but we do use some C++20 features. Compiling our whole project would take a minute or two without clang-tidy, half an hour or so with clang-tidy 15, and now I am trying clang-tidy 16 and I killed it after 12 hours or so.

Is there a way to avoid this unacceptable slowdown?