llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

`misc-confusable-identifiers` causing performance regression with clang-tidy-15 #57527

Closed firewave closed 1 year ago

firewave commented 2 years ago

Using clang-tidy-14 the step with the analysis in the GitHub Action of our project took about 8 minutes. Switching to clang-tidy-15 it now takes at least 13 minutes. run-clang-tidy-* is being used with -j2 (actually nproc but that resolves to 2). I checked several builds to make sure it wasn't just an odd-one-out slow runner.

We only recently did the switch and the first version we were using is 15.0.0~++20220825073216+12f27d8bef93-1~exp1~20220825073228.37. The latest build which still experiences this used 15.0.0~++20220902063112+11ba13a62506-1~exp1~20220902063219.49. The last version used of the prior version was 14.0.6~++20220816122211+f28c006a5895-1~exp1~20220816122246.108.

We are using the packages from https://apt.llvm.org on Ubuntu 22.04. The project in question is https://github.com/danmar/cppcheck.

The .clang-tidy configuration is the same for both (obviously clang-tidy-15 comes with additional/improved checks):

---
Checks:          '*,-abseil-*,-altera-*,-android-*,-boost-*,-cert-*,-cppcoreguidelines-*,-darwin-*,-fuchsia-*,-google-*,-hicpp-*,-linuxkernel-*,-llvm-*,-llvmlibc-*,-mpi-*,-objc-*,-openmp-*,-zircon-*,-readability-braces-around-statements,-readability-magic-numbers,-bugprone-macro-parentheses,-readability-isolate-declaration,-readability-function-size,-modernize-use-trailing-return-type,-readability-implicit-bool-conversion,-readability-uppercase-literal-suffix,-modernize-use-auto,-readability-else-after-return,-modernize-use-default-member-init,-readability-named-parameter,-readability-redundant-member-init,-performance-faster-string-find,-modernize-avoid-c-arrays,-modernize-use-equals-default,-readability-container-size-empty,-readability-simplify-boolean-expr,-bugprone-branch-clone,-bugprone-narrowing-conversions,-modernize-raw-string-literal,-readability-convert-member-functions-to-static,-modernize-loop-convert,-readability-const-return-type,-performance-unnecessary-value-param,-modernize-return-braced-init-list,-performance-inefficient-string-concatenation,-misc-throw-by-value-catch-by-reference,-readability-avoid-const-params-in-decls,-readability-non-const-parameter,-misc-non-private-member-variables-in-classes,-bugprone-suspicious-string-compare,-clang-analyzer-*,-bugprone-signed-char-misuse,-readability-make-member-function-const,-misc-no-recursion,-readability-use-anyofallof,-performance-no-automatic-move,-bugprone-suspicious-include,-modernize-replace-random-shuffle,-readability-function-cognitive-complexity,-readability-redundant-access-specifiers,-performance-noexcept-move-constructor,-concurrency-mt-unsafe,-bugprone-easily-swappable-parameters,-readability-suspicious-call-argument,-readability-identifier-length,-readability-container-data-pointer,-bugprone-assignment-in-if-condition,-misc-const-correctness'
WarningsAsErrors: '*'
CheckOptions:
  - key:             misc-non-private-member-variables-in-classes.IgnoreClassesWithAllMemberVariablesBeingPublic
    value:           '1'

We are also leveraging it to report compiler warnings. I don't have the general list of parameters handy right now. There are a few slight adjustments per file (via CMake), but it is about the same for most of the files.

I know there is some option to collect the timing information for each analysis, but it is not documented and I no longer remember where I read about it. If you point me to it, I will happily provide that data.

llvmbot commented 2 years ago

@llvm/issue-subscribers-clang-tidy

whisperity commented 2 years ago

The profiling information can be gathered by giving tidy --enable-check-profile and --store-check-profile, it should be documented in Clang-Tidy's command-line help. (You might have to use --help-hidden, though?)

firewave commented 2 years ago

> The profiling information can be gathered by giving tidy --enable-check-profile and --store-check-profile, it should be documented in Clang-Tidy's command-line help. (You might have to use --help-hidden, though?)

Thanks.

                                             --store-check-profile=<prefix>           -
                                             By default reports are printed in tabulated
                                             format to stderr. When this option is passed,
                                             these per-TU profiles are instead stored as JSON.

I don't understand the help text though. It sounds like it is supposed to be a boolean instead and there's no indication how prefix is actually used. I guess it's a filename prefix...!?

What kind of output should I provide? The regular one or the JSON one (however I do that)? Also is the JSON output in the Google profiler format (do I recall that correctly?) which can be visualized with Chrome?

firewave commented 2 years ago

You cannot pass additional clang-tidy parameters to run-clang-tidy so I need to script something that iterates all the files first 😒
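
A minimal sketch of such a script, assuming a `compile_commands.json` in the build directory. The function names, the `clang-tidy-15` binary name, and the de-duplication by path are illustrative assumptions, not something taken from this thread; only `-p`, `--enable-check-profile` and `--store-check-profile=<prefix>` are actual clang-tidy options.

```python
import json
import subprocess
from pathlib import Path


def profile_commands(build_dir, prefix, clang_tidy="clang-tidy-15"):
    """Build one clang-tidy command line per unique file in compile_commands.json."""
    entries = json.loads((Path(build_dir) / "compile_commands.json").read_text())
    seen = set()
    commands = []
    for entry in entries:
        # Resolve relative "file" entries against the entry's "directory".
        source = str(Path(entry["directory"]) / entry["file"])
        if source in seen:
            # Assumption: a repeated path is a duplicate entry, so it is skipped.
            # Entries that differ only in their flags would be dropped as well.
            continue
        seen.add(source)
        commands.append([
            clang_tidy,
            "-p", str(build_dir),
            "--enable-check-profile",
            f"--store-check-profile={prefix}",
            source,
        ])
    return commands


def run_all(build_dir, prefix, clang_tidy="clang-tidy-15"):
    """Run the commands one after another, writing the profiles under prefix."""
    for command in profile_commands(build_dir, prefix, clang_tidy):
        # check=False: a file with findings should not abort the whole run.
        subprocess.run(command, check=False)
```

Calling `run_all("build", "tidy15")` would then analyse the files sequentially; running them in parallel, as run-clang-tidy does, is left out to keep the sketch short.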

firewave commented 2 years ago

> I don't understand the help text though. It sounds like it is supposed to be a boolean instead and there's no indication how prefix is actually used. I guess it's a filename prefix...!?

It actually creates a folder named after the specified prefix where it places files named <datetime>-<srcfile>.json. And the files do not seem to work with chrome://tracing.

firewave commented 2 years ago

I am generating files for both versions now.

It's really hard to make something out by simply comparing them. It would be helpful if there were a key with the total time as well so it would be easier to identify files to drill into.
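
Until such a key exists, the total can be computed from the stored files. The sketch below assumes each profile is a JSON object with a `"file"` entry and a `"profile"` mapping whose keys end in `.wall`, `.user` or `.sys`; that layout and the helper names are assumptions to verify against the generated files.

```python
import json
from pathlib import Path

WALL_SUFFIX = ".wall"


def total_wall_time(profile_path):
    """Return (source file, summed wall time over all checks) for one profile."""
    data = json.loads(Path(profile_path).read_text())
    total = sum(
        value
        for key, value in data["profile"].items()
        if key.endswith(WALL_SUFFIX)
    )
    # Fall back to the profile's own path if no "file" entry is present.
    return data.get("file", str(profile_path)), total


def slowest_files(profile_dir, limit=10):
    """List the translation units with the largest summed wall time first."""
    totals = [total_wall_time(p) for p in Path(profile_dir).glob("*.json")]
    return sorted(totals, key=lambda item: item[1], reverse=True)[:limit]
```

Comparing `slowest_files("tidy14")` with `slowest_files("tidy15")` should point at the files worth drilling into.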

njames93 commented 2 years ago

I'd be willing to wager it's misc-confusable-identifiers check. That seems to have a high complexity

firewave commented 2 years ago

> I'd be willing to wager it's misc-confusable-identifiers check. That seems to have a high complexity

Taking a quick peek it doesn't seem to be an outlier. Seems like several checks got a bit slower and there's several new ones on top of it - so it just adds up since it is well over 100 checks being executed.

firewave commented 2 years ago

Here's the output. I generated it locally on Ubuntu 20.04 with the following versions: 14.0.6-++20220827082222+f28c006a5895-1~exp1~20220827202233.158 and 15.0.0-++20220901113055+80b4a25d7a21-1~exp1~20220901113143.45

tidy14.zip tidy15.zip

I have no idea why those two files only have two checks applied. Also, some files are duplicated in the compilation database (they are Qt files which are de-duplicated with qmake but not in the CMake build). There are also some additional duplicates since I couldn't use run-clang-tidy.

FYI precompiled headers are being used. I am not sure if those make things faster or slower - something to still look into.

firewave commented 2 years ago

For completeness: compile_commands.zip

whisperity commented 2 years ago

> if there were a key with the total time as well

<checker-name>.wall should be the total time spent inside each check's code, I think. I've actually patched the CSA Testbench to support running and collecting these data into graphs. Tomorrow (when I'm at work and can spare some CPU cycles) I shall get back to you with visualisation; the raw JSON files you've provided shall be enough to quickly skim what the issue is. 🙂

(Setting up CSA-Testbench, and CodeChecker underneath might be worth it just to have recurring analyses done quickly as it does precisely the "iterate all the files" for analysis, and CSA-Testbench iterates the results. Might be worth a long-term investment for the project, but for a single run of visualisation it should be alright without.)

> FYI precompiled headers are being used.

As long as they are ON for both analyses, the results should still be consistent. Clang-Tidy enters the scene after the AST has been built.

firewave commented 2 years ago

> Tomorrow (when I'm at work and can spare some CPU cycles) I shall get back to you with visualisation; the raw JSON files you've provided shall be enough to quickly skim what the issue is. 🙂

Thanks. That's very appreciated.

whisperity commented 2 years ago

I've done it.

[Image: Visualisation of the distribution of time]

Raw data (needed a bit of manual Python tinkering):


This is the .wall time as measured by you, summed for the entire data set (GROUP BY checker), ordered from slowest to quickest (first by 15.0.0, then by 14.0.6). A positive difference means the check got slower, a negative difference means the check got quicker. (At least according to the one measurement time series that was provided.) NaN means the check was not available in that configuration.

| Checker name | Wall time (14.0.6) | Wall time (15.0.0) | Difference |
|---|---|---|---|
| misc-confusable-identifiers | nan | 2092.5705919265747 | nan |
| bugprone-reserved-identifier | 1105.5387988090515 | 1190.51025223732 | 84.97145342826843 |
| readability-identifier-naming | 649.087854385376 | 693.3841750621796 | 44.29632067680359 |
| bugprone-use-after-move | 448.9085593223572 | 445.57042050361633 | -3.3381388187408447 |
| modernize-macro-to-enum | nan | 397.59349060058594 | nan |
| misc-unused-using-decls | 349.55360078811646 | 366.3567907810211 | 16.803189992904663 |
| modernize-replace-auto-ptr | 243.39931750297546 | 256.1724271774292 | 12.773109674453735 |
| bugprone-infinite-loop | 214.78071761131287 | 247.5740029811859 | 32.79328536987305 |
| bugprone-unused-return-value | 250.39787912368774 | 245.4199357032776 | -4.977943420410156 |
| readability-redundant-control-flow | 243.63033843040466 | 236.88260006904602 | -6.747738361358643 |
| performance-move-const-arg | 233.11965107917786 | 232.4953737258911 | -0.6242773532867432 |
| modernize-deprecated-ios-base-aliases | 219.1269552707672 | 228.2364785671234 | 9.109523296356201 |
| modernize-use-nullptr | 229.16736435890198 | 226.54843378067017 | -2.6189305782318115 |
| portability-std-allocator-const | nan | 216.6481773853302 | nan |
| bugprone-suspicious-semicolon | 215.26551723480225 | 215.58539962768555 | 0.3198823928833008 |
| bugprone-assert-side-effect | 213.31210327148438 | 214.64832162857056 | 1.3362183570861816 |
| bugprone-unused-raii | 211.51127529144287 | 209.2420470714569 | -2.269228219985962 |
| bugprone-multiple-statement-macro | 202.94630575180054 | 205.20749139785767 | 2.261185646057129 |
| misc-definitions-in-headers | 204.20978689193726 | 202.01779961585999 | -2.1919872760772705 |
| bugprone-sizeof-expression | 201.9040904045105 | 201.43438029289246 | -0.469710111618042 |
| misc-misleading-identifier | 196.37127041816711 | 197.48825478553772 | 1.1169843673706055 |
| readability-redundant-declaration | 196.58375239372253 | 195.89643502235413 | -0.6873173713684082 |
| misc-non-copyable-objects | 178.61053586006165 | 179.85964846611023 | 1.249112606048584 |
| readability-static-definition-in-anonymous-namespace | 167.35302209854126 | 169.34562373161316 | 1.9926016330718994 |
| bugprone-unchecked-optional-access | nan | 153.33418107032776 | nan |
| modernize-redundant-void-arg | 131.5804464817047 | 129.23538780212402 | -2.3450586795806885 |
| bugprone-dangling-handle | 128.11392283439636 | 125.6782054901123 | -2.4357173442840576 |
| modernize-use-noexcept | 112.16756844520569 | 113.06942224502563 | 0.9018537998199463 |
| bugprone-implicit-widening-of-multiplication-result | 110.7639832496643 | 112.74309849739075 | 1.9791152477264404 |
| bugprone-misplaced-widening-cast | 105.2126817703247 | 103.84422969818115 | -1.3684520721435547 |
| modernize-use-using | 285.46957659721375 | 98.21715474128723 | -187.2524218559265 |
| misc-misplaced-const | 93.62663340568542 | 93.18858909606934 | -0.43804430961608887 |
| bugprone-not-null-terminated-result | 91.3642246723175 | 88.86984968185425 | -2.494374990463257 |
| misc-unconventional-assign-operator | 84.38264012336731 | 84.00267004966736 | -0.37997007369995117 |
| misc-redundant-expression | 82.14897084236145 | 80.15450811386108 | -1.9944627285003662 |
| bugprone-exception-escape | 76.47825288772583 | 75.3745448589325 | -1.103708028793335 |
| portability-simd-intrinsics | 77.09666419029236 | 72.45522594451904 | -4.641438245773315 |
| performance-unnecessary-copy-initialization | 72.4450831413269 | 71.28880858421326 | -1.1562745571136475 |
| readability-redundant-string-init | 73.14966583251953 | 69.46033692359924 | -3.689328908920288 |
| modernize-use-bool-literals | 62.96555733680725 | 63.45644950866699 | 0.4908921718597412 |
| performance-type-promotion-in-math-fn | 66.2978663444519 | 62.253777503967285 | -4.044088840484619 |
| misc-unused-parameters | 50.20702600479126 | 49.089574575424194 | -1.1174514293670654 |
| bugprone-virtual-near-miss | 49.66301417350769 | 49.08682179450989 | -0.5761923789978027 |
| readability-redundant-smartptr-get | 49.93127202987671 | 48.33027362823486 | -1.6009984016418457 |
| misc-new-delete-overloads | 47.00721478462219 | 46.27233147621155 | -0.7348833084106445 |
| readability-string-compare | 41.64035725593567 | 42.170042991638184 | 0.5296857357025146 |
| readability-inconsistent-declaration-parameter-name | 40.55525732040405 | 41.19813013076782 | 0.6428728103637695 |
| bugprone-unhandled-self-assignment | 40.470417499542236 | 39.81131911277771 | -0.6590983867645264 |
| bugprone-suspicious-memset-usage | 41.33718657493591 | 38.39648485183716 | -2.940701723098755 |
| bugprone-fold-init-type | 38.4579918384552 | 37.239190101623535 | -1.218801736831665 |
| modernize-use-equals-delete | 37.49043345451355 | 36.686710357666016 | -0.8037230968475342 |
| bugprone-argument-comment | 34.95252442359924 | 34.585686445236206 | -0.3668379783630371 |
| modernize-use-override | 33.44572639465332 | 33.29408597946167 | -0.1516404151916504 |
| bugprone-misplaced-operator-in-strlen-in-alloc | 34.49414038658142 | 33.23880743980408 | -1.2553329467773438 |
| bugprone-incorrect-roundings | 32.655763387680054 | 32.253759145736694 | -0.4020042419433594 |
| readability-misleading-indentation | 32.11695981025696 | 31.375541925430298 | -0.7414178848266602 |
| bugprone-move-forwarding-reference | 33.13937258720398 | 31.21400761604309 | -1.9253649711608887 |
| performance-no-int-to-ptr | 30.66062879562378 | 30.825817108154297 | 0.16518831253051758 |
| bugprone-swapped-arguments | 29.708866119384766 | 30.76939630508423 | 1.060530185699463 |
| misc-static-assert | 29.42076826095581 | 29.839959144592285 | 0.4191908836364746 |
| bugprone-undefined-memory-manipulation | 30.146130800247192 | 29.130253076553345 | -1.0158777236938477 |
| misc-unused-alias-decls | 25.080172300338745 | 27.755188703536987 | 2.675016403198242 |
| performance-inefficient-algorithm | 28.198129892349243 | 26.684087991714478 | -1.5140419006347656 |
| bugprone-bad-signal-to-kill-thread | 27.028095245361328 | 25.623483180999756 | -1.4046120643615723 |
| bugprone-suspicious-memory-comparison | 26.547013759613037 | 25.50505304336548 | -1.0419607162475586 |
| concurrency-thread-canceltype-asynchronous | 26.263588666915894 | 25.136251211166382 | -1.1273374557495117 |
| bugprone-forward-declaration-namespace | 25.9287588596344 | 25.008864164352417 | -0.9198946952819824 |
| readability-redundant-string-cstr | 23.537427186965942 | 22.448256492614746 | -1.0891706943511963 |
| bugprone-undelegated-constructor | 20.048752546310425 | 18.956682205200195 | -1.0920703411102295 |
| readability-qualified-auto | 15.37836241722107 | 15.080836534500122 | -0.29752588272094727 |
| modernize-use-emplace | 4.773821592330933 | 14.527168273925781 | 9.753346681594849 |
| bugprone-suspicious-enum-usage | 14.164887189865112 | 13.146995306015015 | -1.0178918838500977 |
| bugprone-spuriously-wake-up-functions | 13.156095027923584 | 12.736322402954102 | -0.4197726249694824 |
| readability-static-accessed-through-instance | 12.926794290542603 | 12.513283491134644 | -0.413510799407959 |
| bugprone-posix-return | 11.90090799331665 | 11.340661764144897 | -0.5602462291717529 |
| bugprone-string-literal-with-embedded-nul | 10.703440427780151 | 10.592498779296875 | -0.11094164848327637 |
| bugprone-misplaced-pointer-arithmetic-in-alloc | 10.223212718963623 | 10.165222644805908 | -0.057990074157714844 |
| bugprone-parent-virtual-call | 9.846845388412476 | 9.678703784942627 | -0.16814160346984863 |
| bugprone-integer-division | 9.215988874435425 | 8.984556436538696 | -0.23143243789672852 |
| bugprone-forwarding-reference-overload | 9.3135404586792 | 8.964130878448486 | -0.3494095802307129 |
| modernize-make-shared | 9.045833110809326 | 8.497532844543457 | -0.5483002662658691 |
| modernize-pass-by-value | 8.941504716873169 | 8.305288553237915 | -0.6362161636352539 |
| performance-move-constructor-init | 7.401392698287964 | 7.2588934898376465 | -0.14249920845031738 |
| bugprone-bool-pointer-implicit-conversion | 7.345062494277954 | 6.850750207901001 | -0.4943122863769531 |
| bugprone-copy-constructor-init | 6.975167274475098 | 6.750770092010498 | -0.2243971824645996 |
| bugprone-string-constructor | 6.7134950160980225 | 6.646806478500366 | -0.06668853759765625 |
| readability-redundant-function-ptr-dereference | 6.54532527923584 | 6.5386621952056885 | -0.006663084030151367 |
| bugprone-inaccurate-erase | 6.466278314590454 | 6.080984115600586 | -0.38529419898986816 |
| bugprone-throw-keyword-missing | 5.814808130264282 | 5.7655863761901855 | -0.04922175407409668 |
| modernize-deprecated-headers | nan | 5.073593854904175 | nan |
| readability-delete-null-pointer | 5.1911232471466064 | 4.873579978942871 | -0.31754326820373535 |
| modernize-shrink-to-fit | 4.877164125442505 | 4.6274073123931885 | -0.2497568130493164 |
| bugprone-string-integer-assignment | 4.726957559585571 | 4.530395984649658 | -0.19656157493591309 |
| misc-uniqueptr-reset-release | 4.664436340332031 | 4.451381206512451 | -0.21305513381958008 |
| bugprone-redundant-branch-condition | 4.465813159942627 | 4.185862302780151 | -0.2799508571624756 |
| bugprone-shared-ptr-array-mismatch | nan | 3.341019630432129 | nan |
| bugprone-sizeof-container | 2.260026454925537 | 2.22857928276062 | -0.03144717216491699 |
| misc-misleading-bidirectional | 2.081869125366211 | 2.1336023807525635 | 0.05173325538635254 |
| performance-trivially-destructible | 1.301093339920044 | 1.281991958618164 | -0.019101381301879883 |
| readability-simplify-subscript-expr | 0.4408137798309326 | 0.41901254653930664 | -0.021801233291625977 |
| bugprone-too-small-loop-variable | 0.4015200138092041 | 0.390531063079834 | -0.010988950729370117 |
| readability-misplaced-array-index | 0.40004992485046387 | 0.37462735176086426 | -0.02542257308959961 |
| performance-inefficient-vector-operation | 0.3395109176635742 | 0.34320592880249023 | 0.0036950111389160156 |
| bugprone-suspicious-missing-comma | 0.31686949729919434 | 0.3038761615753174 | -0.012993335723876953 |
| readability-uniqueptr-delete-release | 0.2931196689605713 | 0.29685425758361816 | 0.003734588623046875 |
| bugprone-unhandled-exception-at-new | 0.22762751579284668 | 0.2381892204284668 | 0.010561704635620117 |
| performance-for-range-copy | 0.2098989486694336 | 0.16357016563415527 | -0.04632878303527832 |
| bugprone-terminating-continue | 0.15036296844482422 | 0.1605687141418457 | 0.010205745697021484 |
| performance-implicit-conversion-in-loop | 0.03404355049133301 | 0.03122091293334961 | -0.0028226375579833984 |
| bugprone-lambda-function-name | 0.01359701156616211 | 0.012044429779052734 | -0.001552581787109375 |

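
For anyone wanting to reproduce a table like this from their own profiles, here is a rough sketch of the GROUP BY step. It is not the script used for the numbers above. It assumes the profile keys look like `time.clang-tidy.<check>.wall`, and the function names are made up for illustration.

```python
import json
import math
from collections import defaultdict
from pathlib import Path

PREFIX = "time.clang-tidy."  # assumed key prefix in the stored profiles
SUFFIX = ".wall"


def wall_by_check(profile_dir):
    """Sum the .wall entries of every profile in profile_dir, grouped by check."""
    totals = defaultdict(float)
    for path in Path(profile_dir).glob("*.json"):
        profile = json.loads(path.read_text())["profile"]
        for key, value in profile.items():
            if key.startswith(PREFIX) and key.endswith(SUFFIX):
                totals[key[len(PREFIX):-len(SUFFIX)]] += value
    return dict(totals)


def compare(old, new):
    """Return rows (check, old, new, difference); NaN marks a missing check."""
    rows = []
    for check in set(old) | set(new):
        before = old.get(check, math.nan)
        after = new.get(check, math.nan)
        # NaN propagates, so the difference is NaN if either side is missing.
        rows.append((check, before, after, after - before))

    def sort_key(row):
        # Slowest first by the new version, then by the old one; NaN sorts last.
        return tuple(-1.0 if math.isnan(v) else v for v in (row[2], row[1]))

    return sorted(rows, key=sort_key, reverse=True)
```

With the two archives unpacked, `compare(wall_by_check("tidy14"), wall_by_check("tidy15"))` would give rows in the same shape as the table.
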
firewave commented 2 years ago

So the major regressions are (in descending order):

@njames93 was right though. misc-confusable-identifiers is the main offender but the regressions also add up.

Some observations:

whisperity commented 2 years ago

bugprone-unchecked-optional-access is also special in that it is using the new data-flow framework, so it's not a "conventional" Clang-Tidy check.

It seems base:: is something in Chromium. To me, what is surprising is that the check isn't checking for Boost.Optional 😲

The problem with having an abseil-* alias is that aliases are problematic when they are supposed to run with different configurations, and that the groups of checks are supposed to categorise the guideline or the warning kind itself (the rule), not the specific implementation. Not accessing an optional with the UB accessor when it's empty isn't an Abseil-specific rule.

firewave commented 2 years ago

> It seems base:: is something in Chromium. To me, what is surprising is that the check isn't checking for Boost.Optional 😲

Or more obviously llvm::Optional.

firewave commented 2 years ago

I changed the title so this focuses on the actual issue and filed separate issues for the other things I noted.

How does misc-confusable-identifiers compare to -Wbidi-chars= from GCC?

If they are identical and since GCC enables -Wbidi-chars=unpaired by default you could make that an error via -Werror=bidi-chars and disable it in clang-tidy. No point in running essentially the same heuristic/checks twice.

njames93 commented 2 years ago

One glaring issue that needs addressing is that most of the time spent in bugprone-reserved-identifier and readability-identifier-naming goes into computing the same thing, as both checks inherit from the same base, which (redundantly) does the same matching logic. There are other cases like this, say the IncludeInserter, whose runtime paths are duplicated for every check that uses one. I have a vague design idea in mind, but I haven't got round to getting a design document sorted for it.
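
To illustrate the general idea only: the toy Python model below lets two checks share one cached result, so the expensive step runs once instead of once per check. This is not clang-tidy code and not the design mentioned above; every name and both "rules" are invented for the example.

```python
class SharedIdentifierIndex:
    """Collects identifiers once so several checks can reuse the result."""

    def __init__(self, declarations):
        self._declarations = declarations
        self._cache = None
        self.traversals = 0  # counts how often the expensive step really ran

    def identifiers(self):
        if self._cache is None:
            self.traversals += 1
            # Stand-in for the expensive matching work done by a shared base.
            self._cache = sorted(set(self._declarations))
        return self._cache


def reserved_identifier_check(index):
    # Toy rule: flag names containing "__" or starting with "_" plus uppercase.
    return [
        name for name in index.identifiers()
        if "__" in name or (len(name) > 1 and name[0] == "_" and name[1].isupper())
    ]


def identifier_naming_check(index):
    # Toy rule: flag names that are not entirely lower case.
    return [name for name in index.identifiers() if name != name.lower()]
```

Both checks receive the same index object, so `index.traversals` stays at 1 no matter how many checks consume it.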

PiotrZSL commented 1 year ago

Optimizations were delivered in 2a84c635f2a1dcb4546a5d751a32eac24103c7e6, 8fdedcd1a242f6b54eec969e72e35ac0a68b7ea1, 1c282052624f9d0bd273bde0b47b30c96699c6c7.

PiotrZSL commented 1 year ago

As for the other checks, most problems are in those "libraries" and should be tracked under separate issues against Clang 16/17 if they still occur.