`misc-confusable-identifiers` causing performance regression with clang-tidy-15

firewave commented 2 years ago

Using clang-tidy-14 the step with the analysis in the GitHub Action of our project took about 8 minutes. Switching to clang-tidy-15 it now takes at least 13 minutes. run-clang-tidy-* is being used with -j2 (actually nproc but that resolves to 2). I checked several builds to make sure it wasn't just an odd-one-out slow runner.

We only recently did the switch and the first version we were using is 15.0.0~++20220825073216+12f27d8bef93-1~exp1~20220825073228.37. The latest build which still experiences this used 15.0.0~++20220902063112+11ba13a62506-1~exp1~20220902063219.49. The last version used of the prior version was 14.0.6~++20220816122211+f28c006a5895-1~exp1~20220816122246.108.

We are using the packages from https://apt.llvm.org on ubuntu 22.04. The project in question in https://github.com/danmar/cppcheck.

The .clang-tidy configuration is the same for both (obviously clang-tidy-15 comes with additional/improved checks):

---
Checks:          '*,-abseil-*,-altera-*,-android-*,-boost-*,-cert-*,-cppcoreguidelines-*,-darwin-*,-fuchsia-*,-google-*,-hicpp-*,-linuxkernel-*,-llvm-*,-llvmlibc-*,-mpi-*,-objc-*,-openmp-*,-zircon-*,-readability-braces-around-statements,-readability-magic-numbers,-bugprone-macro-parentheses,-readability-isolate-declaration,-readability-function-size,-modernize-use-trailing-return-type,-readability-implicit-bool-conversion,-readability-uppercase-literal-suffix,-modernize-use-auto,-readability-else-after-return,-modernize-use-default-member-init,-readability-named-parameter,-readability-redundant-member-init,-performance-faster-string-find,-modernize-avoid-c-arrays,-modernize-use-equals-default,-readability-container-size-empty,-readability-simplify-boolean-expr,-bugprone-branch-clone,-bugprone-narrowing-conversions,-modernize-raw-string-literal,-readability-convert-member-functions-to-static,-modernize-loop-convert,-readability-const-return-type,-performance-unnecessary-value-param,-modernize-return-braced-init-list,-performance-inefficient-string-concatenation,-misc-throw-by-value-catch-by-reference,-readability-avoid-const-params-in-decls,-readability-non-const-parameter,-misc-non-private-member-variables-in-classes,-bugprone-suspicious-string-compare,-clang-analyzer-*,-bugprone-signed-char-misuse,-readability-make-member-function-const,-misc-no-recursion,-readability-use-anyofallof,-performance-no-automatic-move,-bugprone-suspicious-include,-modernize-replace-random-shuffle,-readability-function-cognitive-complexity,-readability-redundant-access-specifiers,-performance-noexcept-move-constructor,-concurrency-mt-unsafe,-bugprone-easily-swappable-parameters,-readability-suspicious-call-argument,-readability-identifier-length,-readability-container-data-pointer,-bugprone-assignment-in-if-condition,-misc-const-correctness'
WarningsAsErrors: '*'
CheckOptions:
  - key:             misc-non-private-member-variables-in-classes.IgnoreClassesWithAllMemberVariablesBeingPublic
    value:           '1'

We are also leveraging it to report compiler warnings. I don't have the general list of parameters handy right now. There's a few slightly adjustments per file (via CMake) but it is about the same for most of the files.

I know there is some option to collect the timing information for each analysis but it is not documented and I no longer remembered where I read about it. If you point me to it I will happily provide that data.

llvmbot commented 2 years ago

@llvm/issue-subscribers-clang-tidy

whisperity commented 2 years ago

The profiling information can be gathered by giving tidy --enable-check-profile and --store-check-profile, it should be documented in Clang-Tidy's command-line help. (You might have to use --help-hidden, though?)

firewave commented 2 years ago

The profiling information can be gathered by giving tidy --enable-check-profile and --store-check-profile, it should be documented in Clang-Tidy's command-line help. (You might have to use --help-hidden, though?)

Thanks.

                                             --store-check-profile=<prefix>           -
                                             By default reports are printed in tabulated
                                             format to stderr. When this option is passed,
                                             these per-TU profiles are instead stored as JSON.

I don't understand the help text though. It sounds like it is supposed to be a boolean instead and there's no indication how prefix is actually used. I guess it's a filename prefix...!?

What kind of output should I provide? The regular one or the JSON one (however I do that)? Also is the JSON output in the Google profiler format (do I recall that correctly?) which can be visualized with Chrome?

firewave commented 2 years ago

You cannot pass additional clang-tidy parameters to run-clang-tidy so I need to script something that iterates all the files first 😒

firewave commented 2 years ago

I don't understand the help text though. It sounds like it is supposed to be a boolean instead and there's no indication how prefix is actually used. I guess it's a filename prefix...!?

It actually creates a folder named after the specified prefix where it places files named <datetime>-<srcfile>.json. And the files do not seem to work with chrome://tracing.

firewave commented 2 years ago

I am generating files for both versions now.

It's really hard to make something out by simply comparing them. It would be helpful if there were a key with the total time as well so it would be easier to identify files to drill into.

njames93 commented 2 years ago

I'd be willing to wager it's misc-confusable-identifiers check. That seems to have a high complexity

firewave commented 2 years ago

I'd be willing to wager it's misc-confusable-identifiers check. That seems to have a high complexity

Taking a quick peek it doesn't seem to be an outlier. Seems like several checks got a bit slower and there's several new ones on top of it - so it just adds up since it is well over 100 checks being executed.

firewave commented 2 years ago

Here's the output. I generated it locally on ubuntu 20.04 with the following versions: 14.0.6-++20220827082222+f28c006a5895-1~exp1~20220827202233.158 15.0.0-++20220901113055+80b4a25d7a21-1~exp1~20220901113143.45

tidy14.zip tidy15.zip

I have no idea why those two files only have two checks applied. Also there's some files which are duplicated in the compilation database (they are Qt files which de-duplicated with qmake but not in the CMake build). Also some additional duplicates since I couldn't use run-clang-tidy.

FYI precompiled headers are being used. I am not sure if those make things faster or slower - something to still look into.

firewave commented 2 years ago

For completeness: compile_commands.zip

whisperity commented 2 years ago

if there were a key with the total time as well

<checker-name>.wall should be the total time spent inside each check's code, I think. I've actually patched the CSA Testbench to support running and collecting these data into graphs. Tomorrow (when I'm at work and can spare some CPU cycles) I shall get back to you with visualisation; the raw JSON files you've provided shall be enough to quickly skim what the issue is. 🙂

(Setting up CSA-Testbench, and CodeChecker underneath might be worth it just to have recurring analyses done quickly as it does precisely the "iterate all the files" for analysis, and CSA-Testbench iterates the results. Might be worth a long-term investment for the project, but for a single run of visualisation it should be alright without.)

FYI precompiled headers are being used.

As long as they are ON for both analyses, the results should still be consistent. Clang-Tidy enters the scene after the AST had been built.

firewave commented 2 years ago

Tomorrow (when I'm at work and can spare some CPU cycles) I shall get back to you with visualisation; the raw JSON files you've provided shall be enough to quickly skim what the issue is. 🙂

Thanks. That's very appreciated.

whisperity commented 2 years ago

I've done it.

Visualisation of the distribution of time

Raw data (needed a bit of manual Python tinkering):

This is the .wall time as measured by you, summed for the entire data set (GROUP BY checker), ordered from slowest to quickest (first by 15.0.0, then by 14.0.6). A positive difference means the check got slower, a negative difference means the check got quicker. (At least according to the one measurement time series that was provided.) NaN means the check was not available in that configuration.

Checker name	Wall time (14.0.6)	Wall time (15.0.0)	Difference
misc-confusable-identifiers	nan	2092.5705919265747	nan
bugprone-reserved-identifier	1105.5387988090515	1190.51025223732	84.97145342826843
readability-identifier-naming	649.087854385376	693.3841750621796	44.29632067680359
bugprone-use-after-move	448.9085593223572	445.57042050361633	-3.3381388187408447
modernize-macro-to-enum	nan	397.59349060058594	nan
misc-unused-using-decls	349.55360078811646	366.3567907810211	16.803189992904663
modernize-replace-auto-ptr	243.39931750297546	256.1724271774292	12.773109674453735
bugprone-infinite-loop	214.78071761131287	247.5740029811859	32.79328536987305
bugprone-unused-return-value	250.39787912368774	245.4199357032776	-4.977943420410156
readability-redundant-control-flow	243.63033843040466	236.88260006904602	-6.747738361358643
performance-move-const-arg	233.11965107917786	232.4953737258911	-0.6242773532867432
modernize-deprecated-ios-base-aliases	219.1269552707672	228.2364785671234	9.109523296356201
modernize-use-nullptr	229.16736435890198	226.54843378067017	-2.6189305782318115
portability-std-allocator-const	nan	216.6481773853302	nan
bugprone-suspicious-semicolon	215.26551723480225	215.58539962768555	0.3198823928833008
bugprone-assert-side-effect	213.31210327148438	214.64832162857056	1.3362183570861816
bugprone-unused-raii	211.51127529144287	209.2420470714569	-2.269228219985962
bugprone-multiple-statement-macro	202.94630575180054	205.20749139785767	2.261185646057129
misc-definitions-in-headers	204.20978689193726	202.01779961585999	-2.1919872760772705
bugprone-sizeof-expression	201.9040904045105	201.43438029289246	-0.469710111618042
misc-misleading-identifier	196.37127041816711	197.48825478553772	1.1169843673706055
readability-redundant-declaration	196.58375239372253	195.89643502235413	-0.6873173713684082
misc-non-copyable-objects	178.61053586006165	179.85964846611023	1.249112606048584
readability-static-definition-in-anonymous-namespace	167.35302209854126	169.34562373161316	1.9926016330718994
bugprone-unchecked-optional-access	nan	153.33418107032776	nan
modernize-redundant-void-arg	131.5804464817047	129.23538780212402	-2.3450586795806885
bugprone-dangling-handle	128.11392283439636	125.6782054901123	-2.4357173442840576
modernize-use-noexcept	112.16756844520569	113.06942224502563	0.9018537998199463
bugprone-implicit-widening-of-multiplication-result	110.7639832496643	112.74309849739075	1.9791152477264404
bugprone-misplaced-widening-cast	105.2126817703247	103.84422969818115	-1.3684520721435547
modernize-use-using	285.46957659721375	98.21715474128723	-187.2524218559265
misc-misplaced-const	93.62663340568542	93.18858909606934	-0.43804430961608887
bugprone-not-null-terminated-result	91.3642246723175	88.86984968185425	-2.494374990463257
misc-unconventional-assign-operator	84.38264012336731	84.00267004966736	-0.37997007369995117
misc-redundant-expression	82.14897084236145	80.15450811386108	-1.9944627285003662
bugprone-exception-escape	76.47825288772583	75.3745448589325	-1.103708028793335
portability-simd-intrinsics	77.09666419029236	72.45522594451904	-4.641438245773315
performance-unnecessary-copy-initialization	72.4450831413269	71.28880858421326	-1.1562745571136475
readability-redundant-string-init	73.14966583251953	69.46033692359924	-3.689328908920288
modernize-use-bool-literals	62.96555733680725	63.45644950866699	0.4908921718597412
performance-type-promotion-in-math-fn	66.2978663444519	62.253777503967285	-4.044088840484619
misc-unused-parameters	50.20702600479126	49.089574575424194	-1.1174514293670654
bugprone-virtual-near-miss	49.66301417350769	49.08682179450989	-0.5761923789978027
readability-redundant-smartptr-get	49.93127202987671	48.33027362823486	-1.6009984016418457
misc-new-delete-overloads	47.00721478462219	46.27233147621155	-0.7348833084106445
readability-string-compare	41.64035725593567	42.170042991638184	0.5296857357025146
readability-inconsistent-declaration-parameter-name	40.55525732040405	41.19813013076782	0.6428728103637695
bugprone-unhandled-self-assignment	40.470417499542236	39.81131911277771	-0.6590983867645264
bugprone-suspicious-memset-usage	41.33718657493591	38.39648485183716	-2.940701723098755
bugprone-fold-init-type	38.4579918384552	37.239190101623535	-1.218801736831665
modernize-use-equals-delete	37.49043345451355	36.686710357666016	-0.8037230968475342
bugprone-argument-comment	34.95252442359924	34.585686445236206	-0.3668379783630371
modernize-use-override	33.44572639465332	33.29408597946167	-0.1516404151916504
bugprone-misplaced-operator-in-strlen-in-alloc	34.49414038658142	33.23880743980408	-1.2553329467773438
bugprone-incorrect-roundings	32.655763387680054	32.253759145736694	-0.4020042419433594
readability-misleading-indentation	32.11695981025696	31.375541925430298	-0.7414178848266602
bugprone-move-forwarding-reference	33.13937258720398	31.21400761604309	-1.9253649711608887
performance-no-int-to-ptr	30.66062879562378	30.825817108154297	0.16518831253051758
bugprone-swapped-arguments	29.708866119384766	30.76939630508423	1.060530185699463
misc-static-assert	29.42076826095581	29.839959144592285	0.4191908836364746
bugprone-undefined-memory-manipulation	30.146130800247192	29.130253076553345	-1.0158777236938477
misc-unused-alias-decls	25.080172300338745	27.755188703536987	2.675016403198242
performance-inefficient-algorithm	28.198129892349243	26.684087991714478	-1.5140419006347656
bugprone-bad-signal-to-kill-thread	27.028095245361328	25.623483180999756	-1.4046120643615723
bugprone-suspicious-memory-comparison	26.547013759613037	25.50505304336548	-1.0419607162475586
concurrency-thread-canceltype-asynchronous	26.263588666915894	25.136251211166382	-1.1273374557495117
bugprone-forward-declaration-namespace	25.9287588596344	25.008864164352417	-0.9198946952819824
readability-redundant-string-cstr	23.537427186965942	22.448256492614746	-1.0891706943511963
bugprone-undelegated-constructor	20.048752546310425	18.956682205200195	-1.0920703411102295
readability-qualified-auto	15.37836241722107	15.080836534500122	-0.29752588272094727
modernize-use-emplace	4.773821592330933	14.527168273925781	9.753346681594849
bugprone-suspicious-enum-usage	14.164887189865112	13.146995306015015	-1.0178918838500977
bugprone-spuriously-wake-up-functions	13.156095027923584	12.736322402954102	-0.4197726249694824
readability-static-accessed-through-instance	12.926794290542603	12.513283491134644	-0.413510799407959
bugprone-posix-return	11.90090799331665	11.340661764144897	-0.5602462291717529
bugprone-string-literal-with-embedded-nul	10.703440427780151	10.592498779296875	-0.11094164848327637
bugprone-misplaced-pointer-arithmetic-in-alloc	10.223212718963623	10.165222644805908	-0.057990074157714844
bugprone-parent-virtual-call	9.846845388412476	9.678703784942627	-0.16814160346984863
bugprone-integer-division	9.215988874435425	8.984556436538696	-0.23143243789672852
bugprone-forwarding-reference-overload	9.3135404586792	8.964130878448486	-0.3494095802307129
modernize-make-shared	9.045833110809326	8.497532844543457	-0.5483002662658691
modernize-pass-by-value	8.941504716873169	8.305288553237915	-0.6362161636352539
performance-move-constructor-init	7.401392698287964	7.2588934898376465	-0.14249920845031738
bugprone-bool-pointer-implicit-conversion	7.345062494277954	6.850750207901001	-0.4943122863769531
bugprone-copy-constructor-init	6.975167274475098	6.750770092010498	-0.2243971824645996
bugprone-string-constructor	6.7134950160980225	6.646806478500366	-0.06668853759765625
readability-redundant-function-ptr-dereference	6.54532527923584	6.5386621952056885	-0.006663084030151367
bugprone-inaccurate-erase	6.466278314590454	6.080984115600586	-0.38529419898986816
bugprone-throw-keyword-missing	5.814808130264282	5.7655863761901855	-0.04922175407409668
modernize-deprecated-headers	nan	5.073593854904175	nan
readability-delete-null-pointer	5.1911232471466064	4.873579978942871	-0.31754326820373535
modernize-shrink-to-fit	4.877164125442505	4.6274073123931885	-0.2497568130493164
bugprone-string-integer-assignment	4.726957559585571	4.530395984649658	-0.19656157493591309
misc-uniqueptr-reset-release	4.664436340332031	4.451381206512451	-0.21305513381958008
bugprone-redundant-branch-condition	4.465813159942627	4.185862302780151	-0.2799508571624756
bugprone-shared-ptr-array-mismatch	nan	3.341019630432129	nan
bugprone-sizeof-container	2.260026454925537	2.22857928276062	-0.03144717216491699
misc-misleading-bidirectional	2.081869125366211	2.1336023807525635	0.05173325538635254
performance-trivially-destructible	1.301093339920044	1.281991958618164	-0.019101381301879883
readability-simplify-subscript-expr	0.4408137798309326	0.41901254653930664	-0.021801233291625977
bugprone-too-small-loop-variable	0.4015200138092041	0.390531063079834	-0.010988950729370117
readability-misplaced-array-index	0.40004992485046387	0.37462735176086426	-0.02542257308959961
performance-inefficient-vector-operation	0.3395109176635742	0.34320592880249023	0.0036950111389160156
bugprone-suspicious-missing-comma	0.31686949729919434	0.3038761615753174	-0.012993335723876953
readability-uniqueptr-delete-release	0.2931196689605713	0.29685425758361816	0.003734588623046875
bugprone-unhandled-exception-at-new	0.22762751579284668	0.2381892204284668	0.010561704635620117
performance-for-range-copy	0.2098989486694336	0.16357016563415527	-0.04632878303527832
bugprone-terminating-continue	0.15036296844482422	0.1605687141418457	0.010205745697021484
performance-implicit-conversion-in-loop	0.03404355049133301	0.03122091293334961	-0.0028226375579833984
bugprone-lambda-function-name	0.01359701156616211	0.012044429779052734	-0.001552581787109375

firewave commented 2 years ago

So the major regressions are (in descending order):

bugprone-reserved-identifier
readability-identifier-naming
bugprone-infinite-loop
misc-unused-using-decls
modernize-replace-auto-ptr
modernize-use-emplace
modernize-deprecated-ios-base-aliases

@njames93 was right though. misc-confusable-identifiers is the main offender but the regressions also add up.

Some observations:

readability-identifier-naming is enabled but there is no configuration for it and no warnings at all - we are not using it and we have never fixed such a warning.
bugprone-unchecked-optional-access should not be doing anything since we are specifying -std=c++11 and it is not available before c++17. I see it checks for absl:: and base:: (no idea which library that is) as well which are non-standard implementations. I think the containers should be configurable and the default checks for the non-standard implementations should only be enabled explicitly via abseil-* and base-* aliases.

whisperity commented 2 years ago

bugprone-unchecked-optional-access is also special in that it is using the new data-flow framework, so it's not a "conventional" Clang-Tidy check.

It seems base:: is something in Chromium. To me, what is surprising that the check isn't checking for Boost.Optional :astonished:

The problem with having an abseil-* alias is that aliases are problematic when they are supposed to run with different configurations, and that the groups of checks is supposed to categorise the guideline or the warning kind itself (the rule), not the specific implementation. Not accessing an optional with the UB accessor when it's empty isn't an Abseil-specific rule.

firewave commented 2 years ago

It seems base:: is something in Chromium. To me, what is surprising that the check isn't checking for Boost.Optional 😲

Or more obviously llvm::Optional.

firewave commented 2 years ago

I changed the title so this focuses on the actual issue and filed separate issues for the other things I noted.

How does misc-confusable-identifiers compare to -Wbidi-chars= from GCC?

If they are identical and since GCC enables -Wbidi-chars=unpaired by default you could make that an error via -Werror=bidi-chars and disable it in clang-tidy. No point in running essentially the same heuristic/checks twice.

njames93 commented 2 years ago

One glaring issue that needs addressing is most of the time spent in bugprone-reserved-identifier and readability-identifier-naming is computing the same thing, as both checks inherit from the same base which(redundantly) does the same matching logic. Theres other cases like this, with say the IncludeInserter which is duplicated runtime paths for all checks which use one. I have a vague design idea in my but I haven't got round to getting a design document sorted for it.

PiotrZSL commented 1 year ago

Optimizations were delivered in 2a84c635f2a1dcb4546a5d751a32eac24103c7e6, 8fdedcd1a242f6b54eec969e72e35ac0a68b7ea1, 1c282052624f9d0bd273bde0b47b30c96699c6c7.

PiotrZSL commented 1 year ago

As for other checks, most problems are in those "libraries", and should be tracked under separate issue on Clang 16/17 if occur.

llvm / llvm-project

`misc-confusable-identifiers` causing performance regression with clang-tidy-15 #57527