llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.25k stars 11.67k forks source link

clang-tidy: provide a method for checks to produce self-diagnostic messages #76399

Open daltairwalter opened 9 months ago

daltairwalter commented 9 months ago

Many of the clang-tidy checks are for situations that ideally are fairly nuanced and apply only in specific situations. It would be useful to have a standard way to make self-diagnostic messages for checks so that they can reveal why they are choosing to add fixes in certain places or ignore others. These self-diagnostic messages would be messages that the user would typically not be interested in, unless the check is not working as intended, and would also still be available when the check sees the code as already correct.

I am particularly interested in this for #70621 where there is a significant amount of determining that needs to be done to decide if the check should produce a warning/fix or not. Currently, I am creating this check with the self-diagnostic messages being mixed into the normal user diag messages which is a source of review objection and I agree with the reviewer - this is not ideal.

PiotrZSL commented 9 months ago

Problem is that such diagnostic/debugs will make harder to undestand clang-tidy checks and provide no value for end user. Usualy developers for a timeframe of check development add some AST node .dump() or llvm::errs() logs but those are removed once check is finished. And more familiar with AST you get, then less of such logs you need.

daltairwalter commented 9 months ago

As an end user, I would find value in seeing the decision making process for a check because:

  1. Most of the clang-tidy checks that I have tried don’t work 100% the way I would desire them to work. Self-diagnostics from these checks would give me a better understanding of whether these checks are still useful to me and which situations need to be checked in other ways or otherwise fixed after running clang-tidy.
  2. It would make it easier to write a clear bug report about what is going wrong with a check.
  3. I would hope it would increase the likelihood of problems being fixed.

As a clang-tidy user, I have frequently seen the same fix applied multiple times resulting in code that doesn’t compile. While creating a check, I learned that this is avoided if the diag messages are exactly the same. Perhaps this is a separate problem, but it would be solved if there was another place for the extra details that the check is trying to provide the user could be captured other than the diag message.

PiotrZSL commented 9 months ago

End user usualy doesn't know nothing about AST, they should just provide an "reproduction example" and fill an issue report, thats usually sufficient. Developer that will be fixing issue need to analyse check (and it's way easier to analyse 100 lines of code than 500 lines with some debugs) and reproduce issue in tests.

"I have frequently seen the same fix applied multiple times resulting in code that doesn’t compile"

Normal thing, but to fix such issue in check, reproduction example is needed as test will need to be writeln. Developer got multiple tools that can be used to debug issue, those are crash reports, clang-query, clang-check, tests, AST dump, llvm::errs. Often issue is in incorrently constructed fix-its. Even if end user could point exacly an line in code where bug is and provide possible solution, that solution may not be correct or even may not have any value without proper testing or actual bug may be somewere else.

As for "decision making process", for some checks that would be GBs of logs that would slow down checks even more.

daltairwalter commented 9 months ago

"they should just provide an "reproduction example""

If the user is working with a proprietary system, they have to understand the problem well enough to make a reproducible example.

As an end user, I know about the C++ code I am feeding clang-tidy and anything that can be printed that relates to C++ is helpful for me to understand it. I am not encouraging the printing of the AST, but code that is making decisions based on multiple separate details that it finds in the AST could provide some C++ like information about each of these observations. Again, this would help me determine if the problems I am having with a check are isolated or if it is an otherwise useful check.

I think the idea that a 100 line check would become a 500 line check with diagnostics is fairly extreme and the review process for clang-tidy is sufficient to prevent this from happening in practice. It would be more normal for a 500 line check to become a 510 line check. For an extremely verbose case, it may add 100-200 lines in a 1000 line check. If this information makes understanding which part of the AST is being processed wrong, this seems worthwhile. Otherwise, the person debugging a problem is left trying to decide which part of the decision is wrong.

In extreme cases, I suppose this could be GB's of data, but I am not proposing something that is on by default. This is to get more information about the strange situation in file x that didn't work as expected so I am trying again with the flag to get more information.