Closed whisperity closed 1 month ago
@llvm/issue-subscribers-clang-tidy
Thanks for bringing this to our attention. We'll look into those bugs, but in the meantime are there any other steps we should take (e.g. disable the tidy elsewhere)?
Thanks for bringing this to our attention. We'll look into those bugs, but in the meantime are there any other steps we should take (e.g. disable the tidy elsewhere)?
@ymand I am not sure whether it is the dataflow framework, or simply the matchers within Tidy are the culprit. In the first case, T
is int
... that gives me the hunch that the matchers somehow try to consider a type that is most certainly not dereferenceable as a dereferenceable entity...? I have not observed crashes or hangs related to this checker on other projects I am involved in, but I cannot safely confirm if those projects were running this checker in the first place.
(The affected project is my personal one, and I have bugprone-*
enabled unconditionally, and my CI runs using the nightly LLVM PPA and that is why I observed this bug. Here is a job that hung and GitHub auto-killed it at a timeout.)
Note that the affected structures in my code, especially the first crash, involve heavy "magic" with if constexpr
and such!
I have one more C++ project that I am involved in and is using C++17 and onwards features, I will try to run the analysis on that with this checker and report back with the findings.
The act of disabling the check on the user's side alleviates the problem. I would not say a revert is needed here...
Sounds good. I wasn't sure if there was some global clang-tidy disabling (short of a revert), so thanks for confirming that.
Agreed on the nature of the examples involving unusual constructs. We've also just encountered this ourselves with templates. I think that the check needs to do a better job filtering the functions it sends to the analysis engine, but we also have issues in the core that we're still ironing out.
If there are too many problems I think a partial revert could be done by removing the registerChecker
line from the BugproneTidyModule.cpp
. That way, Tidy will think the checker is not existing, but the entire checker's code need not be pulled out from the repository (which would cause a lot of unnecessary noise and pollution in the Git history).
However,
T - { RecordType : monomux::message::response::ClientID }
This is not a template. It is not even dereference-able:
#define MONOMUX_MESSAGE(KIND, NAME) \
static constexpr MessageKind Kind = MessageKind::KIND; \
static std::optional<NAME> decode(std::string_view Buffer); \
static std::string encode(const NAME& Object);
struct ClientID
{
MONOMUX_MESSAGE(ClientIDResponse, ClientID);
monomux::message::ClientID Client;
};
The second example (the infinite recursion crash) feels like the most trivial use case. I have a type, which I put inside an optional... I receive an optional from a function returning an optional. If the optional is false, I return, if it is not, I dereference it. This is analogous to the pointer case through and through... So an infinite recursion in there is definitely something wrong.
For the first crash case, I am not sure if T
is supposedly the optional<int>
or the int
inside the optional...
@AaronBallman What do you think? Should they do a partial revert for the availability of the checker?
T
is always bound to the type argument of the std::optional
. So, the binding to ClientID
makes sense because of this line:
static std::optional<NAME> decode(std::string_view Buffer);
The issue of templates relates to the other binding: fun
. Fwiw, I'm not suggesting that's the issue here -- only that its another example of "unusual" code that we've seen issues with.
I have one more C++ project that I am involved in and is using C++17 and onwards features, I will try to run the analysis on that with this checker and report back with the findings.
I have started a run on that other project in GitHub Actions and the only change against a working master branch was enabling this check. (Sadly, due to GHA hanging in a way that the output is not recoverable, I can't give more details...) While the analysis of that project usually concludes in about 1 hr 30 minutes, the job has been running for more than 5 hours at this point (and GitHub terminates jobs at the 6 hour mark), indicating that many of the TUs in that project are also crashing, infinite hanging, etc.
(I will try to compile a local LLVM latest and run the analysis of the project with that, not the PPA version, but running the build of this "other project" is nontrivial due to all the dependencies it has. The stack traces of my own project come from a local execution with the PPA. Unfortunately I can't seem to get the traces out of GitHub if the job hangs.)
Regarding the apparent infinite loop. It's not actually an infinite loop and it's not even tied to this checker - I can repro in the core tests with this code:
struct Lookup {
int x;
};
void target(Lookup val, bool b) {
const Lookup* l = nullptr;
while (b) {
l = &val;
}
}
The immediate cause of this issue is that Environment::MemberLocToStruct
differs each time we compare the environments at the conclusion of the loop body. That difference is caused by a chain of events triggered by the join at the back edge of the loop body. First, the incoming edge maps l
to nullptr
, while the backedge maps l
to &val
. Then, when we try to merge those two, we reach mergeDistinctValues
, which results in a call to createValue for the type const Lookup *
. That allocates a fresh StructValue
that triggers the update to MemberLocToStruct
.
Now, we have a (merged) environment with a fresh entry in MemberLocToStruct
. Note that it's isomorphic to the previous entry, but since its fresh it looks different and invalidates the equality comparison.
A solution is to remove this field from the comparison. It's a supporting data structure and I think it can safely be ignored. That will solve the problem in this case, because it is assigned to the same location on each iteration. Moreover, the comparison for indirection nodes is rather permissive (for the time being), so even different addresses would not necessarilly trigger a problem, depending on the permissiveness of the model's comparison function: https://github.com/llvm/llvm-project/blob/86617256864ebcbda03b6ce843deeb6a41a85800/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp#L61-L69.
I described a test to reproduce the hang in https://github.com/llvm/llvm-project/issues/59492
Is this related to https://github.com/llvm/llvm-project/issues/59492, one of the recent comments on that issues suggests it may be fixed on main for LLVM18?
Is this related to #59492, one of the recent comments on that issues suggests it may be fixed on main for LLVM18?
@Skylion007 Could very well be. There are a lot of issues related to this checker, so could very well be. The data-flow library/framework is an evolving part of the suite. I will try to circle around and re-run the analysis on my originally reported project.
clang-tidy-18
at Ubuntu LLVM version 18.1.8
no longer reproduces these issues, so it looks like this is fixed.
When analysing "relatively complex"(?) code with
clang-tidy
, thebugprone-unchecked-optional-access
checker either hangs, fails with seemingly infinite recursion, or crashes with a non-infinite stack trace.Running nightly PPA on Ubuntu 20.04 LTS:
Non-infinite stack trace
The affected
isMapped()
function looks like this:where
E
is either some type likeint
orT*
in caseIntrusiveDefaultSentinel
istrue
, andstd::unique_ptr<T>
orstd::optional<T>
otherwise. Both are castable tobool
and are false when "empty".Infinite recursion depth
I observed a crash with a seemingly infinite stack trace.
The affected code snippet is generated partially from a macro:
Non-halting (?) execution
While the first crashes happened almost immediately and the second one happened after about 20 minutes of execution time, the last analysis job that I am running for my project, for a file that has a similar pattern to the 2nd case, has been running ~100% CPU use, currently at 496M resident memory, for about 80 minutes already.
Disabling the check or running Tidy with just other checks makes the analyses conclude successfully. With every other check used by the project (and also running Clang SA), the total analysis time is about 7 minutes.