Open hallo-wereld opened 1 year ago
@llvm/issue-subscribers-clang-static-analyzer
@hallo-wereld: Please elaborate.
@hallo-wereld: Please elaborate.
Sorry,
Confirmed: https://godbolt.org/z/q6W8fhr67
int top(int x, int y, int z) {
assert(x == y);
assert(z > x);
assert(z < y);
// This should be dead code.
clang_analyzer_warnIfReached();
}
We could probably get this work if we would accept some runtime cost.
Given that usually eqclasses are not large, it shouldn't impose large runtime overhead in general.
Here is what we could do:
If we have X RELOP Y
,
For all X' eqclass of X, and Y' eqclass of Y: try to resolve X' RELOP Y'
Confirmed: https://godbolt.org/z/q6W8fhr67
int top(int x, int y, int z) { assert(x == y); assert(z > x); assert(z < y); // This should be dead code. clang_analyzer_warnIfReached(); }
We could probably get this work if we would accept some runtime cost. Given that usually eqclasses are not large, it shouldn't impose large runtime overhead in general. Here is what we could do: If we have
X RELOP Y
, For all X' eqclass of X, and Y' eqclass of Y: try to resolveX' RELOP Y'
That idea does sound promising! Thanks :)
Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?
Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?
In the example of the issue, x
, y
, z
were all symbolic, because they are the parameters of the top-level function being analyzed (where we don't know anything about their values). Other than that, I'm not sure if I understood your question.
Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?
In the example of the issue,
x
,y
,z
were all symbolic, because they are the parameters of the top-level function being analyzed (where we don't know anything about their values). Other than that, I'm not sure if I understood your question.
I feel that the issue may be caused by the CSA visiting comparison statements and returning unknowns? I'm not sure about this. I should test it. Thank you!
Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?
In the example of the issue,
x
,y
,z
were all symbolic, because they are the parameters of the top-level function being analyzed (where we don't know anything about their values). Other than that, I'm not sure if I understood your question.I feel that the issue may be caused by the CSA visiting comparison statements and returning unknowns? I'm not sure about this. I should test it. Thank you!
Let me explain.
CSA maps symbols to equalivalence classes (eqclasses), and then maps eqclasses to range sets as constraints.
The rationale is that if x
is now known to be equal to y
, everything that they were equal to should be all equal and share the constraints we learned. This is achieved by intersecting the range sets of the merged eqclasses.
Now, if we learn that x == y
, then learn that z > x
; and then try to evaluate the expression z < y
one would expect that we recognize that y
could be substituted by x
(or any other member of its eqclass). And AFAIK we would successfully recognize the pattern z < x
when simplifying the expression and eliminate that path [1]. However, since we don't do such substitution, the pattern won't match and the path won't be eliminated.
What I suggested was to not try to improve the simplification of relational operators, but rather brute-force try to use each member of the eqclass and try each combination. (yes, N*M combinations, where X RELOP Y
, N := |eqclass(X)|
, M := |eqclass(Y)|
, arguing that usually, eqclasses are really small.
[1]: You can see that if you replace assert(z < y);
by assert(z < x);
, we would successfully eliminate the path, and the FP.
It is because we have seen z > x
before; thus z < x
can't be also true; and this is recognized by some pattern when the z < x
is simplified.
Thank you for such a detailed explanation. I have a fever and am seeing a doctor. I will read it carefully later. Thank you again!
---Original---
From: "Balazs @.>
Date: Tue, Oct 31, 2023 00:41 AM
To: @.>;
Cc: "Tianxing @.**@.>;
Subject: Re: [llvm/llvm-project] solver doesn't realize that z > x && z < y
is unsat with the fact x == y
(Issue #62215)
Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?
In the example of the issue, x, y, z were all symbolic, because they are the parameters of the top-level function being analyzed (where we don't know anything about their values). Other than that, I'm not sure if I understood your question.
I feel that the issue may be caused by the CSA visiting comparison statements and returning unknowns? I'm not sure about this. I should test it. Thank you!
Let me explain. CSA maps symbols to equalivalence classes (eqclasses), and then maps eqclasses to range sets as constraints. The rationale is that if x is now known to be equal to y, everything that they were equal to should be all equal and share the constraints we learned.
Now, if we learn that x == y, then learn that z > x; and then try to evaluate the expression z < y one would expect that we recognize that y could be substituted by x (or any other member of its eqclass). And AFAIK we would successfully recognize the pattern z < x when simplifying the expression and eliminate that path [1]. However, since we don't do such substitution, the pattern won't match and the path won't be eliminated. What I suggested was to not try to improve the simplification of relational operators, but rather brute-force try to use each member of the eqclass and try each combination. (yes, N*M combinations, where X RELOP Y, N := |eqclass(X)|, M := |eqclass(Y)|`, arguing that usually, eqclasses are really small.
[1]: You can see that if you replace assert(z < y); by assert(z < x);, we would successfully eliminate the path, and the FP. It is because we have seen z > x before; thus z < x can't be also true; and this is recognized by some pattern when the z < x is simplified.
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>
Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?
In the example of the issue,
x
,y
,z
were all symbolic, because they are the parameters of the top-level function being analyzed (where we don't know anything about their values). Other than that, I'm not sure if I understood your question.I feel that the issue may be caused by the CSA visiting comparison statements and returning unknowns? I'm not sure about this. I should test it. Thank you!
Let me explain. CSA maps symbols to equalivalence classes (eqclasses), and then maps eqclasses to range sets as constraints. The rationale is that if
x
is now known to be equal toy
, everything that they were equal to should be all equal and share the constraints we learned. This is achieved by intersecting the range sets of the merged eqclasses.Now, if we learn that
x == y
, then learn thatz > x
; and then try to evaluate the expressionz < y
one would expect that we recognize thaty
could be substituted byx
(or any other member of its eqclass). And AFAIK we would successfully recognize the patternz < x
when simplifying the expression and eliminate that path [1]. However, since we don't do such substitution, the pattern won't match and the path won't be eliminated. What I suggested was to not try to improve the simplification of relational operators, but rather brute-force try to use each member of the eqclass and try each combination. (yes, N*M combinations, whereX RELOP Y
,N := |eqclass(X)|
,M := |eqclass(Y)|
, arguing that usually, eqclasses are really small.[1]: You can see that if you replace
assert(z < y);
byassert(z < x);
, we would successfully eliminate the path, and the FP. It is because we have seenz > x
before; thusz < x
can't be also true; and this is recognized by some pattern when thez < x
is simplified.
I think you can get this done within O(N)
steps (N := |eqclass(X)|
). When you assume z > x
then in the constraint solver you assign the not-null range ([[-inf, -1], [1, inf]]
) to z > x
. Then when you assume z < y
, you can build the symbol z < S
for each element in the eq class of x
. Query (infer) the ranges to all z < S
and you should find different results which means there is a contradiction, you can return the infeasible (null) state.
@martong Thanks for the idea, when I read it it clicked and I could not resist experimenting with it today. Luckily, after an afternoon, I have a patch for it.
version: Clang trunk args:
Clang Static Analyzer doesn't realize that
z > x && z < y
is unsatisfiable with the factx == y
. See it live: https://godbolt.org/z/Te3d9GPM6