llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.34k stars 12.13k forks source link

solver doesn't realize that `z > x && z < y` is unsat with the fact `x == y` #62215

Open hallo-wereld opened 1 year ago

hallo-wereld commented 1 year ago

version: Clang trunk args:

--analyze -Xclang -analyzer-stats -Xclang -analyzer-checker=core,debug.ExprInspection -Xanalyzer -analyzer-constraints=z3 --analyzer-output text

Clang Static Analyzer doesn't realize that z > x && z < y is unsatisfiable with the fact x == y. See it live: https://godbolt.org/z/Te3d9GPM6

llvmbot commented 1 year ago

@llvm/issue-subscribers-clang-static-analyzer

EugeneZelenko commented 1 year ago

@hallo-wereld: Please elaborate.

hallo-wereld commented 1 year ago

@hallo-wereld: Please elaborate.

Sorry,

steakhal commented 1 year ago

Confirmed: https://godbolt.org/z/q6W8fhr67

int top(int x, int y, int z) {
    assert(x == y);
    assert(z > x);
    assert(z < y);
    // This should be dead code.
    clang_analyzer_warnIfReached();
}

We could probably get this work if we would accept some runtime cost. Given that usually eqclasses are not large, it shouldn't impose large runtime overhead in general. Here is what we could do: If we have X RELOP Y, For all X' eqclass of X, and Y' eqclass of Y: try to resolve X' RELOP Y'

0x21af commented 1 year ago

Confirmed: https://godbolt.org/z/q6W8fhr67

int top(int x, int y, int z) {
    assert(x == y);
    assert(z > x);
    assert(z < y);
    // This should be dead code.
    clang_analyzer_warnIfReached();
}

We could probably get this work if we would accept some runtime cost. Given that usually eqclasses are not large, it shouldn't impose large runtime overhead in general. Here is what we could do: If we have X RELOP Y, For all X' eqclass of X, and Y' eqclass of Y: try to resolve X' RELOP Y'

That idea does sound promising! Thanks :)

tianxinghe commented 1 year ago

Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?

steakhal commented 1 year ago

Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?

In the example of the issue, x, y, z were all symbolic, because they are the parameters of the top-level function being analyzed (where we don't know anything about their values). Other than that, I'm not sure if I understood your question.

tianxinghe commented 1 year ago

Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?

In the example of the issue, x, y, z were all symbolic, because they are the parameters of the top-level function being analyzed (where we don't know anything about their values). Other than that, I'm not sure if I understood your question.

I feel that the issue may be caused by the CSA visiting comparison statements and returning unknowns? I'm not sure about this. I should test it. Thank you!

steakhal commented 1 year ago

Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?

In the example of the issue, x, y, z were all symbolic, because they are the parameters of the top-level function being analyzed (where we don't know anything about their values). Other than that, I'm not sure if I understood your question.

I feel that the issue may be caused by the CSA visiting comparison statements and returning unknowns? I'm not sure about this. I should test it. Thank you!

Let me explain. CSA maps symbols to equalivalence classes (eqclasses), and then maps eqclasses to range sets as constraints. The rationale is that if x is now known to be equal to y, everything that they were equal to should be all equal and share the constraints we learned. This is achieved by intersecting the range sets of the merged eqclasses.

Now, if we learn that x == y, then learn that z > x; and then try to evaluate the expression z < y one would expect that we recognize that y could be substituted by x (or any other member of its eqclass). And AFAIK we would successfully recognize the pattern z < x when simplifying the expression and eliminate that path [1]. However, since we don't do such substitution, the pattern won't match and the path won't be eliminated. What I suggested was to not try to improve the simplification of relational operators, but rather brute-force try to use each member of the eqclass and try each combination. (yes, N*M combinations, where X RELOP Y, N := |eqclass(X)|, M := |eqclass(Y)|, arguing that usually, eqclasses are really small.

[1]: You can see that if you replace assert(z < y); by assert(z < x);, we would successfully eliminate the path, and the FP. It is because we have seen z > x before; thus z < x can't be also true; and this is recognized by some pattern when the z < x is simplified.

tianxinghe commented 1 year ago

Thank you for such a detailed explanation. I have a fever and am seeing a doctor. I will read it carefully later. Thank you again!

---Original--- From: "Balazs @.> Date: Tue, Oct 31, 2023 00:41 AM To: @.>; Cc: "Tianxing @.**@.>; Subject: Re: [llvm/llvm-project] solver doesn't realize that z &gt; x &amp;&amp; z < yis unsat with the fact x == y (Issue #62215)

Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?

In the example of the issue, x, y, z were all symbolic, because they are the parameters of the top-level function being analyzed (where we don't know anything about their values). Other than that, I'm not sure if I understood your question.

I feel that the issue may be caused by the CSA visiting comparison statements and returning unknowns? I'm not sure about this. I should test it. Thank you!

Let me explain. CSA maps symbols to equalivalence classes (eqclasses), and then maps eqclasses to range sets as constraints. The rationale is that if x is now known to be equal to y, everything that they were equal to should be all equal and share the constraints we learned.

Now, if we learn that x == y, then learn that z > x; and then try to evaluate the expression z < y one would expect that we recognize that y could be substituted by x (or any other member of its eqclass). And AFAIK we would successfully recognize the pattern z < x when simplifying the expression and eliminate that path [1]. However, since we don't do such substitution, the pattern won't match and the path won't be eliminated. What I suggested was to not try to improve the simplification of relational operators, but rather brute-force try to use each member of the eqclass and try each combination. (yes, N*M combinations, where X RELOP Y, N := |eqclass(X)|, M := |eqclass(Y)|`, arguing that usually, eqclasses are really small.

[1]: You can see that if you replace assert(z < y); by assert(z < x);, we would successfully eliminate the path, and the FP. It is because we have seen z > x before; thus z < x can't be also true; and this is recognized by some pattern when the z < x is simplified.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>

martong commented 1 year ago

Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?

In the example of the issue, x, y, z were all symbolic, because they are the parameters of the top-level function being analyzed (where we don't know anything about their values). Other than that, I'm not sure if I understood your question.

I feel that the issue may be caused by the CSA visiting comparison statements and returning unknowns? I'm not sure about this. I should test it. Thank you!

Let me explain. CSA maps symbols to equalivalence classes (eqclasses), and then maps eqclasses to range sets as constraints. The rationale is that if x is now known to be equal to y, everything that they were equal to should be all equal and share the constraints we learned. This is achieved by intersecting the range sets of the merged eqclasses.

Now, if we learn that x == y, then learn that z > x; and then try to evaluate the expression z < y one would expect that we recognize that y could be substituted by x (or any other member of its eqclass). And AFAIK we would successfully recognize the pattern z < x when simplifying the expression and eliminate that path [1]. However, since we don't do such substitution, the pattern won't match and the path won't be eliminated. What I suggested was to not try to improve the simplification of relational operators, but rather brute-force try to use each member of the eqclass and try each combination. (yes, N*M combinations, where X RELOP Y, N := |eqclass(X)|, M := |eqclass(Y)|, arguing that usually, eqclasses are really small.

[1]: You can see that if you replace assert(z < y); by assert(z < x);, we would successfully eliminate the path, and the FP. It is because we have seen z > x before; thus z < x can't be also true; and this is recognized by some pattern when the z < x is simplified.

I think you can get this done within O(N) steps (N := |eqclass(X)|). When you assume z > x then in the constraint solver you assign the not-null range ([[-inf, -1], [1, inf]]) to z > x. Then when you assume z < y, you can build the symbol z < S for each element in the eq class of x. Query (infer) the ranges to all z < S and you should find different results which means there is a contradiction, you can return the infeasible (null) state.

steakhal commented 1 year ago

@martong Thanks for the idea, when I read it it clicked and I could not resist experimenting with it today. Luckily, after an afternoon, I have a patch for it.