llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.71k stars 11.87k forks source link

[analyzer] Disappearing constraint 'x < N' #53176

Open tdnguyenND opened 2 years ago

tdnguyenND commented 2 years ago

Hi. I'm going to to add some checkers into StaticAnalyzer, but when I'm coding I meet some unexpected logic. How is the problem going First I add a new checker (Let say MyCustomChecker.cpp) to check on binary operators. This checker simply dump the operator and the program state at that point.

#include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h"
#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/CheckerManager.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"

using namespace clang;
using namespace ento;

namespace {
class MyCustomChecker
    : public Checker<check::PreStmt<BinaryOperator>> {
public:
  void checkPreStmt(const BinaryOperator *B, CheckerContext &C) const;
};
} // namespace

void MyCustomChecker::checkPreStmt(const BinaryOperator *B,
                                 CheckerContext &C) const {
  ProgramStateRef currentState = C.getState();
  llvm::errs() << "************************\n";
  B->dump();
  currentState->dump();
  llvm::errs() << "\n";
}

void ento::registerMyCustomChecker(CheckerManager &mgr) {
  mgr.registerChecker<MyCustomChecker>();
}

bool ento::shouldRegisterMyCustomChecker(const CheckerManager &mgr) {
  return true;
}

In clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt

...
add_clang_library(clangStaticAnalyzerCheckers
...
    MyCustomChecker.cpp
...
)
...

In clang/include/StaticAnalyzer/Checkers/Checkers.td

...
def CustomCheckerPackage: Package<"CustomCheckerPackage">;

let ParentPackage = CustomCheckerPackage in {
  def MyCustomChecker : Checker<"MyCustomChecker">,
    HelpText<"My first custom checker.">,
    Documentation<HasDocumentation>;
}
...

And add analyzer options for that package in clang/lib/Driver/ToolChains/Clang.cpp:

...
static void RenderAnalyzerOptions(const ArgList &Args, ArgStringList &CmdArgs,
                                  const llvm::Triple &Triple,
                                  const InputInfo &Input) {
    ...
  CmdArgs.push_back("-analyzer-checker=CustomCheckerPackage");
    ...
}
...

Then I build llvm with command

cmake -DLLVM_ENABLE_PROJECTS=clang -DLLVM_LINK_LLVM_DYLIB=true -DLLVM_ENABLE_Z3_SOLVER=true -G "Unix Makefiles" ../llvm

Now create a test file:

#include<stdlib.h>
#include<stdio.h>
void foo(int N, unsigned int x){
    int *ptr;
    ptr = malloc(N * sizeof(*ptr));
    if (ptr != NULL) {
        if(x < N){
            x = 2;
        }
    }
}

And run the scan-build command scan-build -constraints z3 clang -c test.c The problem is the state (which is dumped when checking the line x = 2; does not contain constraint x < N.

"program_state": {
  "store": { "pointer": "0x5637ec7ccb28", "items": [
    { "cluster": "GlobalSystemSpaceRegion", "pointer": "0x5637ec7cca50", "items": [
      { "kind": "Default", "offset": 0, "value": "conj_$1{int, LC1, S26714, #1}" }
    ]}
  ]},
  "environment": { "pointer": "0x5637ec66dcd0", "items": [
    { "lctx_id": 1, "location_context": "#0 Call", "calling": "foo", "location": null, "items": [
      { "stmt_id": 26777, "pretty": "x", "value": "&x" },
      { "stmt_id": 26785, "pretty": "2", "value": "2 U32b" }
    ]}
  ]},
  "constraints": null,
  "equivalence_classes": null,
  "disequality_info": null,
  "dynamic_types": null,
  "dynamic_casts": null,
  "constructing_objects": null,
  "checker_messages": null
}

But when I add one more useless line to the test file:

...
    if(x < N){
        x = 2;
        N;
    }
...

The state now do contain the constraint x < N.

"program_state": {
  "store": { "pointer": "0x55673bfdab28", "items": [
    { "cluster": "GlobalSystemSpaceRegion", "pointer": "0x55673bfdaa50", "items": [
      { "kind": "Default", "offset": 0, "value": "conj_$1{int, LC1, S26714, #1}" }
    ]}
  ]},
  "environment": { "pointer": "0x55673be7bcd0", "items": [
    { "lctx_id": 1, "location_context": "#0 Call", "calling": "foo", "location": null, "items": [
      { "stmt_id": 26777, "pretty": "x", "value": "&x" },
      { "stmt_id": 26785, "pretty": "2", "value": "2 U32b" }
    ]}
  ]},
  "constraints": null,
  "equivalence_classes": null,
  "disequality_info": null,
  "dynamic_types": null,
  "dynamic_casts": null,
  "constructing_objects": null,
  "checker_messages": null
}

What is the real cause for this problem? Or is this a bug on llvm?

llvmbot commented 2 years ago

@llvm/issue-subscribers-clang-static-analyzer

haoNoQ commented 2 years ago

The state now do contain the constraint x < N.

I don't see it, still says "constraints": null; I'm assuming copy-paste error.

If it's indeed a copy-paste error, the constraint is most likely garbage-collected because we don't have any other information on N anyway and we won't be using N anywhere later in the program (which we discover through flow-sensitive relaxed live variables analysis) so no new information is going to ever come up on it, so the constraint is worthless.

This is not entirely correct though, one thing that the constraint still tells us is that x cannot be INT_MAX (simply because there exists an integer N such that x < N). @martong @ASDenysPetrov you're the solver people these days, WDYT about adding such clause, either immediately or during garbage collection? I'm also mildly worried about other cases of SymSymExprs not having correct lifetimes, maybe we could discover more such cornercases, especially when there are existing constraints over N (eg., if we additionally know that N < 10, do we realize that this means x < 9?).

@tdnguyenND What were you trying to accomplish with this lookup? Are you trying to make a checker for out-of-bound access to malloc() memory? There may be an entirely different approach to your problem so I strongly recommend asking on cfe-dev before committing to a specific solution.

tdnguyenND commented 2 years ago

@tdnguyenND What were you trying to accomplish with this lookup? Are you trying to make a checker for out-of-bound access to malloc() memory? There may be an entirely different approach to your problem so I strongly recommend asking on cfe-dev before committing to a specific solution.

Yes, you're right. I did it my way and it fail in a test case. The original code is:

#include<stdlib.h>
#include<stdio.h>
void foo(int N, unsigned int x){
    int *ptr;
    ptr = malloc(N * sizeof(*ptr));
    if (ptr != NULL) {
        if(x < N){
            *(ptr + x) = 480;
            printf("Value of the x-th integer is %d",*(ptr + x));
        }
    }
}

As you say, It doesn't use N so the constraint relevant to N is unnecessary, but it still be used when handling ptr

martong commented 2 years ago

Perhaps we have a bug in dissolving the symbol dependencies of a constraint. I think, we should keep the constraint x < N alive whenever x is used later. The below test demonstrates this, test2 unexpectedly fails. And if we print the state just before the expectation, we can see that the state in test2 indeed does not have the constraint.

// RUN: %clang_analyze_cc1 %s \
// RUN:   -analyzer-checker=core \
// RUN:   -analyzer-checker=debug.ExprInspection \
// RUN:   -analyzer-config eagerly-assume=false \
// RUN:   -triple x86_64-pc-linux-gnu \
// RUN:   -verify

#define UINT_MAX (~0U)
#define INT_MAX (int)(UINT_MAX & (UINT_MAX >> 1))

void clang_analyzer_eval(bool);

void test0(int N, int x) {
  if(x < N) {
    // OK
    clang_analyzer_eval(x < INT_MAX); // expected-warning{{TRUE}}
    (void)(x * N);
  }
}

void test1(int N, int x) {
  if(x < N) {
    // OK
    clang_analyzer_eval(x < INT_MAX); // expected-warning{{TRUE}}
    (void)(N);
  }
}

void test2(int N, int x) {
  if(x < N) {
    // FIXME This fails, Possible BUG!
    clang_analyzer_eval(x < INT_MAX); // expected-warning{{TRUE}}
    (void)(x);
  }
}
martong commented 2 years ago

This is not entirely correct though, one thing that the constraint still tells us is that x cannot be INT_MAX (simply because there exists an integer N such that x < N). @martong @ASDenysPetrov you're the solver people these days, WDYT about adding such clause, either immediately or during garbage collection? I'm also mildly worried about other cases of SymSymExprs not having correct lifetimes, maybe we could discover more such cornercases, especially when there are existing constraints over N (eg., if we additionally know that N < 10, do we realize that this means x < 9?).

Here is where we are at the moment. From x < N we already deduce that x < INT_MAX. This happens because of the symbol simplification mechanism. On the false branch we have x >= INT_MAX i.e. x is constrained to INT_MAX and that contradicts to with x < N once we substitute x (INT_MAX < N is infeasible).

Symbol simplification kicks in only with constant values, thus we cannot deduce (x < N, N < 10) => x < 9. But (x < N, N == 10) => x < 10 is working.

void test(int N, int x) {
  if(x < N) {
    // OK
    clang_analyzer_eval(x < INT_MAX); // expected-warning{{TRUE}}
    (void)(N + x);
  }
}

void test1(int N, int x) {
  if(x < N) {
    if (N == 10) {
      // OK
      clang_analyzer_eval(x < 10); // expected-warning{{TRUE}}
    }
    (void)(N + x);
  }
}

void test2(int N, int x) {
  if(x < N) {
    if (N < 10) {
      // FAILS
      clang_analyzer_eval(x < 10); // expected-warning{{TRUE}}
    }
    (void)(N + x);
  }
}

Perhaps we could achieve what you want with the ConstraintAssignor. We could assign additional constraints to x when we face N < 10 knowing that x < N.