llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.14k stars 12.02k forks source link

[clang static analyzer] bug repair in conversion between unsigned/signed SymbolVal #61985

Open tianxinghe opened 1 year ago

tianxinghe commented 1 year ago

version: clang/llvm 0e37487df8e0 April 5, 2023 args: clang --analyze --analyzer-no-default-checks -Xanalyzer -analyzer-checker=core.NullDereference -Xanalyzer -analyzer-output=text 1.c

Hi! I try to identify the cause of the bug in #61515 My test case is:

    #include <stdint.h>
    #include <stdio.h> 

    void init(uint32_t, uint8_t**);
    uint8_t* malloc(uint64_t);
    uint64_t atol(uint8_t*);
    int main(int, char **);
    uint8_t* memset(uint8_t*, uint32_t, uint64_t);

    uint64_t* args;

    void init(uint32_t argc, uint8_t** argv) {
         args = (uint64_t*) malloc(8 * sizeof(uint64_t));
         for (int32_t i = 1; i < argc; ++i) {
             args[i - 1] = atol(argv[i]);
         }
    }

    int main(int argc, char ** argv) {
      uint32_t llvm_cbe_argc = (uint32_t)argc;
      uint8_t** llvm_cbe_argv = (uint8_t**)argv;
      uint8_t* llvm_cbe_llvm_cbe_a1;    /* Address-exposed local */
      uint8_t* _1;
      uint64_t _2;
      uint64_t _3;
      _1 = ((uint8_t*)(&llvm_cbe_llvm_cbe_a1));
      llvm_cbe_llvm_cbe_a1 = ((uint8_t*)/*NULL*/0);
      init(llvm_cbe_argc, llvm_cbe_argv);
      _2 = *args;
      _3 = *args;

      if ((int64_t)_3 < (int64_t)UINT64_C(3369728583)) {
        if((int64_t)_2 >= (int64_t)UINT64_C(-1)){
          **(&llvm_cbe_llvm_cbe_a1);
        }
      }
      return 0;
    }

The function VisitNonLocLocAsInteger in clang/lib/StaticAnalyzer/SValBuilder.cpp fail to convert the symbolval from unsigned to signed when the engine visits the cast (int64_t)_2 in IfStmt and evaluate the binop '(int64_t)_2 >= (int64_t)UINT64_C(-1)' later as unsigned

ShouldSupportSymbolicIntegerCasts is false in clang/lib/StaticAnalyzer/SValBuilder.cpp function VisitNonLocSymbolVal The engine thinks the SymbolVal should not be converted to signed when visit the cast (int64_t)_2 but converts -1 to unsigned when visit the binary operator (int64_t)_2 >= (int64_t)UINT64_C(-1) So signed bin op is calculated unsigned!

in clang/lib/StaticAnalyzer

  1. visitCast (int64_t)_2

    • visitCast in clang/lib/StaticAnalyzer/ExprEngineC.cpp
    • switch (CastE->getCastKind())
    • case CK_IntegralCast V = svalBuilder.evalIntegralCast(state, V, T, ExTy);
    • evalIntegralCast in clang/lib/StaticAnalyzer/SValBuilder.cpp
    • if (getContext().getTypeSize(castTy) >= getContext().getTypeSize(originalTy)) return evalCast(val, castTy, originalTy);
    • evalCast in clang/lib/StaticAnalyzer/SValBuilder.cpp
    • return TRV.Visit(V);
    • Visit in clang/lib/StaticAnalyzer/SValBuilder.cpp
    • evalCast in clang/lib/StaticAnalyzer/SValBuilder.cpp
    • VisitNonLocSymbolVal in clang/lib/StaticAnalyzer/SValBuilder.cpp

    if (haveSameType(T, CastTy)) return V;

  2. visitBinaryOperator (int64_t)_2 >= (int64_t)UINT64_C(-1)

    • visitBinaryOperator in clang/lib/StaticAnalyzer/ExprEngineC.cpp
    • SVal Result = evalBinOp(state, Op, LeftV, RightV, B->getType());
    • evalBinOp in ExprEngine.h
    • evalBinOp in clang/lib/StaticAnalyzer/SValBuilder.cpp
    • return evalBinOpNN(state, op, lhs.castAs<NonLoc>(), rhs.castAs<NonLoc>(), type);
    • evalBinOpNN in clang/lib/StaticAnalyzer/SimpleSValBuilder.cpp
    • switch (lhs.getSubKind())
    • case nonloc::SymbolValKind
    • return MakeSymIntVal(symIntExpr, op, *RHSValue, resultTy);
    • MakeSymIntVal in clang/lib/StaticAnalyzer/SimpleSValBuilder.cpp
    • if (RHS.isSigned() && !SymbolType->isSignedIntegerOrEnumerationType()) ConvertedRHS = &BasicVals.Convert(SymbolType, RHS);

This is a more complex test case. Through fix above, the checker successfully finds the NPD bug. 1.zip

It can be clang 1.c -o t /t 0 0 0 471756672 303 -1000 4294966656 0

This bug exists at least in llvm9.

tianxinghe commented 1 year ago

@EugeneZelenko Thank you for your suggestion before! Could you help me confirm if it is correct? Thank you again!

llvmbot commented 1 year ago

@llvm/issue-subscribers-clang-static-analyzer

tianxinghe commented 1 year ago

@EugeneZelenko I edit it and add the input for the complex test case.

tianxinghe commented 1 year ago

@EugeneZelenko Hello!Can you help me confirm if it is a bug or if I just made a mistake?

steakhal commented 1 year ago

ATM cast modeling is quite right, thus my guess would be that it's a bug. However, the repro is way too large that I could confirm this by only looking at it and playing with it in the browser. Could you minimize it to still reflect the observed issue? @tianxinghe

tianxinghe commented 1 year ago

截图 2023-08-27 00-17-35

    #include <stdint.h>
    #include <stdio.h> 

    uint64_t* args;

    int main(int argc, char ** argv) {
        uint8_t* llvm_cbe_llvm_cbe_a1;
        uint64_t _2;
        llvm_cbe_llvm_cbe_a1 = ((uint8_t*)/*NULL*/0);
        _2 = *args;

        if((int64_t)_2 >= (int64_t)UINT64_C(-1)){
            **(&llvm_cbe_llvm_cbe_a1);
        }
        return 0;
    }

for (int64_t)_2 >= (int64_t)UINT64_C(-1) when visitcast (int64_t)UINT64_C(-1) the engine forget to convert the SymbolVal from unsigned to signed VisitNonLocSymbolVal in clang/lib/StaticAnalyzer/SValBuilder.cpp

and when doing the comparison the engine does it unsigned in clang/lib/StaticAnalyzer/SimpleSValBuilder.cpp line195-201

@steakhal Thanks!

steakhal commented 1 year ago

It's not quite an oversight, rather a tradeoff. The constraint manager is limited, and cannot really see through casts, thus we are likely to reuse more constraints if we don't have those casts. The downside is that we end up with FPs and FNs because of this. We want to move away from this, and indeed emit those missing casts and just do the right thing, but it turns out to be quite hard to do incrementally.

tianxinghe commented 1 year ago

I try to fix it by add code as follows in clang/lib/StaticAnalyzer/SValBuilder.cpp function VisitNonLocSymbolVal :

if (T->isIntegralOrUnscopedEnumerationType() &&
  CastTy->isIntegralOrUnscopedEnumerationType()) {
AnalyzerOptions &Opts = VB.getStateManager()
                            .getOwningEngine()
                            .getAnalysisManager()
                            .getAnalyzerOptions();

+ if(T->isIntegerType() && CastTy->isIntegerType()){
+   if(CastTy->isSignedIntegerType() != T->isSignedIntegerType()){
+      return simplifySymbolCast(V, CastTy);
+   }
+ }

if (!Opts.ShouldSupportSymbolicIntegerCasts)
  return V;
return simplifySymbolCast(V, CastTy);

}

or change the ShouldSupportSymbolicIntegerCasts Can I do this?

It's not quite an oversight, rather a tradeoff. The constraint manager is limited, and cannot really see through casts, thus we are likely to reuse more constraints if we don't have those casts. The downside is that we end up with FPs and FNs because of this. We want to move away from this, and indeed emit those missing casts and just do the right thing, but it turns out to be quite hard to do incrementally.

steakhal commented 1 year ago

The ShouldSupportSymbolicIntegerCasts is the value of the experimental flag for opting in to the cast modeling improvements. AFAIK nobody uses that as it's not ready for production. Once we finally implement cast modeling, ShouldSupportSymbolicIntegerCasts will default to true, and everyone would benefit from this. You can check if setting that flag helps you or not.

tianxinghe commented 1 year ago

The ShouldSupportSymbolicIntegerCasts is the value of the experimental flag for opting in to the cast modeling improvements. AFAIK nobody uses that as it's not ready for production. Once we finally implement cast modeling, ShouldSupportSymbolicIntegerCasts will default to true, and everyone would benefit from this. You can check if setting that flag helps you or not.

Thanks!