secure-software-engineering / phasar

An LLVM-based static analysis framework.

Problem in the initialization of seeds #633

Closed: Luweicai closed this issue 1 year ago

Luweicai commented 1 year ago

```llvm
; %0 is the taint seed.
; Excerpt: the instructions defining %3 and %4, and the terminator, are not shown.
define void @foo(i32 %0) {
  call void @llvm.dbg.value(metadata i32 %0, metadata !21, metadata !DIExpression()), !dbg !22
  %1 = add nsw i32 %0, 1
  %2 = add nsw i32 %1, 1
  call void @tt(i32 %3)
  %5 = add nsw i32 %4, 1
}
```

The computed taint values are wrong at the non-call instructions:

N: %1 = add nsw i32 %0, 1;
----------------------------------------------------
D: @zero_value | V: BOTTOM
D: i32 %0 | V: BOTTOM

N: %2 = add nsw i32 %1, 1;
-----------------------------------------------------
D: @zero_value | V: BOTTOM
D: i32 %0 | V: TOP
D: %1 = add nsw i32 %0, 1; | V: TOP (should be BOTTOM)

N: call void @tt(i32 %3);
-----------------------------------------------------
D: @zero_value | V: BOTTOM
D: i32 %0 | V: BOTTOM
D: %1 = add nsw i32 %0, 1; | V: BOTTOM

N: %5 = add nsw i32 %4, 1;
-----------------------------------------------------
D: @zero_value | V: BOTTOM
D: i32 %0 | V: TOP
D: %1 = add nsw i32 %0, 1; | V: TOP (should be BOTTOM)
D: %2 = add nsw i32 %1, 1; | V: TOP (should be BOTTOM)

If a taint seed is an argument of a function, it is attached to the first instruction of that function's entry block. The taint seeds are initialized by the following code:

```cpp
std::map<const llvm::Instruction *, std::set<const llvm::Value *>>
LLVMTaintConfig::makeInitialSeedsImpl() const {
  std::map<const llvm::Instruction *, std::set<const llvm::Value *>>
      InitialSeeds;
  for (const auto *SourceValue : SourceValues) {
    if (const auto *Inst = llvm::dyn_cast<llvm::Instruction>(SourceValue)) {
      InitialSeeds[Inst].insert(Inst);
    } else if (const auto *Arg = llvm::dyn_cast<llvm::Argument>(SourceValue);
               Arg && !Arg->getParent()->isDeclaration()) {
      const auto *FunFirstInst = &Arg->getParent()->getEntryBlock().front();
      InitialSeeds[FunFirstInst].insert(Arg);
    }
  }
  return InitialSeeds;
}
```
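To make the seeding rule concrete without depending on LLVM, here is a minimal toy model of the quoted makeInitialSeedsImpl. The types ToyInst, ToyFunction, ToySource and the function makeInitialSeeds are hypothetical stand-ins (instructions are identified by their index in the entry block instead of by pointer); only the placement rule is taken from the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy types, not LLVM or PhASAR API.
struct ToyInst {
  std::string Text;
  bool IsDebugIntrinsic; // e.g. a call to llvm.dbg.value
};

struct ToyFunction {
  std::vector<ToyInst> EntryBlock; // empty for a declaration
};

// A taint source is either an instruction (given by its index) or an argument.
struct ToySource {
  const ToyFunction *Fun;
  bool IsArgument;
  std::size_t InstIndex; // only meaningful when !IsArgument
  std::string Name;
};

// Seeds keyed by (function, instruction index) instead of by Instruction*.
using SeedKey = std::pair<const ToyFunction *, std::size_t>;
using ToySeeds = std::map<SeedKey, std::set<std::string>>;

ToySeeds makeInitialSeeds(const std::vector<ToySource> &Sources) {
  ToySeeds Seeds;
  for (const ToySource &S : Sources) {
    if (!S.IsArgument) {
      // An instruction source is seeded at the instruction itself.
      assert(S.InstIndex < S.Fun->EntryBlock.size());
      Seeds[SeedKey{S.Fun, S.InstIndex}].insert(S.Name);
    } else if (!S.Fun->EntryBlock.empty()) {
      // An argument source is attached to front() of the entry block,
      // even when that instruction is a debug intrinsic.
      Seeds[SeedKey{S.Fun, std::size_t{0}}].insert(S.Name);
    }
    // Arguments of declarations produce no seed.
  }
  return Seeds;
}
```

For the function from the report, the argument seed %0 lands on index 0, which in this model is the llvm.dbg.value call.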

However, once the exploded supergraph has been constructed and the solver reaches Phase II of the data-flow analysis, valueComputationTask does the following:

```cpp
void valueComputationTask(const std::vector<n_t> &Values) {
  PAMM_GET_INSTANCE;
  for (n_t n : Values) {
    for (n_t SP : ICF->getStartPointsOf(ICF->getFunctionOf(n))) {
      using TableCell = typename Table<d_t, d_t, EdgeFunctionPtrType>::Cell;
      Table<d_t, d_t, EdgeFunctionPtrType> &LookupByTarget =
          JumpFn->lookupByTarget(n);
      for (const TableCell &SourceValTargetValAndFunction :
           LookupByTarget.cellSet()) {
        d_t dPrime = SourceValTargetValAndFunction.getRowKey();
        d_t d = SourceValTargetValAndFunction.getColumnKey();
        EdgeFunctionPtrType fPrime = SourceValTargetValAndFunction.getValue();
        l_t TargetVal = val(SP, dPrime);
        PHASAR_LOG_LEVEL(DEBUG, "SP " << IDEProblem.NtoString(SP)
                                      << " dprime: " << IDEProblem.DtoString(dPrime)
                                      << "  n: " << IDEProblem.NtoString(n)
                                      << fPrime->str() << "  Target val: "
                                      << IDEProblem.LtoString(TargetVal));
        setVal(n, d,
               IDEProblem.join(val(n, d),
                               fPrime->computeTarget(std::move(TargetVal))));
        INC_COUNTER("Value Computation", 1, PAMM_SEVERITY_LEVEL::Full);
      }
    }
  }
}
```
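The effect of Phase II reading val(SP, dPrime) can be reproduced with a toy value table. This is a minimal sketch under stated assumptions: a two-point lattice in which Top is the neutral element of join, a missing table entry read as Top, and an identity edge function in place of fPrime. Val, ValTable, val, join and computeAt are hypothetical names, not PhASAR's API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical two-point lattice used only for this sketch.
enum class Val { Bottom, Top };

// Assumption: Top is neutral for join and Bottom absorbs.
Val join(Val A, Val B) {
  return (A == Val::Bottom || B == Val::Bottom) ? Val::Bottom : Val::Top;
}

// Value table keyed by (instruction index, fact).
using ValTable = std::map<std::pair<std::size_t, std::string>, Val>;

// Assumption: a missing entry is read as Top.
Val val(const ValTable &T, std::size_t N, const std::string &D) {
  auto It = T.find(std::make_pair(N, D));
  return It == T.end() ? Val::Top : It->second;
}

// One iteration of the inner loop of valueComputationTask with an identity
// edge function: val(N, D) = join(val(N, D), val(SP, DPrime)).
void computeAt(ValTable &T, std::size_t SP, std::size_t N,
               const std::string &DPrime, const std::string &D) {
  Val New = join(val(T, N, D), val(T, SP, DPrime));
  T[std::make_pair(N, D)] = New;
}
```

Under these assumptions, a seed stored at index 0 (the llvm.dbg.value call) is never read when SP is index 1, so the value at the later instruction stays Top, mirroring the TOP values in the report; storing the seed at the start point yields Bottom.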

The implementation of getStartPointsOf, which is called in the loop `for (n_t SP : ICF->getStartPointsOf(ICF->getFunctionOf(n)))`, is:

```cpp
std::set<const llvm::Instruction *>
LLVMBasedCFG::getStartPointsOf(const llvm::Function *Fun) const {
  if (!Fun) {
    return {};
  }
  if (!Fun->isDeclaration()) {
    const auto *EntryInst = &Fun->front().front();
    if (IgnoreDbgInstructions && llvm::isa<llvm::DbgInfoIntrinsic>(EntryInst)) {
      return {EntryInst->getNextNonDebugInstruction(
          false /*Only debug instructions*/)};
    }
    return {EntryInst};
  }
  PHASAR_LOG_LEVEL(DEBUG, "Could not get starting points of '"
                              << Fun->getName()
                              << "' because it is a declaration");
  return {};
}
```

If IgnoreDbgInstructions is set and the first instruction of the entry block is a debug intrinsic, this function returns the first non-debug instruction instead.

This means that when a taint seed is a parameter of a function whose first instruction is a debug intrinsic, the seed is attached to that debug intrinsic, and the result table records its value there as BOTTOM, while valueComputationTask starts the computation from the first non-debug instruction. This mismatch causes the problem illustrated at the beginning.
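The mismatch described above can be summarized in a small, self-contained model. Everything here is hypothetical: ToyInst, ToyBody, seedIndexAsQuoted, startIndex, isAffected and seedIndexAligned are illustrative names, and seedIndexAligned shows only one possible way to remove the mismatch (attaching argument seeds with the same rule that selects the start point), not necessarily how the maintainers fixed it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy types, not LLVM or PhASAR API.
struct ToyInst {
  std::string Text;
  bool IsDebugIntrinsic;
};
using ToyBody = std::vector<ToyInst>; // entry block; empty for a declaration

// Seed location of an argument, as in the quoted makeInitialSeedsImpl: front().
std::size_t seedIndexAsQuoted(const ToyBody &) { return 0; }

// Start point as in the quoted getStartPointsOf: front(), unless
// IgnoreDbgInstructions is set and front() is a debug intrinsic, in which case
// the next non-debug instruction (F.size() if there is none).
std::size_t startIndex(const ToyBody &F, bool IgnoreDbgInstructions) {
  if (!IgnoreDbgInstructions || F.empty() || !F[0].IsDebugIntrinsic) {
    return 0;
  }
  std::size_t I = 1;
  while (I < F.size() && F[I].IsDebugIntrinsic) {
    ++I;
  }
  return I;
}

// The argument seed is not seen by Phase II when it is not on the start point.
bool isAffected(const ToyBody &F, bool IgnoreDbgInstructions) {
  return !F.empty() &&
         seedIndexAsQuoted(F) != startIndex(F, IgnoreDbgInstructions);
}

// One possible fix (an assumption): derive the argument seed location from
// the same rule that selects the start point.
std::size_t seedIndexAligned(const ToyBody &F, bool IgnoreDbgInstructions) {
  return startIndex(F, IgnoreDbgInstructions);
}
```

In this model a function is only affected when IgnoreDbgInstructions is enabled and its entry block starts with a debug intrinsic, which is the situation in the reported IR.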

fabianbs96 commented 1 year ago

Hi @Luweicai, thank you for reporting this issue in such detail. You are right: this is indeed a bug.

#635 should fix this.