secure-software-engineering / phasar

An LLVM-based static analysis framework.

How to use phasar results in an LLVM pass? #637

Closed william4code closed 1 year ago

william4code commented 1 year ago

Hi, I am trying to write an LLVM pass to implement a module in my project. In the pass, I need to use the analysis results of phasar, and I have run into some problems.

  1. LLVM has many passes, and the IR may change after each one. So it seems that I need to invoke phasar from within my pass; otherwise, it is highly likely that phasar and my pass will process different versions of the IR. According to the wiki page on whole-program analysis, we should build the source code with wllvm and extract the IR for each binary, but we cannot do that inside a pass. My question is: how can I invoke phasar in a pass?
  2. How can I align each instruction object in the pass with the corresponding one in phasar? For example, when I iterate over the instructions in the pass, how can I retrieve the analysis results from phasar? It seems that the Instruction object in the pass is not the same one used in phasar, so there are two Instruction objects for each IR instruction: one used by the compiler pass and another used in phasar.

Can anyone help me?

-William

fabianbs96 commented 1 year ago

Hi @william4code, when using PhASAR in an LLVM pass, I recommend using a module pass. In the run function you then get an llvm::Module & as a parameter, which you can use to instantiate a psr::LLVMProjectIRDB. PhASAR will then not attempt to take ownership of the LLVM module and will not add preprocessing metadata to the IR. You may also want to turn off auto-globals support in the LLVMBasedICFG, as it also modifies the IR.

An example invocation may look like this:

llvm::PreservedAnalyses PsrModulePass::run(Module& M, ModuleAnalysisManager& AM) {
  std::vector<std::string> EntryPoints = {"main"}; // Or whatever entry-points you need

  // -- initialize the helper analyses
  psr::LLVMProjectIRDB IRDB(&M);
  psr::LLVMTypeHierarchy TH(IRDB);
  psr::LLVMAliasSet AS(&IRDB);
  psr::LLVMBasedICFG ICF(&IRDB, psr::CallGraphAnalysisType::OTF, EntryPoints, &TH, &AS, psr::Soundness::Soundy, /*AutoGlobals*/false);

  // -- optional: perform some data-flow analyses
  psr::IDELinearConstantAnalysis LCA(&IRDB, &ICF, EntryPoints);
  auto Results = psr::solveIDEProblem(LCA, ICF);

  // -- do something with the results

  return llvm::PreservedAnalyses::all();
}
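Regarding the second question (aligning instruction objects): because the IRDB above is built from the pass's own module without taking ownership, results keyed by instruction address should be usable directly on the instructions the pass iterates over. The following is only a toy model of that idea in plain C++; ToyInst, ToyModule, ToyAnalysis, and countAnnotated are hypothetical stand-ins, not LLVM or PhASAR types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for an instruction and a module (not LLVM types).
struct ToyInst {
  std::string Text;
};
struct ToyModule {
  std::vector<ToyInst> Insts;
};

// Hypothetical analysis that borrows the module (non-owning) and keys its
// results by the address of each instruction object.
class ToyAnalysis {
public:
  explicit ToyAnalysis(const ToyModule &M) {
    for (const ToyInst &I : M.Insts)
      Results[&I] = "seen: " + I.Text;
  }

  // Returns the stored result, or nullptr if this exact object is unknown.
  const std::string *resultAt(const ToyInst *I) const {
    assert(I != nullptr);
    auto It = Results.find(I);
    return It == Results.end() ? nullptr : &It->second;
  }

private:
  std::unordered_map<const ToyInst *, std::string> Results;
};

// "Pass" side: iterate over the same module and count the instructions for
// which the analysis holds a result.
inline std::size_t countAnnotated(const ToyModule &M, const ToyAnalysis &A) {
  std::size_t Count = 0;
  for (const ToyInst &I : M.Insts)
    if (A.resultAt(&I) != nullptr)
      ++Count;
  return Count;
}
```

In this model, a lookup on a copy of the module finds nothing, which mirrors the situation where the analysis and the pass each load their own version of the IR.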

Does that answer your question?

william4code commented 1 year ago

Thanks for your reply. It seems that you launch a new data-flow analysis for each module when that module is processed by the pass. In my case, I have already completed the global interprocedural IFDS analysis before the pass runs. What I need to do is instrument the program with some instructions so that the data-flow analysis results can help me monitor the execution. So it does not seem appropriate to launch a new data-flow analysis for each module.

The only way I can imagine to align the instruction objects in the pass with those in phasar is to count the sequence number of each instruction within its block (I can find the right block using the function name and block name). But currently I don't know how to control the compiler so that the pass and phasar see the same version of the IR. Do you have any suggestions?
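The positional alignment idea can be sketched roughly as follows. This is a toy model in plain C++, not LLVM or PhASAR code: ToyBlock, ToyFunction, InstKey, FactTable, recordFact, and lookupFact are all hypothetical names, and the facts are plain strings. It assumes both sides see identical IR, so the same (function name, block name, index) triple denotes the same instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-ins for IR entities (not LLVM types).
struct ToyBlock {
  std::string Name;
  std::vector<std::string> Insts; // textual instructions, in block order
};
struct ToyFunction {
  std::string Name;
  std::vector<ToyBlock> Blocks;
};

// Positional key: (function name, block name, index within the block).
using InstKey = std::tuple<std::string, std::string, std::size_t>;

// Table of facts computed ahead of time, keyed by position.
using FactTable = std::map<InstKey, std::string>;

// Analysis side: record a fact for the instruction at a given position.
inline void recordFact(FactTable &Facts, const ToyFunction &F,
                       std::size_t BlockIdx, std::size_t InstIdx,
                       std::string Fact) {
  assert(BlockIdx < F.Blocks.size());
  assert(InstIdx < F.Blocks[BlockIdx].Insts.size());
  Facts[InstKey{F.Name, F.Blocks[BlockIdx].Name, InstIdx}] = std::move(Fact);
}

// Pass side: look up the fact for a position; nullptr if none was recorded.
inline const std::string *lookupFact(const FactTable &Facts,
                                     const std::string &FnName,
                                     const std::string &BlockName,
                                     std::size_t InstIdx) {
  auto It = Facts.find(InstKey{FnName, BlockName, InstIdx});
  return It == Facts.end() ? nullptr : &It->second;
}
```

One caveat with this scheme: basic blocks in LLVM IR may be unnamed, in which case the block's index within its function could serve as the second key component instead of its name.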

william4code commented 1 year ago

Hi @fabianbs96, I think I can solve it. I can generate the whole-program IR file using wllvm, perform the data-flow analysis, and then run my pass on that module directly. This way, the pass and phasar will see the same version of the IR. Thanks.

fabianbs96 commented 1 year ago

Hi @william4code, nice that you already have a solution! Another way I could imagine is to run the analysis (and the instrumentation) as part of the LTO pipeline (within lld). There you also have access to the whole-module IR.

william4code commented 1 year ago

Thanks. That gives me another solution.