secure-software-engineering / phasar

A LLVM-based static analysis framework.
Other
933 stars 140 forks source link

Is there a backward IFDS analysis example? #657

Closed yuffon closed 11 months ago

yuffon commented 1 year ago

I am writing a backward IFDS analysis. Currently, I meet some compilation error. It is complicated to paste all the information needed here to explain this error. But I guess it is due to some issues on the construction of backward ICFG or CFG. Is there some example of backward IFDS analysis example that I can refer to? Thanks.

fabianbs96 commented 1 year ago

Hi @yuffon, currently we don't have an example regarding backward analysis. Can you give me some details on the error?

yuffon commented 1 year ago

Hi @yuffon, currently we don't have an example regarding backward analysis. Can you give me some details on the error?

The compiler gives the following error:

In file included from /home/yuffon/data/programs/phasar-v2023.8.1-install-debug/include/phasar/PhasarLLVM/ControlFlow/LLVMBasedICFG.h:22:
/home/yuffon/data/programs/phasar-v2023.8.1-install-debug/include/phasar/ControlFlow/ICFGBase.h:73:56: error: no member named 'getCallGraphImpl' in 'psr::LLVMBasedBackwardCFG'
                          std::decay_t<decltype(self().getCallGraphImpl())>>);

In my backward IFDS analysis, I use the following set:

using i_t = LLVMBasedBackwardICFG;//interprocedure control flow
using c_t = LLVMBasedBackwardCFG;//intro procedural contro flow

Is it right? I don't know what is the right choice for setting c_t for a backward analysis, should I use LLVMBasedBackwardCFG or LLVMBasedCFG?

Update:

I think I know where the compiler reports an error. After constructing the backward ICFG, using the following code.

    psr::LLVMBasedICFG I(&DB, psr::CallGraphAnalysisType::OTF, {"main"}, &H, nullptr);
    psr::LLVMBasedBackwardICFG BI(&I);

I use the following

    BI.getCallersOf(initFunc);

The function ICFGBase::getCallersOf() is defined as follows:

  [[nodiscard]] decltype(auto) getCallersOf(ByConstRef<f_t> Fun) const {
    return getCallGraph().getCallersOf(Fun);
  }

Here LLVMBasedBackwardICFG is a subclass of ICFGBase. Here function ICFGBase::getCallGraph() is defined as:

  [[nodiscard]] decltype(auto) getCallGraph() const noexcept {
    static_assert(
        is_crtp_base_of_v<CallGraphBase,
                          std::decay_t<decltype(self().getCallGraphImpl())>>);
    return self().getCallGraphImpl();
  }

But, the class LLVMBasedBackwardICFG does not have a method getCallGraphImpl(). I see that only the forward class LLVMBasedICFG defines the method getCallGraphImpl(). This is why the complier reports an error.

To summarize, the LLVMBasedBackwardICFG::getCallersOf() function will cause an error.

I have also checked the old implementation is a previous version of phasar, the ICFGBase::getCallersOf() is implemented as

  [[nodiscard]] decltype(auto) getCallersOf(ByConstRef<f_t> Fun) const {
    static_assert(
        is_iterable_over_v<decltype(self().getCallersOfImpl(Fun)), n_t>);
    return self().getCallersOfImpl(Fun);
  }

Here both LLVMBasedICFG and LLVMBasedBackwardICFG have their method getCallersOfImpl().

So I think the method LLVMBasedBackwardICFG.getCallersOf() function is not implemented correctly. @fabianbs96

The above error cannot be stepped over because I need to use the backward ICFG to define an IFDS problem solver as follows:

std::shared_ptr<psr::IFDSSolver_P<MyBackwardIFDSProblem>> postSetSolver(
            new psr::IFDSSolver_P<MyBackwardIFDSProblem>(*MyBackwardIFDSProblem::MyBackwardAnalysisInstance, &BI));

The above statement cannot be compiled.

The similar error is also reported at

/home/yuffon/data/programs/phasar-v2023.8.1-install-debug/include/phasar/ControlFlow/ICFGBase.h:100:44: error: no member named 'getReturnSitesOfCallAtImpl' in 'psr::LLVMBasedBackwardCFG'
        is_iterable_over_v<decltype(self().getReturnSitesOfCallAtImpl(Inst)),

I think the similar case happens on function getReturnSitesOfCallAt() as well.

I guess that the backward analysis is not tested after some version.

fabianbs96 commented 1 year ago

Hi @yuffon, I have reproduced your issue. Can you check whether #660 fixes it?

yuffon commented 1 year ago

Hi @fabianbs96, I will check it on LLVM 14 and report results later.

yuffon commented 1 year ago

Hi @fabianbs96 , I have test the backward analysis using branch f-fixBackwardICFG in my project. I met some compilation errors.

  1. Some of them are:
/home/yuffon/data/programs/phasar-v2023.8.1-f-fixBackwardICFG-install-debug/include/phasar/ControlFlow/ICFGBase.h:65:44: error: no member named 'allNonCallStartNodesImpl' in 'psr::LLVMBasedBackwardCFG'
        is_iterable_over_v<decltype(self().allNonCallStartNodesImpl()), n_t>);
/home/yuffon/data/programs/phasar-v2023.8.1-f-fixBackwardICFG-install-debug/include/phasar/ControlFlow/ICFGBase.h:90:44: error: no member named 'getCallsFromWithinImpl' in 'psr::LLVMBasedBackwardCFG'
        is_iterable_over_v<decltype(self().getCallsFromWithinImpl(Fun)), n_t>);
/home/yuffon/data/programs/phasar-v2023.8.1-f-fixBackwardICFG-install-debug/include/phasar/ControlFlow/ICFGBase.h:100:44: error: no member named 'getReturnSitesOfCallAtImpl' in 'psr::LLVMBasedBackwardCFG'
        is_iterable_over_v<decltype(self().getReturnSitesOfCallAtImpl(Inst)),

I see that these functions have been implemented in LLVMBasedBackwardICFG. But my compiler reports that these functions are not implemented in LLVMBasedBackwardCFG (not ICFG). Why?

I notice that

class LLVMBasedICFG : public LLVMBasedCFG, public ICFGBase<LLVMBasedICFG>

and

class LLVMBasedBackwardICFG : public LLVMBasedBackwardCFG, public ICFGBase<LLVMBasedBackwardCFG>

ICFGBase has the following definitions:

template <typename Derived> class ICFGBase

and

const Derived &self() const noexcept {
    return static_cast<const Derived &>(*this);
  }

See the class name in template. Should LLVMBasedBackwardCFG be changed to LLVMBasedBackwardICFG?

I think that is why the compiler reports LLVMBasedBackwardCFG does not implement those function, although the current version of LLVMBasedBackwardICFG has implemented them.

  1. Another one error is:
/home/yuffon/data/programs/phasar-v2023.8.1-f-fixBackwardICFG-install-debug/include/phasar/ControlFlow/ICFGBase.h:73:56: error: no member named 'getCallGraphImpl' in 'psr::LLVMBasedBackwardCFG'
                          std::decay_t<decltype(self().getCallGraphImpl())>>);

getCallGraphImpl is not implemented in LLVMBasedBackwardICFG.

3. By the way, I get the following in dumpResult()

N: store i32 0, i32* %result, align 4, !psr.id !24 | ID: 20
-----------------------------------------------------------
    D: 0x275d0c0 | V: TOP
    D: 0x275b4e0 | V: TOP
    D: 0x25e9b00 | V: TOP

I checked the code, and find that

template <typename T> decltype(auto) printSomehow(const T &Val) 

in Printer.h receives a pointer of Data flow fact. The overrided operator << in my own data flow fact does not work.

fabianbs96 commented 1 year ago

Hi @yuffon, the LLVMBasedBackwardCFG/LLVMBasedBackwardICFG confusion and the missing getCallGraph implementation should be fixed already in #660. Regarding the printing, how did you declare the operator<<? for printSomehow to find the operator<<, it should print into llvm::raw_ostream& (in case you use std::ostream&). Alternatively, you can overload the DToString function to handle your type specially.

yuffon commented 1 year ago

Hi @fabianbs96, I haved tested backward analysis on several small examples in my project. Up to now, Everything is OK to me. I have a small quesiton. In forward analysis, when dumping results, facts associated with each instruction is the data flow facts hold before the instruction. But in backward analysis, it seems that the dumped results facts associated with each isntruction is the facts holding after the instruction. Is that right?

fabianbs96 commented 1 year ago

Hi @yuffon, you are right. This is, because how phasar works: The solver always propagates data flows from a current statement to the successor statements. During this propagation, it applies the flow functions to make the effects of the current statement visible at its successors. For the backwards analysis, the control-flow direction is reversed, so the successors of the current statement are actually its predecessors.

yuffon commented 1 year ago

Hi @yuffon, you are right. This is, because how phasar works: The solver always propagates data flows from a current statement to the successor statements. During this propagation, it applies the flow functions to make the effects of the current statement visible at its successors. For the backwards analysis, the control-flow direction is reversed, so the successors of the current statement are actually its predecessors.

Hi @fabianbs96, thanks. I am using backward analysis in my project. I find that the above setting brings some tricky things when using analysis results.

Take the following piece for example:

blockA:
     inst1
     inst2
    //program point 1

blockB:
     //program point 2
     inst3
    //program point 3
    inst4

I want to get data flow facts before inst3 (program point 2), which is different from program point 1 because blockB may be the target of some branch statement. After backward analysis, I can only get the facts after inst3. In my project, inst3 may change data flow facts. So I can't use program point 3 instead.

How can I get data flow facts at program point 2 after backward analysis?

fabianbs96 commented 1 year ago

Hi @yuffon, we are facing a similar problem as well. For forward analyses, we use the resultsAtInLLVMSSA API as a workaround, but it also does not work for all cases.

Currently, we are discussing about how to fix this problem properly, e.g. introducing dummy statements within the ICFG, or adapting the solver. We will keep you updated.

fabianbs96 commented 1 year ago

Hi @yuffon, can you check whether #669 solves your problem when using the PropagateOntoStrategy?

yuffon commented 11 months ago

Hi @yuffon, can you check whether #669 solves your problem when using the PropagateOntoStrategy?

Just back from vacation. I have tested 'PropagateOntoStrategy' on one simple problem. It contains a branch whose two successors have different facts. It works. Great! I can switch freely between forward/backward and PropagateOver/PropagateOnto now. That is exactly what I need. Thanks @fabianbs96 .