secure-software-engineering / phasar

A LLVM-based static analysis framework.

Path Tracing #640

Closed fabianbs96 closed 7 months ago

fabianbs96 commented 1 year ago

Path Tracing

As of now, PhASAR does not provide a way to report the path along which a dataflow fact was propagated. Such path information can help in understanding dataflow results and can be used for debugging, reporting, and much more.

This PR adds a basic implementation of path tracing by providing a new PathAwareIDESolver that subclasses the existing IDESolver. While solving, the PathAwareIDESolver collects path information on the fly and stores it in an ExplodedSuperGraph (ESG) that can be retrieved after solving via the getExplicitESG() API. Note: The ExplodedSuperGraph may not store all the nodes and edges that you might expect. This is for performance reasons, as the ESG can grow extremely large. However, the ESG is only allowed to prune identity flows within one BasicBlock, so no data flow is actually "lost".
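To illustrate why pruning identity flows inside one basic block loses no information, here is a minimal toy sketch. It is not PhASAR's ExplodedSuperGraph: the names ToyESG, ToyESGNode, addSeed and addFlow are hypothetical, and instructions, facts and basic blocks are reduced to plain integer ids.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy types for illustration; these are not PhASAR classes.
struct ToyESGNode {
  int Inst;  // instruction id (n)
  int Fact;  // dataflow fact id (d)
  int Block; // id of the basic block that contains Inst
  int Pred;  // index of the predecessor node, -1 for a seed
};

class ToyESG {
public:
  // Records a seed (Inst, Fact) and returns its node index.
  int addSeed(int Inst, int Fact, int Block) {
    Nodes.push_back({Inst, Fact, Block, -1});
    return static_cast<int>(Nodes.size()) - 1;
  }

  // Records the flow from node Pred to (Inst, Fact). An identity flow that
  // stays inside the predecessor's basic block is pruned: no node is created
  // and the predecessor keeps representing the fact.
  int addFlow(int Pred, int Inst, int Fact, int Block) {
    assert(Pred >= 0 && static_cast<std::size_t>(Pred) < Nodes.size());
    const int PredFact = Nodes[static_cast<std::size_t>(Pred)].Fact;
    const int PredBlock = Nodes[static_cast<std::size_t>(Pred)].Block;
    if (PredFact == Fact && PredBlock == Block) {
      return Pred;
    }
    Nodes.push_back({Inst, Fact, Block, Pred});
    return static_cast<int>(Nodes.size()) - 1;
  }

  std::size_t size() const { return Nodes.size(); }

  const ToyESGNode &node(int Idx) const {
    assert(Idx >= 0 && static_cast<std::size_t>(Idx) < Nodes.size());
    return Nodes[static_cast<std::size_t>(Idx)];
  }

private:
  std::vector<ToyESGNode> Nodes;
};
```

In this toy model, a fact that flows unchanged through three instructions of one block occupies a single node. Only a change of fact or a block boundary creates a new node, so every non-trivial step of the flow remains recoverable by following the Pred links.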

Based on the ExplodedSuperGraph, you can use the PathSensitivityManager (or a derivative of it) to query path information for any (n_t, d_t) pair of instruction and dataflow fact that the IDESolver previously computed. The PathSensitivityManager then computes a DAG (directed acyclic graph) as the induced subgraph of the underlying ExplodedSuperGraph that is reachable from the ESG node representing the (n_t, d_t) query.
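The reachability step can be sketched with a plain worklist traversal. This is a simplified illustration, not the PathSensitivityManager implementation: it assumes that the graph is given as predecessor lists indexed by integer node ids and that edges are followed from the query node back toward the seeds. The function name reachableBackwards is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Preds[V] lists the predecessors of node V in a toy ESG. The function returns
// the ids of all nodes reachable from Query by following predecessor edges,
// including Query itself. The subgraph induced by this node set contains
// exactly the flows that can have led to the query.
std::set<int> reachableBackwards(const std::vector<std::vector<int>> &Preds,
                                 int Query) {
  assert(Query >= 0 && static_cast<std::size_t>(Query) < Preds.size());
  std::set<int> Seen{Query};
  std::vector<int> Worklist{Query};
  while (!Worklist.empty()) {
    const int V = Worklist.back();
    Worklist.pop_back();
    for (int P : Preds[static_cast<std::size_t>(V)]) {
      // insert() reports whether P is new, so each node is expanded only once.
      if (Seen.insert(P).second) {
        Worklist.push_back(P);
      }
    }
  }
  return Seen;
}
```

Because every node is expanded at most once, the traversal is linear in the size of the reachable part of the graph.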

The resulting DAG is created in a compressed form that may merge consecutive nodes to save memory. Each merged "supernode" holds a vector of the LLVM instructions it contains, in reverse control-flow order. For working with the DAG, you may want to take a look at our GraphTraits.h and the implementation used in AdjacencyList.h.
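One way to picture the compression is to merge every linear chain of nodes into a single supernode. The sketch below is an assumption-laden toy, not the code behind AdjacencyList.h: ToySuperNode and compressChains are hypothetical names, node ids double as instruction ids, the input must be acyclic and free of duplicate edges, and edges are assumed to point from the query toward the seeds, so the instructions of a chain end up in reverse control-flow order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical supernode: a run of merged nodes plus edges to other supernodes.
struct ToySuperNode {
  std::vector<int> Insts; // merged node ids, in edge (traversal) order
  std::vector<int> Succs; // indices into the resulting supernode vector
};

// Merges each maximal chain v1 -> v2 -> ... -> vk, in which every link is the
// only outgoing edge of its source and the only incoming edge of its target.
std::vector<ToySuperNode>
compressChains(const std::vector<std::vector<int>> &Succs) {
  const std::size_t N = Succs.size();
  std::vector<int> InDeg(N, 0);
  std::vector<int> SolePred(N, -1); // only meaningful where InDeg == 1
  for (std::size_t V = 0; V < N; ++V) {
    for (int W : Succs[V]) {
      assert(W >= 0 && static_cast<std::size_t>(W) < N);
      ++InDeg[static_cast<std::size_t>(W)];
      SolePred[static_cast<std::size_t>(W)] = static_cast<int>(V);
    }
  }

  std::vector<int> Owner(N, -1); // supernode index of each original node
  std::vector<int> Last;         // last original node of each supernode
  std::vector<ToySuperNode> Result;
  for (std::size_t V = 0; V < N; ++V) {
    // V continues a chain if it is the sole successor of its sole predecessor.
    const bool ContinuesChain =
        InDeg[V] == 1 &&
        Succs[static_cast<std::size_t>(SolePred[V])].size() == 1;
    if (ContinuesChain) {
      continue;
    }
    const int Id = static_cast<int>(Result.size());
    Result.emplace_back();
    std::size_t Cur = V;
    for (;;) {
      Owner[Cur] = Id;
      Result.back().Insts.push_back(static_cast<int>(Cur));
      if (Succs[Cur].size() != 1) {
        break;
      }
      const std::size_t Next = static_cast<std::size_t>(Succs[Cur][0]);
      if (InDeg[Next] != 1) {
        break;
      }
      Cur = Next;
    }
    Last.push_back(static_cast<int>(Cur));
  }

  // The outgoing edges of a supernode are those of its last original node.
  for (std::size_t Id = 0; Id < Result.size(); ++Id) {
    for (int W : Succs[static_cast<std::size_t>(Last[Id])]) {
      Result[Id].Succs.push_back(Owner[static_cast<std::size_t>(W)]);
    }
  }
  return Result;
}
```

For a graph that consists of a straight run, a diamond, and another straight run, this yields four supernodes instead of seven nodes. Branch and join points are never merged away, so the shape of the DAG is preserved.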

Constraint Solving

In addition to the "regular" PathSensitivityManager, we also provide a Z3BasedPathSensitivityManager that uses the Z3 constraint solver to compute the set of paths a dataflow fact may have taken in order to reach the query statement (instead of returning a DAG). This set of paths is filtered multiple times based on if-conditions. If the constraints inferred from these if-conditions are found to be contradictory, the affected paths are discarded.
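The filtering idea can be shown without Z3. The sketch below swaps the solver for a trivial consistency check: a path is modeled as a list of branch decisions and is rejected when it takes the same condition both ways. It is only a stand-in and relies on the strong assumption that a condition id always denotes the same value along a path; a real solver reasons about the condition expressions themselves. The names BranchDecision, ToyPath, isFeasible and filterFeasible are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// One if-condition met along a path, and the direction the path took.
struct BranchDecision {
  int Cond;   // id of the branch condition (non-negative in this toy model)
  bool Taken; // true if the path follows the "then" edge
};

using ToyPath = std::vector<BranchDecision>;

// A path is contradictory if it requires one condition to be both true and
// false. This replaces the Z3 satisfiability query of the real implementation.
bool isFeasible(const ToyPath &Path) {
  std::map<int, bool> Assigned;
  for (const BranchDecision &D : Path) {
    assert(D.Cond >= 0);
    const std::pair<std::map<int, bool>::iterator, bool> Res =
        Assigned.emplace(D.Cond, D.Taken);
    if (!Res.second && Res.first->second != D.Taken) {
      return false;
    }
  }
  return true;
}

// Keeps only the paths whose branch decisions are mutually consistent.
std::vector<ToyPath> filterFeasible(const std::vector<ToyPath> &Paths) {
  std::vector<ToyPath> Result;
  for (const ToyPath &P : Paths) {
    if (isFeasible(P)) {
      Result.push_back(P);
    }
  }
  return Result;
}
```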

Note: Although we perform quite aggressive filtering, the number of paths is still exponential in the size of the original DAG in the worst case. So, queries to the Z3BasedPathSensitivityManager may take significantly longer than queries to the PathSensitivityManager. Further, the number of returned paths (those that were not discarded) may still be very large.
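The exponential worst case is easy to reproduce on a toy graph: a chain of K diamonds (K consecutive if-else constructs) has only 3K + 1 nodes but 2^K distinct paths. The following sketch counts paths in a DAG with memoization. The names ToyDAG, makeDiamondChain and countPaths are hypothetical and not part of PhASAR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using ToyDAG = std::vector<std::vector<int>>; // successor lists

// Builds K diamonds in a row: node 3*I branches to 3*I+1 and 3*I+2, which
// both join again at node 3*I+3.
ToyDAG makeDiamondChain(int K) {
  assert(K >= 0);
  ToyDAG Succs(static_cast<std::size_t>(3 * K + 1));
  for (int I = 0; I < K; ++I) {
    const std::size_t Entry = static_cast<std::size_t>(3 * I);
    const int Exit = 3 * I + 3;
    Succs[Entry] = {3 * I + 1, 3 * I + 2};
    Succs[Entry + 1] = {Exit};
    Succs[Entry + 2] = {Exit};
  }
  return Succs;
}

// Counts the paths from V to Target. Memo caches finished nodes so that each
// node is evaluated once, although the number of paths may be exponential.
static std::uint64_t countPathsFrom(const ToyDAG &Succs, int V, int Target,
                                    std::vector<std::uint64_t> &Memo,
                                    std::vector<bool> &Done) {
  if (V == Target) {
    return 1;
  }
  const std::size_t Idx = static_cast<std::size_t>(V);
  if (Done[Idx]) {
    return Memo[Idx];
  }
  std::uint64_t Sum = 0;
  for (int W : Succs[Idx]) {
    Sum += countPathsFrom(Succs, W, Target, Memo, Done);
  }
  Memo[Idx] = Sum;
  Done[Idx] = true;
  return Sum;
}

// The input must be acyclic; From and Target must be valid node ids.
std::uint64_t countPaths(const ToyDAG &Succs, int From, int Target) {
  assert(From >= 0 && static_cast<std::size_t>(From) < Succs.size());
  assert(Target >= 0 && static_cast<std::size_t>(Target) < Succs.size());
  std::vector<std::uint64_t> Memo(Succs.size(), 0);
  std::vector<bool> Done(Succs.size(), false);
  return countPathsFrom(Succs, From, Target, Memo, Done);
}
```

Counting paths stays cheap because of the memoization, but any analysis that has to materialize each path individually, as a path-set query does, pays for every one of the 2^K paths.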

Breaking Changes

Path tracing is completely opt-in. However, PhASAR now depends on the Z3 solver, which is required for compiling the Z3-based constraint solving. If you do not need it, or do not want to depend on Z3, Z3 support can be disabled by setting the CMake variable PHASAR_USE_Z3 to OFF (for example, by passing -DPHASAR_USE_Z3=OFF to cmake). It is ON by default.