joernio / joern

Open-source code analysis platform for C/C++/Java/Binary/Javascript/Python/Kotlin based on code property graphs. Discord https://discord.gg/vv4MH284Hc
https://joern.io/
Apache License 2.0
2.03k stars 271 forks source link

[Bug] [jimple2cpg] Branches can cause incorrect CFGs, leading to methods such as `reachableBy` failing #3961

Open Oshawk opened 9 months ago

Oshawk commented 9 months ago

Summary

With jimple2cpg, branches can cause incorrect edges in the control flow graph. This in turn can cause data flow methods such as reachableBy to break.

Example

Example.java:

public class Example {
    public static void main(String[] args) {
        String example = System.getenv("EXAMPLE");

        if (example == null) {
            return;
        }

        String exampleUpper = example.toUpperCase();
        System.out.println(exampleUpper);
    }
}

Compiling:

javac Example.java

Processing with Joern:

joern> importCode("Example.class")
joern> cpg.call.name("println").reachableBy(cpg.call.name("getenv")).l
val res0: List[io.shiftleft.codepropertygraph.generated.nodes.Call] = List()

Clearly there is a path from getenv -> println, but Joern does not pick up on it.

Cause

Looking at the control flow graph. we begin to see the problem:

image

ReachingDefFlowGraph uses Method.reversePostOrder:

/** List of CFG nodes in reverse post order
    */
  def reversePostOrder: Iterator[CfgNode] = {
    def expand(x: CfgNode) = { x.cfgNext.iterator }
    NodeOrdering.reverseNodeList(NodeOrdering.postOrderNumbering(method, expand).toList).iterator
  }

To determine which nodes to process. The (toUpperCase,l1.toUpperCase()) node is being skipped because of this, causing reachableBy to break.

Cause of Cause

So why is the control flow graph incorrect?

In Jimple, this is a unit:

l2 = virtualinvoke l1.<java.lang.String: java.lang.String toUpperCase()>();

Which jumple2cpg converts to the following AST:

In AstForMethodsCreator, this code links the root of this AST with the root of the if statement AST:

      // Join all targets with CFG edges - this seems to work from what is seen on DotFiles
      bodyStatementsInfo.targets.foreach { case (asts, unit) =>
        asts.headOption match {
          case Some(value) =>
            diffGraph.addEdge(value.root.get, bodyStatementsInfo.unitToAsts(unit).last.root.get, EdgeTypes.CFG)
          case None =>
        }
      }

The problem arises when the AST is converted into a CFG, which creates a graph like:

So now the link created from the if statement skips out all but the assignment.

DavidBakerEffendi commented 9 months ago

Interesting, we recently did a fix for something similar in try-catch-finally blocks. Will eventually get around to this