joernio / joern

Open-source code analysis platform for C/C++/Java/Binary/Javascript/Python/Kotlin based on code property graphs. Discord https://discord.gg/vv4MH284Hc
https://joern.io/
Apache License 2.0

How to extract the data flow graph (DFG) of a function in Joern #2283

Open Icyrockton opened 1 year ago

Icyrockton commented 1 year ago

Is the code below correct?

    val cpg = importCode.c.fromString(
      """
        |int nested(int a) {
        |  int x;
        |  int z = 0x37;
        |  if(a < 10) {
        |    if( a < 5) {
        |      if(a < 2) {
        |        x = a;
        |      }
        |    }
        |  } else x = z;
        |  return x;
        |""".stripMargin)
    val source = cpg.method("nested").parameter
    val sink = cpg.method("nested").methodReturn
    println(sink.reachableByFlows(source).toJsonPretty)

Tobiasfro commented 1 year ago

Try this instead; it works for me:

val source = cpg.method("nested").parameter.name("a")
val sink = cpg.method("nested").methodReturn.toReturn
sink.reachableByFlows(source).p

Icyrockton commented 1 year ago

When I run the same query on a larger function, it returns an empty result:

val cpg = importCode.c.fromString(
"""
  |FT_Error  tt_cmap14_validate( FT_Byte*      table,
  |                      FT_Validator  valid )
  |  {
  |    FT_Byte*  p             = table + 2;
  |    FT_ULong  length        = TT_NEXT_ULONG( p );
  |    FT_ULong  num_selectors = TT_NEXT_ULONG( p );
  |
  |
  |    if ( length > (FT_ULong)( valid->limit - table ) ||
  |         length < 10 + 11 * num_selectors            )
  |      FT_INVALID_TOO_SHORT;
  |
  |    /* check selectors, they must be in increasing order */
  |    {
  |      /* we start lastVarSel at 1 because a variant selector value of 0
  |       * isn't valid.
  |       */
  |      FT_ULong  n, lastVarSel = 1;
  |
  |
  |      for ( n = 0; n < num_selectors; n++ )
  |      {
  |        FT_ULong  varSel    = TT_NEXT_UINT24( p );
  |        FT_ULong  defOff    = TT_NEXT_ULONG( p );
  |        FT_ULong  nondefOff = TT_NEXT_ULONG( p );
  |
  |
  |        if ( defOff >= length || nondefOff >= length )
  |          FT_INVALID_TOO_SHORT;
  |
  |        if ( varSel < lastVarSel )
  |          FT_INVALID_DATA;
  |
  |        lastVarSel = varSel + 1;
  |
  |        /* check the default table (these glyphs should be reached     */
  |        /* through the normal Unicode cmap, no GIDs, just check order) */
  |        if ( defOff != 0 )
  |        {
  |          FT_Byte*  defp      = table + defOff;
  |          FT_ULong  numRanges = TT_NEXT_ULONG( defp );
  |          FT_ULong  i;
  |          FT_ULong  lastBase  = 0;
  |
  |
  |          if ( defp + numRanges * 4 > valid->limit )
  |            FT_INVALID_TOO_SHORT;
  |
  |          for ( i = 0; i < numRanges; ++i )
  |          {
  |            FT_ULong  base = TT_NEXT_UINT24( defp );
  |            FT_ULong  cnt  = FT_NEXT_BYTE( defp );
  |
  |
  |            if ( base + cnt >= 0x110000UL )              /* end of Unicode */
  |              FT_INVALID_DATA;
  |
  |            if ( base < lastBase )
  |              FT_INVALID_DATA;
  |
  |            lastBase = base + cnt + 1U;
  |          }
  |        }
  |
  |        /* and the non-default table (these glyphs are specified here) */
  |        if ( nondefOff != 0 ) {
  |          FT_Byte*  ndp         = table + nondefOff;
  |          FT_ULong  numMappings = TT_NEXT_ULONG( ndp );
  |          FT_ULong  i, lastUni = 0;
  |
  |
  |          if ( numMappings * 4 > (FT_ULong)( valid->limit - ndp ) )
  |            FT_INVALID_TOO_SHORT;
  |
  |          for ( i = 0; i < numMappings; ++i )
  |          {
  |            FT_ULong  uni = TT_NEXT_UINT24( ndp );
  |            FT_ULong  gid = TT_NEXT_USHORT( ndp );
  |
  |
  |            if ( uni >= 0x110000UL )                     /* end of Unicode */
  |              FT_INVALID_DATA;
  |
  |            if ( uni < lastUni )
  |              FT_INVALID_DATA;
  |
  |            lastUni = uni + 1U;
  |
  |            if ( valid->level >= FT_VALIDATE_TIGHT    &&
  |                 gid >= TT_VALID_GLYPH_COUNT( valid ) )
  |              FT_INVALID_GLYPH_ID;
  |          }
  |        }
  |      }
  |    }
  |
  |    return SFNT_Err_Ok;
  |  }
  |""".stripMargin)
val source = cpg.method("tt_cmap14_validate").parameter
val sink = cpg.method("tt_cmap14_validate").methodReturn.toReturn
println(sink.reachableByFlows(source).toJsonPretty)
// Result: List()

I also noticed that it seems impossible to extract nested data flow.

    val cpg = importCode.c.fromString(
      """
        |void param(int x,int y,int g) {
        |  int a = x;
        |  int b = a;
        |  if ( b < 20 ) {
        |     y = 30;
        |  }
        |  int z = foo(b);
        | }
        |""".stripMargin)

    val source = cpg.method("param").parameter.name("y")
    val sink = cpg.method("param").methodReturn
    val result = sink.reachableByFlows(source)
    println(result.p)

//    List(
//      ___________________________________________________________________________
//      | nodeType          | tracked             | lineNumber | method | file    |
//      |=========================================================================|
//      | MethodParameterIn | param(int x, int... | 2          | param  | Test0.c |
//      | MethodReturn      | RET                 | 2          | param  | Test0.c |
//    )
//    missing `y = 30`

Tobiasfro commented 1 year ago

  1. SFNT_Err_Ok has no flow from any method parameter: its value is not affected by the parameters. The query therefore returns the correct result.

  2. The same applies to y = 30 in your second example.
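
To illustrate the point above, here is a minimal sketch, assuming it is run inside the Joern shell with the data-flow layer that importCode applies by default. It reuses the param example but picks a sink whose value does depend on a parameter: the argument of the call to foo. The choice of x as source and of that argument as sink is only an illustration.

```scala
// Load the second example from this thread into the Joern shell.
importCode.c.fromString("""
void param(int x, int y, int g) {
  int a = x;
  int b = a;
  if (b < 20) {
    y = 30;
  }
  int z = foo(b);
}""")

// Source: the parameter x. Sink: every argument passed to foo.
def source = cpg.method("param").parameter.name("x")
def sink = cpg.call.name("foo").argument

// x is copied into a, a into b, and b is passed to foo,
// so this pair is connected by a chain of assignments.
sink.reachableByFlows(source).p
```

A flow is only reported when the sink's value can be derived from the source, which is why a constant such as SFNT_Err_Ok or 30 never appears on a path that starts at a parameter.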

itsacoderepo commented 1 year ago

@Icyrockton You have a syntax error in your snippet: the closing brace is missing after return x;.

The following queries should give you a flow:

importCode.c.fromString("""
int nested(int a) {
  int x;
  int z = 0x37;
  if (a < 10) {
    if (a < 5) {
      if (a < 2) {
        x = a;
      }
    }
  } else {
    x = z;
  }
  return x;
}""")

def source = cpg.method("nested").parameter
def sink = cpg.method("nested").methodReturn
sink.reachableByFlows(source).p


Icyrockton commented 1 year ago

But I want all the data flows in the function. How do I extract them and represent them as a graph (DFG)?
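
One possible approach, sketched here under assumptions rather than confirmed by anyone in this thread: the Joern shell offers a dotDdg step on methods that renders the method's data dependence graph in Graphviz DOT format. The sketch assumes the nested example has already been loaded with importCode; the output file name is an arbitrary choice.

```scala
// Run inside the Joern shell after importCode has loaded the snippet.
import java.nio.file.{Files, Paths}

// dotDdg yields one DOT string per matching method; each string
// describes that method's data dependence graph.
val ddgDot = cpg.method("nested").dotDdg.l

// Write the first graph to disk (file name chosen for illustration).
ddgDot.headOption.foreach { dot =>
  Files.write(Paths.get("nested-ddg.dot"), dot.getBytes("UTF-8"))
}
```

The resulting file can be rendered with Graphviz, for example dot -Tpng nested-ddg.dot -o nested-ddg.png. If control dependencies are wanted as well, the analogous dotPdg step may be worth trying.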

ramsey-coding commented 4 months ago

@itsacoderepo were you able to solve this issue? It looks like this tool is not well maintained. What tool are you using now for this kind of analysis?