joernio / joern

Open-source code analysis platform for C/C++/Java/Binary/Javascript/Python/Kotlin based on code property graphs. Discord https://discord.gg/vv4MH284Hc
https://joern.io/
Apache License 2.0

Binary File Decompiler To CPG #4804

Open hac425xxx opened 1 month ago

hac425xxx commented 1 month ago

Currently Joern supports ghidra2cpg, which uses Ghidra to load a binary into a CPG.

But it seems to use only the assembly instructions. Why not use the decompiler information?

  Call(
    argumentIndex = -1,
    argumentName = None,
    code = "SUB RSP,0x30",
    columnNumber = None,
    dispatchType = "STATIC_DISPATCH",
    dynamicTypeHintFullName = IndexedSeq(),
    lineNumber = Some(value = 1053724),
    methodFullName = "<operator>.subtraction",
    name = "<operator>.subtraction",
    order = 0,
    possibleTypes = IndexedSeq(),
    signature = "",
    typeFullName = "<empty>"
  ),
  Call(
    argumentIndex = -1,
    argumentName = None,
    code = "MOV dword ptr [RBP + -0x14],EDI",
    columnNumber = None,
    dispatchType = "STATIC_DISPATCH",
    dynamicTypeHintFullName = IndexedSeq(),
    lineNumber = Some(value = 1053728),
    methodFullName = "<operator>.assignment",
    name = "<operator>.assignment",
    order = 0,
    possibleTypes = IndexedSeq(),
    signature = "",
    typeFullName = "<empty>"
  ),

itsacoderepo commented 1 month ago

Hello hac425xxx,

This approach is intentional because some people read assembly, searching for patterns and other specific details. While this may not be your use case, it's perfectly fine :)

However, you can use decompiled code as input for c2cpg; there's no reason it shouldn't work, aside from potential bugs or missing information.
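
A minimal sketch of that route, assuming the decompiled C files have already been exported to a directory and that `joern-parse` is on the PATH and picks the C frontend for that directory. The helper names and the `decompiled_src` and `cpg.bin` paths are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_parse_command(src_dir, out_file="cpg.bin"):
    """Build the joern-parse command line for a directory of decompiled C files."""
    # Assumption: joern-parse takes the input path plus --output for the CPG file.
    return ["joern-parse", str(Path(src_dir)), "--output", str(out_file)]


def parse_decompiled(src_dir, out_file="cpg.bin"):
    """Run joern-parse on the decompiled sources; raises if the tool is missing."""
    if shutil.which("joern-parse") is None:
        raise FileNotFoundError("joern-parse not found on PATH")
    subprocess.run(build_parse_command(src_dir, out_file), check=True)


# Hypothetical usage: "decompiled_src" holds the C files exported by the decompiler.
command = build_parse_command("decompiled_src")
```

Decompiler output often uses non-standard types and incomplete declarations, so the resulting CPG may contain unresolved or partially parsed nodes.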

hac425xxx commented 1 month ago

Thank you very much, I understand what you mean. However, if the decompiled C output is used as the input to c2cpg, there may be many errors.

My idea is to build the CPG directly from the decompiler's AST, so the result should be more accurate.
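
As a rough illustration of the idea, here is a toy sketch that lowers a decompiler-style expression tree into nested CPG-like call nodes. The `Var`, `Const`, `BinOp`, and `CpgNode` classes are hypothetical and do not correspond to Ghidra's or Joern's real data structures. Only the operator names are taken from the dump above.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical, simplified decompiler AST (not Ghidra's real classes).
@dataclass
class Var:
    name: str


@dataclass
class Const:
    value: int


@dataclass
class BinOp:
    op: str  # "=" or "-"
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Const, BinOp]

# Operator names as they appear in the CPG dump above.
OPERATOR_NAMES = {"=": "<operator>.assignment", "-": "<operator>.subtraction"}


@dataclass
class CpgNode:
    label: str  # "CALL", "IDENTIFIER" or "LITERAL"
    name: str
    code: str
    argument_index: int = -1
    children: List["CpgNode"] = field(default_factory=list)


def render(expr):
    """Render an expression back to C-like source text (no parenthesization)."""
    if isinstance(expr, Var):
        return expr.name
    if isinstance(expr, Const):
        return hex(expr.value)
    return f"{render(expr.left)} {expr.op} {render(expr.right)}"


def lower(expr, argument_index=-1):
    """Lower an AST expression into a CPG-like node with ordered argument children."""
    if isinstance(expr, Var):
        return CpgNode("IDENTIFIER", expr.name, expr.name, argument_index)
    if isinstance(expr, Const):
        text = hex(expr.value)
        return CpgNode("LITERAL", text, text, argument_index)
    return CpgNode(
        "CALL",
        OPERATOR_NAMES[expr.op],
        render(expr),
        argument_index,
        [lower(expr.left, 1), lower(expr.right, 2)],
    )


# Hypothetical decompiled statement: local_1c = param_1 - 0x30
stmt = BinOp("=", Var("local_1c"), BinOp("-", Var("param_1"), Const(0x30)))
root = lower(stmt)
```

Unlike the flat per-instruction calls in the dump, the subtraction here is an argument of the assignment, so operands are explicit nodes rather than text inside `code`.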