radareorg / radare2

UNIX-like reverse engineering framework and command-line toolset
https://www.radare.org/
GNU Lesser General Public License v3.0

GML graph export from ESIL dataflow graphs (aeg) is broken #16785

Open arnaugamez opened 4 years ago

arnaugamez commented 4 years ago

EDIT: Please see https://github.com/radareorg/radare2/issues/16785#issuecomment-625773483

Work environment

| Questions | Answers |
|---|---|
| OS/arch/bits (mandatory) | Ubuntu 19.10 x64 |
| r2 -v full output, not truncated (mandatory) | radare2 4.5.0-git 24779 @ linux-x86-64 git.4.4.0-141-gbeb2f2dd9 commit: beb2f2dd948543c0ed82d5d08d5e3fcc5cbec6da build: 2020-05-07__18:04:26 |

Expected behavior

Creating a dataflow graph with aeg and exporting it to GML should produce a valid graph that can be consumed later and that contains all the necessary information.

Actual behavior

Exporting to GML is broken, and probably to other formats as well. I assume the internal representation is just not well defined for those graphs. Labels do not contain the contents of the blocks in the graph (which can be explored with aggv). The ids and labels that are included are also confusing. Edges do not link the correct nodes; they even link non-existing ones.

Trying to consume it with a third-party tool (I am trying Python's networkx) fails at parse time because the edge descriptions reference nodes that are not declared. Even if it were consumed by chance, all the body information is missing and the edges probably link the wrong nodes anyway.

Steps to reproduce the behavior

I am using the same code that @condret used in his r2con2019 talk to generate the dataflow graph. You can generate it with:

.aeg 1024,4,esp,-,=[4],4,esp,-=,0xc,esp,+,[4],4,esp,-,=[4],4,esp,-=,0xc,esp,+,[4],4,esp,-,=[4],4,esp,-=,al,eax,+=[1],7,$o,of,:=,7,$s,sf,:=,$z,zf,:=,7,$c,cf,:=,$p,pf,:=

I first suspected GML output generation in general, but I tried exporting other graphs (call graphs, for example) for sample binaries like /bin/ls. Those graphs are in fact much bigger, yet they export with no problem and parse correctly.

radare commented 4 years ago

Use yfiles to load those graphMl files. There are several standards. Just to confirm



arnaugamez commented 4 years ago

Yes, it's broken. It does not even include the contents of the graph nodes. Here is a very simple example to illustrate it.

Generate the graph with this ESIL expression: .aeg ebx,eax,+=,ecx,eax,*=

If you visualize it with aggv, for example, you get this: [image: aggv view of the graph]

Now just use aggg to see the GML output

:> aggg
graph
[
hierarchic 1
label ""
directed 1
  node [
    id  3
    label  "4"
  ]
  node [
    id  5
    label  "7"
  ]
  node [
    id  4
    label  "6"
  ]
  node [
    id  1
    label  "1"
  ]
  node [
    id  7
    label  "10"
  ]
  node [
    id  11
    label  ""
  ]
  node [
    id  0
    label  "0"
  ]
  node [
    id  2
    label  "3"
  ]
  node [
    id  8
    label  "12"
  ]
  node [
    id  9
    label  "13"
  ]
  node [
    id  6
    label  "9"
  ]
  edge [
    source  3
    target  4
  ]
  edge [
    source  5
    target  8
  ]
  edge [
    source  4
    target  5
  ]
  edge [
    source  1
    target  4
  ]
  edge [
    source  7
    target  10
  ]
  edge [
    source  11
    target  8
  ]
  edge [
    source  0
    target  1
  ]
  edge [
    source  2
    target  3
  ]
  edge [
    source  8
    target  9
  ]
  edge [
    source  6
    target  7
  ]
]

Check for example the following edge:

edge [
    source  7
    target  10
  ]

It targets node 10, but there is no node with id 10, so it will break whatever tool you use to import that GML file (including networkx, but not only).
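The dangling edge can be confirmed without any third-party importer. The following is a minimal sketch, assuming the flat `node [ id ... ]` / `edge [ source ... target ... ]` layout shown in the output above; the `dangling_edges` helper and its regular expressions are illustrative only and are not a general GML parser.

```python
import re

# GML text transcribed from the aggg output above (whitespace condensed).
GML = """graph [ hierarchic 1 label "" directed 1
node [ id 3 label "4" ] node [ id 5 label "7" ] node [ id 4 label "6" ]
node [ id 1 label "1" ] node [ id 7 label "10" ] node [ id 11 label "" ]
node [ id 0 label "0" ] node [ id 2 label "3" ] node [ id 8 label "12" ]
node [ id 9 label "13" ] node [ id 6 label "9" ]
edge [ source 3 target 4 ] edge [ source 5 target 8 ] edge [ source 4 target 5 ]
edge [ source 1 target 4 ] edge [ source 7 target 10 ] edge [ source 11 target 8 ]
edge [ source 0 target 1 ] edge [ source 2 target 3 ] edge [ source 8 target 9 ]
edge [ source 6 target 7 ] ]"""


def dangling_edges(gml_text):
    """Return (source, target) pairs whose endpoints are not declared node ids.

    Hypothetical helper: it only understands the simple layout above,
    where `id` is the first key of a node and `source`/`target` are the
    first two keys of an edge.
    """
    node_ids = {int(m) for m in re.findall(r"node\s*\[\s*id\s+(\d+)", gml_text)}
    edges = [
        (int(s), int(t))
        for s, t in re.findall(
            r"edge\s*\[\s*source\s+(\d+)\s+target\s+(\d+)", gml_text
        )
    ]
    return [(s, t) for s, t in edges if s not in node_ids or t not in node_ids]


print(dangling_edges(GML))  # prints [(7, 10)]
```

Only the edge from 7 to 10 is reported, which matches the manual observation: every other endpoint is a declared id.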

I think it might be confusing nodes and labels.
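One rough way to probe that suspicion is to check, for every value used as an edge endpoint, whether it is a declared id, a numeric label, or both. This is only a sketch over data transcribed by hand from the output above; `classify_endpoints` is a hypothetical helper, and the result is merely consistent with ids and labels being mixed up, not proof of where the bug is.

```python
# (id, label) pairs and (source, target) edges transcribed from the aggg output.
NODES = [(3, "4"), (5, "7"), (4, "6"), (1, "1"), (7, "10"), (11, ""),
         (0, "0"), (2, "3"), (8, "12"), (9, "13"), (6, "9")]
EDGES = [(3, 4), (5, 8), (4, 5), (1, 4), (7, 10), (11, 8),
         (0, 1), (2, 3), (8, 9), (6, 7)]


def classify_endpoints(nodes, edges):
    """Map each edge endpoint value to (is_declared_id, is_numeric_label)."""
    ids = {node_id for node_id, _ in nodes}
    labels = {int(label) for _, label in nodes if label.isdigit()}
    endpoints = sorted({value for edge in edges for value in edge})
    return {value: (value in ids, value in labels) for value in endpoints}


report = classify_endpoints(NODES, EDGES)
not_ids = [value for value, (is_id, _) in report.items() if not is_id]
print(not_ids)     # prints [10]
print(report[10])  # prints (False, True)
```

In this output, 10 is the only endpoint that is not a declared id, and it does appear as a label (on node 7), while endpoints such as 8 and 11 are ids that never appear as labels.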

Also, please note that the interesting information inside the nodes is not included in that GML at all, so even if the node/edge relations were not broken, the useful content would still be missing.
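For comparison, here is a sketch of the kind of output a consumer would expect: each node's body carried in its label, and edges restricted to declared ids. This is not radare2 code. The `to_gml` function is hypothetical, the node bodies (`"ebx"`, `"eax"`, `"+"`) are placeholders rather than the real aeg node contents, and the entity escaping of `&` and `"` reflects my understanding of how GML strings are usually encoded.

```python
def to_gml(bodies, edges):
    """Serialize a graph to GML text.

    bodies: dict mapping node id -> body text (placed in the label).
    edges:  iterable of (source_id, target_id) pairs.
    Raises ValueError if an edge references an undeclared node id.
    """
    edges = list(edges)
    missing = [(s, t) for s, t in edges if s not in bodies or t not in bodies]
    if missing:
        raise ValueError(f"edges reference undeclared nodes: {missing}")

    lines = ["graph [", "  directed 1"]
    for node_id, body in bodies.items():
        # Escape characters that would break a quoted GML string.
        label = body.replace("&", "&amp;").replace('"', "&quot;")
        lines += ["  node [", f"    id {node_id}", f'    label "{label}"', "  ]"]
    for source, target in edges:
        lines += ["  edge [", f"    source {source}", f"    target {target}", "  ]"]
    lines.append("]")
    return "\n".join(lines)


# Placeholder graph: two operands feeding one operator node.
example = to_gml({0: "ebx", 1: "eax", 2: "+"}, [(0, 2), (1, 2)])
print(example)
```

Validating the edges before writing means a broken internal graph fails loudly at export time instead of producing a file that importers reject later.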

radare commented 4 years ago

@ret2libc can you fix this issue?



condret commented 4 years ago

had the same issue with dot-files, which is why I didn't use them in my talk