rizinorg / rizin

UNIX-like reverse engineering framework and command-line toolset.
https://rizin.re
GNU Lesser General Public License v3.0

cannot detect datarefs correctly like IDA on arm32 #2845

Open ghost opened 2 years ago

ghost commented 2 years ago

I'm using rizin/Cutter to disassemble an ELF file for arm32.

One function looks like this:

0x00010b3c      push    {r4, lr}
0x00010b40      mov     r0, 1
0x00010b44      bl      fcn.0001308c
0x00010b48      ldr     r4, [fcn.00010b78] ; 0x10b78
0x00010b4c      ldr     r3, [0x00010b7c]
0x00010b50      mov     r1, 0
0x00010b54      mov     r0, 1
0x00010b58      str     r3, [r4, 4]
0x00010b5c      bl      fcn.00012fc8
0x00010b60      ldrh    r0, [r0]
0x00010b64      strh    r0, [r4, 2]
0x00010b68      mov     r0, 1
0x00010b6c      bl      fcn.00012fec
0x00010b70      pop     {r4, lr}
0x00010b74      bx      lr
fcn.00010b78 ();
0x00010b78      .dword 0x00026420
0x00010b7c      .dword 0xb6a85f2d

I get the datarefs with aflj, and the datarefs reported for the function at 0x00010b3c are:

"datarefs": [
    {
      "from": 68424,
      "to": 0x00010b78,
      "type": "DATA"
    },
    {
      "from": 68428,
      "to": 0x00010b7c,
      "type": "DATA"
    }
  ],

I believe this is incorrect. I want to get the datarefs below (IDA produces this result):

"datarefs": [
    {
      "from": 68424,
      "to": 0x00026420,
      "type": "DATA"
    },
    {
      "from": 68428,
      "to": 0xb6a85f2d,
      "type": "DATA"
    }
  ],

How can I get this result, or can rizin be improved to handle this case? Thanks, rizin project.

ghost commented 2 years ago

arm_sample.zip: this is a sample ARM ELF file. You can use the 'aap' command to detect 0x00010b3c as a function and see what I described above.

ret2libc commented 2 years ago

What version of rizin are you using?

This is what I get with rizin from the dev branch.

./build/binrz/rizin/rizin ./arm_sample
 -- The more 'a' you add after 'aa' the more analysis steps are executed.
[0x00008154]> aap
[0x00008154]> s 0x10b3c
[0x00010b3c]> aflj | jq -C '.[] | select(.name == "fcn.00010b3c") | .datarefs'
[
  {
    "from": 68424,
    "to": 68472,
    "type": "DATA"
  },
  {
    "from": 68428,
    "to": 68476,
    "type": "DATA"
  }
]

This seems right, doesn't it?

ghost commented 2 years ago

@ret2libc
I use rizin 0.5.0 on linux-x86-64; maybe it is a dev version. Let's look at the instruction at 68424:

68424 --> 0x00010b48      ldr     r4, [fcn.00010b78]
          0x00010b78      .dword 0x00026420

The instruction loads the content stored at 0x00010b78 (that is, 0x26420) into r4, not the address 0x10b78 itself. So I think the dataref should be (IDA does this too):

"from": 68424, "to": 0x26420

instead of:

"from": 68424, "to": 68472

(0x26420 is in .rodata in the ELF.)

Do you think it's right?
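
The extra dereference being asked for here can be sketched as a post-processing step over the datarefs that aflj reports. This is only an illustration, not something rizin provides: the helper names `read_u32` and `resolve_literal_refs` are hypothetical, the literal-pool bytes are copied by hand from the disassembly in this thread, and little-endian byte order is assumed.

```python
import struct

# Literal pool at 0x10b78, copied from the disassembly in this thread.
# Assumption: the words are stored little-endian.
POOL_BASE = 0x10B78
POOL_BYTES = struct.pack("<II", 0x00026420, 0xB6A85F2D)


def read_u32(addr):
    """Read one 32-bit word from the hand-copied literal pool (hypothetical helper)."""
    off = addr - POOL_BASE
    if off < 0 or off + 4 > len(POOL_BYTES):
        raise ValueError(f"address {addr:#x} is outside the copied pool")
    return struct.unpack_from("<I", POOL_BYTES, off)[0]


def resolve_literal_refs(datarefs):
    """Replace each 'to' address with the word stored at that address."""
    return [
        {"from": ref["from"], "to": read_u32(ref["to"]), "type": ref["type"]}
        for ref in datarefs
    ]


# Datarefs as reported by aflj for fcn.00010b3c (decimal, as in the output).
datarefs = [
    {"from": 68424, "to": 68472, "type": "DATA"},
    {"from": 68428, "to": 68476, "type": "DATA"},
]

resolved = resolve_literal_refs(datarefs)
for ref in resolved:
    print(f"{ref['from']:#x} -> {ref['to']:#x}")
```

Whether the loaded word is really an address (0x26420 is, but 0xb6a85f2d may just be a constant) cannot be decided by this step alone, so the result should be treated as a candidate reference.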

ret2libc commented 2 years ago

┌ fcn.00010b3c ();
│           0x00010b3c      push  {r4, lr}
│           0x00010b40      mov   r0, 1
│           0x00010b44      bl    fcn.0001308c                         ;[1]
│           0x00010b48      ldr   r4, [0x00010b78]                     ; [0x10b78:4]=0x26420
│           0x00010b4c      ldr   r3, [0x00010b7c]                     ; [0x10b7c:4]=0xb6a85f2d
│           0x00010b50      mov   r1, 0
│           0x00010b54      mov   r0, 1
│           0x00010b58      str   r3, [r4, 4]
│           0x00010b5c      bl    fcn.00012fc8                         ;[2]
│           0x00010b60      ldrh  r0, [r0]
│           0x00010b64      strh  r0, [r4, 2]
│           0x00010b68      mov   r0, 1
│           0x00010b6c      bl    fcn.00012fec                         ;[3]
│           0x00010b70      pop   {r4, lr}
└           0x00010b74      bx    lr

From this code I see that the instruction at 0x00010b48 references the data at 0x00010b78. This is what Rizin tells you and I think it is right (at least it is not wrong IMO). 0x00010b78 contains the value 0x26420, but inferring that the function references 0x26420 would require us to look at how the value stored in r4 is used... Actually, I think the instructions str r3, [r4, 4] and strh r0, [r4, 2] do not really refer to 0x26420 but to 0x26420+4 and 0x26420+2 respectively, do they?
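
The point about the stores can be checked with a little arithmetic. The sketch below assumes r4 still holds 0x26420 (the word loaded at 0x00010b48) when the two stores run, which matches the listing since nothing in between writes r4; the `store_targets` helper and the tuple layout are hypothetical, chosen only for this illustration.

```python
# Value loaded into r4 by "ldr r4, [0x00010b78]", per the disassembly comment.
R4 = 0x26420

# (instruction text, immediate offset from r4, access size in bytes)
STORES = [
    ("str r3, [r4, 4]", 4, 4),   # word store
    ("strh r0, [r4, 2]", 2, 2),  # halfword store
]


def store_targets(base, stores):
    """Compute the effective address each store writes to (base + offset)."""
    return [(text, base + offset, size) for text, offset, size in stores]


targets = store_targets(R4, STORES)
for text, addr, size in targets:
    print(f"{text:<18} writes {size} bytes at {addr:#x}")
```

So the memory actually touched is 0x26424 and 0x26422, which is why reporting a plain reference to 0x26420 needs some tracking of how r4 is used afterwards.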