ARM Thumb LDR instructions not triggering memory hook

rchtsang commented 1 year ago

I will add a minimal working example when I get the chance, but I wanted to open the issue before I forgot.

Unicorn 2.0.1 on Python 3.10.6 (macOS 12.6, M1 Pro)

I'm trying to emulate code for a Cortex-M4 microcontroller, and for various reasons I want to hook every memory access.

I'm doing something like the following:

def _hook_mem(self, uc, access, address, size, value, user_data):
    mem_logger.info("pc @ 0x{:08X} : {} 0x{:<8x} @ 0x{:08X}".format(
        *self.bblock.mem_log[-1]))

    if access in [UC_MEM_WRITE, UC_MEM_WRITE_PROT]:
        mem_state[address] = bytes(uc.mem_read(address, size))
    else:
        mem_reads.add(address)

uc.hook_add(
    (UC_HOOK_MEM_READ |
     UC_HOOK_MEM_READ_PROT |
     UC_HOOK_MEM_READ_AFTER |
     UC_HOOK_MEM_WRITE |
     UC_HOOK_MEM_WRITE_PROT),
    self._hook_mem,  # bound method, since _hook_mem takes self
    begin=0x0,
    end=0xFFFFFFFF,
    user_data={},
)

but Unicorn seems to miss some reads that should occur at load instructions.

Here is a log of the behavior I'm seeing:

2022-11-17 21:05:24,769 : 0x2ec: 4b07       ldr r3, [pc, #28]   ; (30c <bsp_board_led_invert+0x20>)
2022-11-17 21:05:24,769 : 0x2ee: f04f 41a0  mov.w   r1, #1342177280 ; 0x50000000
2022-11-17 21:05:24,770 : 0x2f2: 5c18       ldrb    r0, [r3, r0]
2022-11-17 21:05:24,770 : 0x2f4: f8d1 3504  ldr.w   r3, [r1, #1284] ; 0x504
2022-11-17 21:05:24,770 : 0x2f8: 2201       movs    r2, #1

...

2022-11-17 21:05:24,803 : 0x8c4: 6865       ldr r5, [r4, #4]
2022-11-17 21:05:24,804 : pc @ 0x000008C4 : r 0x0        @ 0x00000004
2022-11-17 21:05:24,804 : pc @ 0x000008C4 : r 0x2b1      @ 0x00000004
2022-11-17 21:05:24,805 : 0x8c6: f8d4 7088  ldr.w   r7, [r4, #136]  ; 0x88
2022-11-17 21:05:24,805 : 0x8ca: 1e6e       subs    r6, r5, #1

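As you can see, only one of the load instructions in that sequence (ldr r5, [r4, #4] at 0x8c4) actually triggers the _hook_mem callback; the loads at 0x2ec, 0x2f2, 0x2f4 and 0x8c6 produce no hook output. The two hook lines at 0x8c4 appear to be the same read reported twice: once by UC_HOOK_MEM_READ before the access (value 0x0) and once by UC_HOOK_MEM_READ_AFTER with the loaded value (0x2b1).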
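To make the gap explicit, here is a small, self-contained sketch that scans the trace above and lists which load instructions are followed by a hook line. It assumes the trace format shown (one 0xADDR: <encoding> <mnemonic> line per executed instruction, with any pc @ ... hook lines after it) and treats any mnemonic beginning with ldr as a load; LOG, INSN_RE, HOOK_RE and classify_loads are illustrative names, not part of Unicorn.

```python
import re

# Trace lines copied from the report above; the "..." gap is left out.
LOG = """\
2022-11-17 21:05:24,769 : 0x2ec: 4b07       ldr r3, [pc, #28]   ; (30c <bsp_board_led_invert+0x20>)
2022-11-17 21:05:24,769 : 0x2ee: f04f 41a0  mov.w   r1, #1342177280 ; 0x50000000
2022-11-17 21:05:24,770 : 0x2f2: 5c18       ldrb    r0, [r3, r0]
2022-11-17 21:05:24,770 : 0x2f4: f8d1 3504  ldr.w   r3, [r1, #1284] ; 0x504
2022-11-17 21:05:24,770 : 0x2f8: 2201       movs    r2, #1
2022-11-17 21:05:24,803 : 0x8c4: 6865       ldr r5, [r4, #4]
2022-11-17 21:05:24,804 : pc @ 0x000008C4 : r 0x0        @ 0x00000004
2022-11-17 21:05:24,804 : pc @ 0x000008C4 : r 0x2b1      @ 0x00000004
2022-11-17 21:05:24,805 : 0x8c6: f8d4 7088  ldr.w   r7, [r4, #136]  ; 0x88
2022-11-17 21:05:24,805 : 0x8ca: 1e6e       subs    r6, r5, #1
"""

# Instruction lines: ": 0x2ec: 4b07       ldr ..." -> address, 1-2 halfwords, mnemonic.
INSN_RE = re.compile(r": 0x([0-9a-f]+): (?:[0-9a-f]{4} ){1,2}\s*(\S+)")
# Hook lines: "pc @ 0x000008C4 : r ..." -> PC of the instruction that was hooked.
HOOK_RE = re.compile(r"pc @ 0x([0-9A-Fa-f]+) :")


def classify_loads(log):
    """Split load instructions in the trace into (hooked, missed) address lists."""
    loads = []          # load instruction addresses, in trace order
    hooked_pcs = set()  # PCs that appear in at least one hook line
    for line in log.splitlines():
        if m := HOOK_RE.search(line):
            hooked_pcs.add(int(m.group(1), 16))
        elif m := INSN_RE.search(line):
            if m.group(2).startswith("ldr"):
                loads.append(int(m.group(1), 16))
    hooked = [a for a in loads if a in hooked_pcs]
    missed = [a for a in loads if a not in hooked_pcs]
    return hooked, missed


hooked, missed = classify_loads(LOG)
print("hooked:", [hex(a) for a in hooked])  # hooked: ['0x8c4']
print("missed:", [hex(a) for a in missed])  # missed: ['0x2ec', '0x2f2', '0x2f4', '0x8c6']
```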

Again, I'll post my MWE as soon as I can, but for now: is this more likely a Unicorn bug or a configuration error?

wtdcode commented 1 year ago

Sorry for the late reply. This is a limitation of QEMU's TCG backend: on aarch64 hosts (such as an M1 Mac), QEMU inlines memory reads and writes, so we have no chance to hook those read/write events. It's a known bug that I should document somewhere...
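If a tool depends on seeing every access, one option is to emit a warning when it runs on an affected host. A minimal sketch, assuming (based only on the comment above) that the problem is tied to aarch64 hosts; AFFECTED_HOSTS and memory_hooks_may_miss_accesses are hypothetical names, and the check says nothing about whether other hosts are fully covered.

```python
import platform

# Host machine names assumed to be affected, per the maintainer's comment.
# macOS on Apple silicon reports "arm64"; Linux reports "aarch64".
# This set is an assumption, not a list published by Unicorn.
AFFECTED_HOSTS = {"aarch64", "arm64"}


def memory_hooks_may_miss_accesses(machine=None):
    """Return True if the host looks like one where memory hooks may be incomplete."""
    if machine is None:
        machine = platform.machine()
    return machine.lower() in AFFECTED_HOSTS


if memory_hooks_may_miss_accesses():
    print("warning: aarch64 host; UC_HOOK_MEM_* callbacks may not fire for every access")
```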

jdtcd commented 1 year ago

Here's a small example I was playing with before I came across the explanation from @wtdcode:

from unicorn import *
from unicorn.arm_const import *

#   LDR     R4, =0x20000000
#   LDR     R5, [R4], #4
#   LDR     R5, [R4], #4
#   LDR     R5, [R4], #4
#   LDR     R5, [R4], #4
THUMB_CODE = b"\x4f\xf0\x00\x54\x54\xf8\x04\x5b\x54\xf8\x04\x5b\x54\xf8\x04\x5b\x54\xf8\x04\x5b"

ROM_START = 0x08000000
ROM_SIZE = 128 * 1024
RAM_START = 0x20000000
RAM_SIZE = 8 * 1024

def hook_read(uc, access, address, size, value, user_data):
    print("READ: addr=%08X, size=%d" % (address, size))

uc = Uc(UC_ARCH_ARM, UC_MODE_THUMB | UC_MODE_MCLASS | UC_MODE_LITTLE_ENDIAN)
uc.ctl_set_cpu_model(UC_CPU_ARM_CORTEX_M3)
uc.mem_map(ROM_START, ROM_SIZE, UC_PROT_EXEC | UC_PROT_READ)
uc.mem_map(RAM_START, RAM_SIZE, UC_PROT_READ | UC_PROT_WRITE)
uc.mem_write(ROM_START, THUMB_CODE)
# Hook ranges are inclusive, so the last RAM byte is RAM_START + RAM_SIZE - 1.
uc.hook_add(UC_HOOK_MEM_READ, hook_read, begin=RAM_START, end=RAM_START + RAM_SIZE - 1)
# Bit 0 of the start address selects Thumb state.
uc.emu_start(ROM_START | 1, ROM_START + len(THUMB_CODE))
print("R4: %08x" % uc.reg_read(UC_ARM_REG_R4))

output:

READ: addr=20000000, size=4
R4: 20000010
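The comment block in the example can be checked against THUMB_CODE. The minimal pure-Python sketch below decodes only the two Thumb-2 encodings that occur there (MOV.W with a modified immediate, and LDR.W with post-indexed writeback) and lists the addresses the four loads should read. It shows that the first instruction is encoded as MOV.W R4, #0x20000000, so it does not touch memory; decode and thumb_expand_imm are illustrative helpers, not a general decoder.

```python
# THUMB_CODE copied from the example above (five 32-bit Thumb-2 instructions).
THUMB_CODE = b"\x4f\xf0\x00\x54\x54\xf8\x04\x5b\x54\xf8\x04\x5b\x54\xf8\x04\x5b\x54\xf8\x04\x5b"


def thumb_expand_imm(imm12):
    """ThumbExpandImm (ARMv7-M): expand a 12-bit modified immediate."""
    if imm12 >> 10 == 0:
        # Byte-replication forms; they do not occur in THUMB_CODE.
        raise NotImplementedError("replicated-byte immediates not handled")
    unrotated = 0x80 | (imm12 & 0x7F)   # '1':imm12<6:0>
    rot = imm12 >> 7                    # imm12<11:7>, always >= 8 here
    return ((unrotated >> rot) | (unrotated << (32 - rot))) & 0xFFFFFFFF


def decode(code):
    """Return (read addresses, final registers) for MOV.W / post-indexed LDR.W only."""
    regs, reads = {}, []
    for off in range(0, len(code), 4):
        hw1 = int.from_bytes(code[off:off + 2], "little")
        hw2 = int.from_bytes(code[off + 2:off + 4], "little")
        if (hw1 & 0xFBEF) == 0xF04F and (hw2 & 0x8000) == 0:
            # MOV.W Rd, #const (T2): imm12 = i:imm3:imm8, Rd = hw2<11:8>
            imm12 = ((hw1 >> 10) & 1) << 11 | ((hw2 >> 12) & 7) << 8 | (hw2 & 0xFF)
            regs[(hw2 >> 8) & 0xF] = thumb_expand_imm(imm12)
        elif (hw1 & 0xFFF0) == 0xF850 and (hw2 & 0x0F00) == 0x0B00:
            # LDR.W Rt, [Rn], #+imm8 (T4, P=0 U=1 W=1): read at Rn, then Rn += imm8
            rn, imm8 = hw1 & 0xF, hw2 & 0xFF
            reads.append(regs[rn])
            regs[rn] = (regs[rn] + imm8) & 0xFFFFFFFF
        else:
            raise NotImplementedError(f"unhandled encoding {hw1:04x} {hw2:04x}")
    return reads, regs


reads, regs = decode(THUMB_CODE)
print(["%08X" % a for a in reads])  # ['20000000', '20000004', '20000008', '2000000C']
print("R4: %08x" % regs[4])         # R4: 20000010
```

The final R4 value agrees with the 20000010 printed above, so all four loads executed, while the hook reported only the first address.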

unicorn-engine/unicorn issue #1737