unicorn-engine / unicorn

Unicorn CPU emulator framework (ARM, AArch64, M68K, Mips, Sparc, PowerPC, RiscV, S390x, TriCore, X86)
http://www.unicorn-engine.org
GNU General Public License v2.0

UC_HOOK_MEM_READ callback gets executed multiple times incorrectly #1876

Closed codenulls closed 10 months ago

codenulls commented 10 months ago

I'm trying to trace memory operations for x86_64 instructions.

I have the instruction mov ecx, [R8]. If the address in R8 is 0x140005ffe, I get the output:

>>> Memory is being READ at 0x140005ffe, data size = 4
>>> Memory is being READ at 0x140005ffc, data size = 4
>>> Memory is being READ at 0x140006000, data size = 4

This looks wrong to me: I expected a single memory read at address 0x140005ffe, not three. If you change the address to something else, like 0x140005fb or 0x140005fc, the issue is gone. It seems to happen only when the address is 0x140005ffd, 0x140005ffe, or 0x140005fff, so it appears to depend on the low 12 bits of the address (the trailing ffd, ffe, or fff).
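One way to see what the three failing addresses have in common is to check whether a 4-byte read starting there spills into the next page. The sketch below is plain arithmetic, independent of Unicorn. The 4 KiB page size and the helper name `crosses_page` are assumptions made for illustration.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages

def crosses_page(address, size, page_size=PAGE_SIZE):
    """Return True if the byte range [address, address + size) spans two pages."""
    first_page = address // page_size
    last_page = (address + size - 1) // page_size
    return first_page != last_page

# Addresses mentioned above, each read with a 4-byte access.
for addr in (0x140005fb, 0x140005fc, 0x140005ffd, 0x140005ffe, 0x140005fff):
    print("0x%x crosses a page boundary: %s" % (addr, crosses_page(addr, 4)))
```

Under that assumption, only the three addresses that trigger the extra callbacks report a page crossing.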

Here's the code to reproduce the bug:

# Sample code for X86 of Unicorn. Nguyen Anh Quynh <aquynh@gmail.com>

from __future__ import print_function
from unicorn import *
from unicorn.x86_const import *

# memory address where emulation starts
ADDRESS = 0x1000000

# callback for tracing memory access (READ or WRITE)
def hook_mem_access(uc, access, address, size, value, user_data):
    if access == UC_MEM_WRITE:
        print(">>> Memory is being WRITE at 0x%x, data size = %u, data value = 0x%x" \
                %(address, size, value))
    else:   # READ
        print(">>> Memory is being READ at 0x%x, data size = %u" \
                %(address, size))

def test_x86_64():
    print("Emulate x86_64 code")
    code_x64 = b"\x41\x8B\x08" # mov ecx, [r8]
    try:
        # Initialize emulator in X86-64bit mode
        mu = Uc(UC_ARCH_X86, UC_MODE_64)

        # map 2MB memory for this emulation
        mu.mem_map(ADDRESS, 2 * 1024 * 1024)
        mu.mem_map(0x140000000, 1024 * 500)

        mu.mem_write(ADDRESS, code_x64)
        mu.reg_write(UC_X86_REG_R8, 0x140005ffe)

        # tracing all memory READ & WRITE access
        mu.hook_add(UC_HOOK_MEM_WRITE, hook_mem_access)
        mu.hook_add(UC_HOOK_MEM_READ, hook_mem_access)

        try:
            # emulate machine code in infinite time
            mu.emu_start(ADDRESS, ADDRESS + len(code_x64))
        except UcError as e:
            print("ERROR: %s" % e)
    except UcError as e:
        print("ERROR: %s" % e)

test_x86_64()

wtdcode commented 10 months ago

Make sure you have read this: https://github.com/unicorn-engine/unicorn/wiki/FAQ#memory-hooks-get-called-multiple-times-for-a-single-instruction

codenulls commented 10 months ago

This makes a lot of sense. I'll check for alignment in the hook. Thank you.
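A minimal sketch of what such a check in the hook could look like, based only on the pattern in the output above: the first report carries the original address, followed by two size-aligned reads on either side of the page boundary. The class name `SplitReadFilter`, the 4 KiB page size, and the assumption that the follow-ups always have this shape are illustrative and not confirmed Unicorn behavior.

```python
class SplitReadFilter:
    """Forward the first report of a read and drop its aligned follow-ups.

    Assumes a page-crossing access of `size` bytes (a power of two) is
    followed by reports at (address & ~(size - 1)) and at that address
    plus `size`, as in the output shown in this issue.
    """

    def __init__(self, callback, page_size=0x1000):
        self.callback = callback
        self.page_size = page_size
        self.pending = []  # (address, size) follow-ups still expected

    def _crosses_page(self, address, size):
        return address // self.page_size != (address + size - 1) // self.page_size

    def __call__(self, uc, access, address, size, value, user_data):
        key = (address, size)
        if key in self.pending:
            # Expected follow-up of a split access: swallow it.
            self.pending.remove(key)
            return
        # Any other access starts fresh; forget stale expectations.
        self.pending = []
        if self._crosses_page(address, size):
            base = address & ~(size - 1)
            self.pending = [(base, size), (base + size, size)]
        self.callback(uc, access, address, size, value, user_data)


# Replay the three reports from the issue without running the emulator.
seen = []
filt = SplitReadFilter(
    lambda uc, access, address, size, value, user_data: seen.append((address, size))
)
for addr in (0x140005ffe, 0x140005ffc, 0x140006000):
    filt(None, 0, addr, 4, 0, None)
print(["0x%x" % a for a, _ in seen])
```

In the reproduction script this would presumably be registered as `mu.hook_add(UC_HOOK_MEM_READ, SplitReadFilter(hook_mem_access))`, assuming the Python binding accepts any callable as the callback. The filter is heuristic: a genuine aligned read at one of the expected follow-up addresses immediately after a page-crossing read would also be dropped.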