unicorn-engine / unicorn

Unicorn CPU emulator framework (ARM, AArch64, M68K, Mips, Sparc, PowerPC, RiscV, S390x, TriCore, X86)
http://www.unicorn-engine.org
GNU General Public License v2.0

Weird hang while running neon instructions with pld #1313

Closed (sunho closed this 3 years ago)

sunho commented 4 years ago

Hello! I'm from https://github.com/Vita3K/Vita3K.

We experienced a weird hang while running ARM Thumb-2 code. The relevant code looks like this:

lsrs r6, r6, #5
lsl.w r8, r6, #4
cmp r6, #0
beq.w #0x344
vst2.8 {d20, d21, d22, d23}, [lr]
vst2.8 {d16, d17, d18, d19}, [sb]
mov lr, r3
mov r0, r1
mov.w ip, #0
add.w sb, lr, #1
vld2.8 {d20, d21, d22, d23}, [lr]
add.w ip, ip, #1
pld [r0, #0x90]
vld2.8 {d16, d17, d18, d19}, [sb]
cmp ip, r6
vshll.i8 q12, d20, #8
add.w sb, r0, #0x10
vshll.i8 q10, d21, #8
add.w lr, lr, #0x20
vmovl.u8 q11, d16
vmovl.u8 q8, d17
vorr q9, q11, q12
vorr q8, q8, q10
vst1.16 {d18, d19}, [r0]
add.w r0, r0, #0x20
vst1.16 {d16, d17}, [sb]

I found that if I run this code in a simpler setting with a code hook enabled, it also hangs. The following script is what I used. I also found that if I remove "pld [r0, #0x90]", it completes even with the code hook (set ARM_CODE = CODE_WITHOUT_PLD).

from __future__ import print_function
from unicorn import *
from unicorn.arm_const import *

CODE_WITHOUT_PLD = bytes(bytearray.fromhex("76094FEA0618002E00F0A2814EF90F4349F90F039E4608464FF0000C0EF101096EF90F430CF1010C69F90F03B445F2FF248300F11009F2FF25430EF1200EC8FF306AC8FF310A66EFF82160EFF40140F94F2A00F1200049F94F0A"))
CODE_WITH_PLD = bytes(bytearray.fromhex("76094FEA0618002E00F09C814EF90F4349F90F039E4608464FF0000C0EF101096EF90F430CF1010C90F890F069F90F03B445F2FF248300F11009F2FF25430EF1200EC8FF306AC8FF310A66EFF82160EFF40140F94F2A00F1200049F94F0A"))
ARM_CODE = CODE_WITH_PLD
ADDRESS    = 0x10000

def hook_code(uc, address, size, user_data):
    print(">>> Tracing instruction at 0x%x, instruction size = 0x%x" %(address, size))

def test_arm():
    print("Emulate ARM code")
    try:
        mu = Uc(UC_ARCH_ARM, UC_MODE_THUMB)

        # Note: 0x1024 is hexadecimal (4132), so this maps 0x812000 bytes rather than 2 MiB.
        mu.mem_map(ADDRESS, 2 * 0x1024 * 1024)

        mu.mem_write(ADDRESS, ARM_CODE)

        mu.reg_write(UC_ARM_REG_R0, 0x11000)
        mu.reg_write(UC_ARM_REG_R1, 0x12000)
        mu.reg_write(UC_ARM_REG_R3, 0x13000)
        mu.reg_write(UC_ARM_REG_R6, 0x14000)
        mu.reg_write(UC_ARM_REG_R9, 0x15000)
        mu.reg_write(UC_ARM_REG_LR, 0x16000)
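        # Set every APSR bit, including all condition flags.
        mu.reg_write(UC_ARM_REG_APSR, 0xFFFFFFFF)
        # Enable VFP/NEON: set FPEXC.EN (bit 30) ...
        mu.reg_write(UC_ARM_REG_FPEXC, 0x40000000)
        # ... and grant full access to coprocessors cp10/cp11 in CPACR.
        mu.reg_write(UC_ARM_REG_C1_C0_2, mu.reg_read(UC_ARM_REG_C1_C0_2) | (0xf<<20))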

        mu.hook_add(UC_HOOK_CODE, hook_code, begin=ADDRESS, end=ADDRESS + 100)

        # Bit 0 of the start address selects Thumb mode.
        mu.emu_start(ADDRESS|1, ADDRESS + len(ARM_CODE))

        print(">>> Emulation done. Below is the CPU context")
    except UcError as e:
        print("ERROR: %s" % e)

if __name__ == '__main__':
    test_arm()

If I remove that pld instruction in the real setting (Vita3K running a game), the full program doesn't hang and keeps going. I think something is going wrong in how Unicorn emulates NEON instructions.
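To check that the two blobs in the script really differ only by the pld, they can be compared byte by byte without running the emulator. This is a minimal sketch, assuming Python 3 and only the standard library; compare_blobs and PLD_BYTES are names made up for this illustration and are not part of Unicorn.

```python
# Hex blobs copied verbatim from the reproducer script.
CODE_WITHOUT_PLD = bytes.fromhex("76094FEA0618002E00F0A2814EF90F4349F90F039E4608464FF0000C0EF101096EF90F430CF1010C69F90F03B445F2FF248300F11009F2FF25430EF1200EC8FF306AC8FF310A66EFF82160EFF40140F94F2A00F1200049F94F0A")
CODE_WITH_PLD = bytes.fromhex("76094FEA0618002E00F09C814EF90F4349F90F039E4608464FF0000C0EF101096EF90F430CF1010C90F890F069F90F03B445F2FF248300F11009F2FF25430EF1200EC8FF306AC8FF310A66EFF82160EFF40140F94F2A00F1200049F94F0A")

# Thumb-2 encoding of `pld [r0, #0x90]` (halfwords 0xF890, 0xF090, little-endian).
PLD_BYTES = bytes.fromhex("90F890F0")


def compare_blobs(with_pld, without_pld, pld=PLD_BYTES):
    """Return (offset of the pld, offsets that still differ once it is cut out)."""
    pld_offset = with_pld.find(pld)
    if pld_offset < 0:
        raise ValueError("pld encoding not found")
    # Remove the 4-byte pld so both blobs line up again.
    stripped = with_pld[:pld_offset] + with_pld[pld_offset + len(pld):]
    if len(stripped) != len(without_pld):
        raise ValueError("blobs differ by more than the pld")
    diffs = [i for i, (a, b) in enumerate(zip(stripped, without_pld)) if a != b]
    return pld_offset, diffs


pld_offset, diffs = compare_blobs(CODE_WITH_PLD, CODE_WITHOUT_PLD)
print("pld at byte offset", pld_offset)
print("other differing byte offsets:", diffs)
```

For these two blobs the pld sits at byte offset 40, and the only other difference is byte 10, which lies inside the immediate of the beq.w at offsets 8 to 11. With R6 = 0x14000, the lsrs leaves R6 nonzero, so that branch is not taken in either variant.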

sunho commented 4 years ago

It generated the following TCG ops:

 ld_i32 env,env,$0xffffffffffffffec
 movi_i32 tmp6,$0x0
 brcond_i32 tmp5,tmp6,ne,$0x0
 movi_i32 tmp5,$0x2
 movi_i32 tmp6,$0x2
 movi_i64 tmp7,$0x16d4c554070
 movi_i64 tmp8,$0xa526a
 call uc_tracecode,$0x0,$0,tmp5,tmp6,tmp7,tmp8
 ld_i32 tmp9,env,$0xffffffffffffffec
 movi_i32 tmp10,$0x0
 brcond_i32 tmp9,tmp10,ne,$0x0
 mov_i32 tmp9,r3
 mov_i32 r14,tmp9
 movi_i32 tmp9,$0x2
 movi_i32 tmp10,$0x2
 movi_i64 tmp11,$0x16d4c554070
 movi_i64 tmp12,$0xa526c
 call uc_tracecode,$0x0,$0,tmp9,tmp10,tmp11,tmp12
 ld_i32 tmp13,env,$0xffffffffffffffec
 movi_i32 tmp14,$0x0
 brcond_i32 tmp13,tmp14,ne,$0x0
 mov_i32 tmp13,r1
 mov_i32 r0,tmp13
 movi_i32 tmp13,$0x4
 movi_i32 tmp14,$0x2
 movi_i64 tmp15,$0x16d4c554070
 movi_i64 tmp16,$0xa526e
 call uc_tracecode,$0x0,$0,tmp13,tmp14,tmp15,tmp16
 ld_i32 tmp17,env,$0xffffffffffffffec
 movi_i32 tmp18,$0x0
 brcond_i32 tmp17,tmp18,ne,$0x0
 movi_i32 tmp17,$0x0
 movi_i32 tmp18,$0x0
 or_i32 tmp18,tmp18,tmp17
 mov_i32 r12,tmp18
 movi_i32 tmp17,$0x4
 movi_i32 tmp18,$0x2
 movi_i64 tmp19,$0x16d4c554070
 movi_i64 tmp20,$0xa5272
 call uc_tracecode,$0x0,$0,tmp17,tmp18,tmp19,tmp20
 ld_i32 tmp21,env,$0xffffffffffffffec
 movi_i32 tmp22,$0x0
 brcond_i32 tmp21,tmp22,ne,$0x0

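The first op, ld_i32 env,env,$0xffffffffffffffec, seems wrong: it loads into env itself, whereas the same check before each later instruction loads into a temporary (for example ld_i32 tmp9,env,$0xffffffffffffffec). The brcond_i32 that follows it also tests tmp5, which none of the ops shown here loads.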
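A dump like this can be scanned mechanically for that pattern. The sketch below is a hypothetical helper, not something Unicorn or QEMU provides: it assumes the textual format shown in this dump (opcode, a space, then comma-separated operands with the destination first) and treats any ld_i32 whose destination is env as suspicious. It also decodes the 64-bit offset as a signed value.

```python
def parse_op(line):
    """Split 'ld_i32 dst,base,$off' into (opcode, [operands])."""
    opcode, _, rest = line.strip().partition(" ")
    args = [a.strip() for a in rest.split(",")] if rest else []
    return opcode, args


def to_signed64(text):
    """Interpret '$0x...' as a signed 64-bit integer."""
    value = int(text.lstrip("$"), 16)
    return value - (1 << 64) if value >= (1 << 63) else value


def suspicious_loads(dump):
    """Return (op text, signed offset) for every ld_i32 that overwrites env."""
    hits = []
    for line in dump.splitlines():
        opcode, args = parse_op(line)
        if opcode == "ld_i32" and len(args) == 3 and args[0] == "env":
            hits.append((line.strip(), to_signed64(args[2])))
    return hits


# A few ops copied from the dump in this comment.
DUMP = """\
 ld_i32 env,env,$0xffffffffffffffec
 movi_i32 tmp6,$0x0
 brcond_i32 tmp5,tmp6,ne,$0x0
 ld_i32 tmp9,env,$0xffffffffffffffec
 movi_i32 tmp10,$0x0
 brcond_i32 tmp9,tmp10,ne,$0x0
"""

for op, offset in suspicious_loads(DUMP):
    print("suspicious:", op, "(offset", offset, "from env)")
```

On the excerpt above this flags only the first op and decodes the offset as -20, so every one of these checks reads a 32-bit field 20 bytes before env. Only the first one writes the result back into env.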

aquynh commented 4 years ago

Let me see. How do you dump out the TCG code?

sunho commented 4 years ago

I just set a breakpoint at tcg_gen_code_common and called the dump_ops function.

wtdcode commented 3 years ago

Probably fixed in uc2.