Multi-threading - "Segmentation fault" - on M1 MacBook Pro

tigger1005 commented 1 year ago

Hi, I found a strange error with the Rust Unicorn engine bindings and multi-threading on M1 MacBooks. I have set up a very small Unicorn engine instance for ARMv8-M processors. The example just runs a short loop to simulate a longer-running program. For multi-threading I use the rayon crate. When I run the program below with the RAYON_NUM_THREADS environment variable (set via env::set_var) set to "1", everything works as expected. When I remove the line or set the variable to a larger value, e.g. "20", the program stops most of the time with a "segmentation fault". Sometimes it stops with an "invalid command" error, and sometimes it runs to completion.

It looks like there is a memory leak or invalid memory handling in the Rust binding.

I tried both Unicorn version 2.0.1 and the current "dev" branch. Both show the same behavior. On my x86 work PC the program runs without the error.

use rayon::prelude::*;
use std::env;
use unicorn_engine::unicorn_const::{Arch, Mode, Permission};
use unicorn_engine::{RegisterARM, Unicorn};

const STACK_BASE: u64 = 0x80100000;
const CODE_START: u64 = 0x80000000;

fn main() {
    let arm_code: [u8; 14] = [
        0x00, 0xF0, 0x01, 0xF8, 0xFE, 0xE7, 0x80, 0xB5, 0x00, 0x20, 0x00, 0xAF, 0x80, 0xBD,
    ];

    // Set parameter from cli
    env::set_var("RAYON_NUM_THREADS", "20");

    println!("Start threads");

    (0..10000).into_par_iter().for_each(|_| {
        // Setup platform -> ARMv8-m.base
        let mut emu = Unicorn::new(Arch::ARM, Mode::LITTLE_ENDIAN | Mode::MCLASS).unwrap();
        // Setup memory
        emu.mem_map(CODE_START, 0x20000, Permission::ALL).unwrap();
        emu.mem_map(STACK_BASE, 0x1000, Permission::ALL).unwrap();
        // Setup registers
        emu.reg_write(RegisterARM::SP, STACK_BASE + 100).unwrap();
        // Write code to memory area
        emu.mem_write(CODE_START, &arm_code).unwrap();
        // Run
        emu.emu_start(
            CODE_START | 1,
            (CODE_START + arm_code.len() as u64) | 1,
            0,
            2000,
        )
        .unwrap();
    });
    println!("Done")
}
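To narrow down the thread count at which the crash first appears, a sweep over several thread counts may help. The following is only a minimal sketch using std::thread instead of rayon; run_with_threads and the placeholder job are hypothetical names introduced for illustration, and the real Unicorn setup and emu_start call from the program above would go where the placeholder workload is.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical helper: run `job` `total` times, spread across `threads`
/// OS threads, and return how many jobs completed.
fn run_with_threads<F>(threads: usize, total: usize, job: F) -> usize
where
    F: Fn(usize) + Sync,
{
    let next = AtomicUsize::new(0); // next job index to hand out
    let done = AtomicUsize::new(0); // number of finished jobs
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= total {
                    break;
                }
                job(i);
                done.fetch_add(1, Ordering::Relaxed);
            });
        }
    });
    done.load(Ordering::Relaxed)
}

fn main() {
    // Placeholder workload (assumption): in the real reproduction this
    // would create the emulator, map memory, and call emu_start.
    let job = |i: usize| {
        std::hint::black_box(i.wrapping_mul(31));
    };
    for threads in [1, 2, 4, 8, 20] {
        let completed = run_with_threads(threads, 10_000, &job);
        println!("{threads} threads: {completed} jobs completed");
    }
}
```

Since a segmentation fault kills the whole process, the last line printed before the crash gives a rough idea of the smallest failing thread count.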

wtdcode commented 1 year ago

Could you attach lldb to show the actual fault reason?

tigger1005 commented 1 year ago

Yes,

I have attached the output of "bt all" to this comment. Thread 9 is the failing thread, with EXC_BAD_ACCESS. This was built from the current dev branch. I was not able to find out more, because the thread is not stopped.

Console_1.txt

Looks like something is happening at ret = tcg_qemu_tb_exec(env, tb_ptr); in cpu-exec.c.

static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
{
    CPUArchState *env = cpu->env_ptr;
    uintptr_t ret;
    TranslationBlock *last_tb;
    int tb_exit;
    uint8_t *tb_ptr = itb->tc.ptr;

    UC_TRACE_START(UC_TRACE_TB_EXEC);
    tb_exec_lock(cpu->uc->tcg_ctx);
    ret = tcg_qemu_tb_exec(env, tb_ptr);
    tb_exec_unlock(cpu->uc->tcg_ctx);

If there is anything I can provide to improve debugging, please comment.

tigger1005 commented 1 year ago

Hi, I think I have found a possible connection to an already reported bug. In the issue "Test failures on M1 Mac #1678", the last remaining failure is "Test test_x86_nested_emu_start... ". Its backtrace looks very similar to my multi-threading finding:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x2800001b0)
  * frame #0: 0x00000002800001b0
    frame #1: 0x0000000101cbc528 libunicorn.2.dylib`cpu_tb_exec(cpu=0x0000000108088000, itb=0x0000000280000080) at cpu-exec.c:60:11
    frame #2: 0x0000000101cbbbc0 libunicorn.2.dylib`cpu_loop_exec_tb(cpu=0x0000000108088000, tb=0x0000000280000080, last_tb=0x000000016fdff420, tb_exit=0x000000016fdff41c) at cpu-exec.c:499:11
    frame #3: 0x0000000101cbb410 libunicorn.2.dylib`cpu_exec_x86_64(uc=0x0000000103808200, cpu=0x0000000108088000) at cpu-exec.c:598:13
    frame #4: 0x0000000101c67f00 libunicorn.2.dylib`tcg_cpu_exec(uc=0x0000000103808200) at cpus.c:96:17

and my finding:

* thread #10, stop reason = EXC_BAD_ACCESS (code=1, address=0x44000002c)
  * frame #0: 0x000000044000002c
    frame #1: 0x000000044000002c
    frame #2: 0x00000001004998fc unicorn_1`cpu_tb_exec(cpu=0x00000001400d8000, itb=0x00000002c0006180) at cpu-exec.c:60:11
    frame #3: 0x0000000100498f68 unicorn_1`cpu_loop_exec_tb(cpu=0x00000001400d8000, tb=0x00000002c0006180, last_tb=0x0000000171061210, tb_exit=0x000000017106120c) at cpu-exec.c:502:11
    frame #4: 0x00000001004986dc unicorn_1`cpu_exec_arm(uc=0x000000010700c400, cpu=0x00000001400d8000) at cpu-exec.c:604:13
    frame #5: 0x000000010045cca8 unicorn_1`tcg_cpu_exec(uc=0x000000010700c400) at cpus.c:96:17

The main difference is that the test fails during x86 emulation and mine fails during ARMv8-M emulation. Looking at the addresses, there might be a little/big-endian problem: the memory around the faulting address 0x44000002c cannot be disassembled, so I think it is not mapped at all:

0x44000002c -> 0x2c0000044

But that might be only a strange coincidence.
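As a quick check of the byte-order idea, the sketch below computes what an actual byte swap of the faulting address 0x44000002c would produce. swap_low_bytes is a hypothetical helper written only for this illustration and is not part of Unicorn.

```rust
/// Hypothetical helper: reverse the order of the lowest `n` bytes of `value`.
fn swap_low_bytes(value: u64, n: usize) -> u64 {
    let bytes = value.to_le_bytes(); // bytes[0] is the least significant byte
    let mut out = 0u64;
    for i in 0..n {
        // Byte i (from the least significant end) moves to position n-1-i.
        out |= (bytes[i] as u64) << (8 * (n - 1 - i));
    }
    out
}

fn main() {
    let fault_addr: u64 = 0x4_4000_002c;

    // Swap across the full 64-bit word.
    println!("64-bit swap: {:#x}", fault_addr.swap_bytes()); // 0x2c00004004000000

    // Swap only the five bytes that the address actually occupies.
    println!("5-byte swap: {:#x}", swap_low_bytes(fault_addr, 5)); // 0x2c00004004
}
```

Neither result equals 0x2c0000044, so a plain byte swap does not map the faulting address onto the 0x2c... range of the itb pointer, which fits the suspicion that the resemblance is a coincidence.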

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.

unicorn-engine / unicorn

Multi-threading - "Segmentation fault" - on M1 MacBook Pro #1772