wasmerio / wasmer

🚀 The leading Wasm Runtime supporting WASIX and WASI
https://wasmer.io
MIT License
19.02k stars 813 forks source link

Segfault in wasmer-wasix #4346

Open ViktorWb opened 11 months ago

ViktorWb commented 11 months ago

Describe the bug

I'm seeing a segmentation fault when running wasmer-wasix, and it seems related to running wasmer-wasix inside tokio while also spawning a thread from within the WebAssembly module.

Wasmer version is 4.2.4, and wasmer-wasix is 0.17.0.

Code is here: wasmtest.zip

Steps to reproduce

Here is the code for the WebAssembly module:

// wasm
fn main() {
    std::thread::spawn(|| {
        #[allow(unused)]
        let mut a = 0u64;
        loop {
            a += 1;
        }
    });
    println!("Hello from WASM");
}

It has no dependencies and is compiled using cargo wasix build.

Here is the code that executes that WebAssembly:

// host
#[tokio::main]
async fn main() {
    let wasm_bytes = include_bytes!("../wasm-img/target/wasm32-wasmer-wasi/debug/wasm-img.wasm");

    let mut store = wasmer::Store::default();
    let module = wasmer::Module::new(&store, wasm_bytes).unwrap();

    wasmer_wasix::WasiEnv::builder("launcher")
        .run_with_store(module, &mut store)
        .unwrap();

    println!("Program complete");
}

Expected behavior

I expect wasmer to safely execute an arbitrary WebAssembly module, and without any unsafe blocks in my code, it should not be possible for any segfaults in wasmer or wasmer-wasix.

Actual behavior

The code above segfaults roughly 50% of the time for me. As seen from the output, the segfault happens after the wasm module exited, so it seems the issue is with the spawned thread.

cargo run
Hello from WASM
Program complete
Segmentation fault (core dumped)

Additional context

The issue does not occur when running without an outer tokio runtime, like so:

// host
fn main() {
    let wasm_bytes = include_bytes!("../wasm-img/target/wasm32-wasmer-wasi/debug/wasm-img.wasm");

    let mut store = wasmer::Store::default();
    let module = wasmer::Module::new(&store, wasm_bytes).unwrap();

    wasmer_wasix::WasiEnv::builder("launcher")
        .run_with_store(module, &mut store)
        .unwrap();

    println!("Program complete");
}

The issue also does not occur when not spawning a thread from within the WebAssembly module:

// wasm
fn main() {
    println!("Hello from WASM");
}

The following code segfaults more often (95% of the time), and perhaps gives a better clue as to what is happening:

// host
fn main() {
    let wasm_bytes = include_bytes!("../wasm-img/target/wasm32-wasmer-wasi/debug/wasm-img.wasm");

    let mut store = wasmer::Store::default();
    let module = wasmer::Module::new(&store, wasm_bytes).unwrap();

    wasmer_wasix::WasiEnv::builder("launcher")
        .run_with_store(module, &mut store)
        .unwrap();

    drop(store);
    println!("Waiting");
    tokio::time::sleep(std::time::Duration::from_secs(1)).await;

    println!("Program complete");
}

Output:

cargo run
Hello from WASM
Waiting
Segmentation fault (core dumped)
Michael-F-Bryan commented 11 months ago

This might be happening because a WASIX thread is still running in the background after run_with_store() returns. Then we might be seeing a use-after-free if the store is dropped and the thread tries to access it.

If this is the case, we could fix it by either making run_with_store() join all spawned threads before returning or wrapping the store's internal state in an Arc rather than using raw pointers to refer to something inside the store.

ViktorWb commented 11 months ago

Yes that sounds plausible!

I'm not familiar with how this works currently, but should it not be a requirement for all threads to be stopped once the main function exits or when exit is called? Otherwise an application could potentially indefinitely taking up compute resources.

Maybe both solutions you mention can be implemented, perhaps by returning some sort of join handle from run_with_store?