infinyon / fluvio

Lean and mean distributed stream processing system written in Rust and WebAssembly. An alternative to Kafka + Flink in one.
https://www.fluvio.io/
Apache License 2.0

[Bug]: Faulty WASM code can kill SPU (happens randomly) #1246

Open avinassh opened 3 years ago

avinassh commented 3 years ago

While testing for #1245, I discovered something (tested locally with the latest master, 2c0005d). I had the following smartstream filter:

#[smartstream(filter)]
pub fn filter(record: &Record) -> bool {
    let str_result = std::str::from_utf8(record.value.as_ref());
    let string = match str_result {
        Ok(s) => s,
        _ => return false,
    };

    let mut papa_vector = vec![0u8; 4096];
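    // Deliberately faulty: this loop never exits, so the vector grows until
    // allocation fails and the code after the loop is unreachable.
    loop {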
        papa_vector.extend(vec![0u8; 4096]);
    }
    if *papa_vector.last().unwrap() == 0 {
        return true;
    }
    string.contains('a')
}

When I first ran it, I got this error in the logs, but SPU continued to function:

Jul 15 19:20:47.001 ERROR stream fetch{starting_offset=0 replica=echo-0 sink=14}: fluvio_spu::services::public::stream_fetch: error: IoError(
    Custom {
        kind: Other,
        error: "filter err wasm trap: unreachable\nwasm backtrace:\n    0: 0xa06e - <unknown>!rust_panic\n    1: 0xa061 - <unknown>!std::panicking::rust_panic_with_hook::hbdbceb5cd158bf19\n    2: 0xa99c - <unknown>!std::panicking::begin_panic_handler::{{closure}}::h9995bb2f0de4bb38\n    3: 0xa977 - <unknown>!std::sys_common::backtrace::__rust_end_short_backtrace::hc7608161a467c002\n    4: 0x4c42 - <unknown>!rust_begin_unwind\n    5: 0x388d - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b\n    6: 0x38e2 - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a\n    7: 0x37b2 - <unknown>!alloc::raw_vec::capacity_overflow::h407e6cd17e2da5b5\n    8: 0x6b57 - <unknown>!alloc::raw_vec::RawVec<T,A>::reserve::do_reserve_and_handle::ha79bff7a5563aabc\n    9: 0x98db - <unknown>!filter\nnote: run with `WASMTIME_BACKTRACE_DETAILS=1` environment variable to display more information\n",
    },
)
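
For context on the `capacity_overflow` frame in the backtrace above, a rough model of the filter's growth loop suggests how the panic can arise on a 32-bit target. The sketch below is not Fluvio or wasmtime code. It assumes that `Vec` doubles its capacity when full (current standard-library behavior, not a documented guarantee) and that no single allocation may exceed `isize::MAX` bytes, which equals `i32::MAX` on wasm32. `simulate_growth` and `WASM32_MAX_ALLOC` are hypothetical names used only for illustration.

```rust
/// Largest allocation, in bytes, that Rust permits on a 32-bit target such
/// as wasm32 (`isize::MAX` when `isize` is 32 bits wide).
const WASM32_MAX_ALLOC: u64 = i32::MAX as u64;

/// Models a `Vec<u8>` that starts with `chunk` bytes and is extended by
/// `chunk` bytes forever, as in the filter's loop.
///
/// Assumption: when the vector is full, the new capacity is the larger of
/// "double the old capacity" and "exactly what is required".
///
/// Returns `(len, requested_capacity)` at the moment the requested capacity
/// first exceeds `WASM32_MAX_ALLOC`, which is where a capacity-overflow
/// panic would be raised in this model.
fn simulate_growth(chunk: u64) -> (u64, u64) {
    let mut len = chunk;
    let mut cap = chunk;
    loop {
        let required = len + chunk;
        if required > cap {
            let new_cap = required.max(cap * 2);
            if new_cap > WASM32_MAX_ALLOC {
                return (len, new_cap);
            }
            cap = new_cap;
        }
        len = required;
    }
}
```

Under these assumptions, `simulate_growth(4096)` returns `(1 << 30, 1 << 31)`: the vector holds 1 GiB when the next doubling asks for 2 GiB, one byte over the limit. A real run could fail earlier if the runtime refuses to grow linear memory, so this only models the path seen in the log.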

However, when I tried a second time, the SPU exited with the following:

Jul 15 19:06:44.381 DEBUG stream fetch{starting_offset=0 replica=echo-0 sink=14}:send_back_records{offset=0 stream_id=2}: fluvio_spu::smart_stream::file_batch: fbatch end file_offset=72
Jul 15 19:06:44.381 DEBUG stream fetch{starting_offset=0 replica=echo-0 sink=14}:send_back_records{offset=0 stream_id=2}: fluvio_spu::smart_stream::filter: starting filter processing current_batch_offset=0 current_batch_offset_delta=0 filter_offset_delta=-1 filter_base_offset=-1 filter_records=0
[1]    13038 illegal hardware instruction  RUST_LOG=fluvio=debug cargo run --bin fluvio-run -- spu -i 5001 -p  -v   .

I cannot reproduce this reliably, as the crash only happens once in a while. Here is what I am running:

flvd consume echo -B --smart-stream fl_filter.wasm

If I get a stream error, I keep running this again and again. Sometimes it crashes on the 2nd attempt, sometimes on the 7th or 8th. I have watched memory and CPU usage, and neither is high.

nicholastmosher commented 3 years ago

Thanks for the report @avinassh, this is a very interesting bug. I'm looking forward to figuring out why SPU itself crashed, though I expect it is out of memory. I wonder if wasmtime has any way to restrict memory usage...
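
Separately from whatever limits the host runtime may offer, a filter can bound its own allocations so that runaway growth becomes an ordinary error instead of a trap. The following is a minimal guest-side sketch using only the Rust standard library. `MAX_BUFFER_BYTES`, `GrowError`, and `grow_bounded` are hypothetical names, and the 1 MiB cap is an arbitrary value chosen for illustration. It does not protect the SPU from a malicious or careless module, which would still need a host-side limit.

```rust
/// Hypothetical upper bound on the scratch buffer (1 MiB, chosen arbitrarily).
const MAX_BUFFER_BYTES: usize = 1 << 20;

/// Reasons a bounded growth request can be refused.
#[derive(Debug, PartialEq)]
enum GrowError {
    /// The request would push the buffer past `MAX_BUFFER_BYTES`.
    LimitExceeded,
    /// The allocator could not provide the memory.
    AllocFailed,
}

/// Appends `extra` zero bytes to `buf`, refusing to grow past the cap and
/// reporting allocation failure as an error instead of panicking.
fn grow_bounded(buf: &mut Vec<u8>, extra: usize) -> Result<(), GrowError> {
    let new_len = buf
        .len()
        .checked_add(extra)
        .ok_or(GrowError::LimitExceeded)?;
    if new_len > MAX_BUFFER_BYTES {
        return Err(GrowError::LimitExceeded);
    }
    // `try_reserve` returns an error where `reserve` would panic or abort.
    buf.try_reserve(extra).map_err(|_| GrowError::AllocFailed)?;
    // Capacity is already reserved, so this resize does not reallocate.
    buf.resize(new_len, 0);
    Ok(())
}
```

A filter written this way could return `false` for the record when `grow_bounded` fails, rather than looping until the allocator gives up.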

nacardin commented 2 years ago

Related to https://github.com/infinyon/fluvio/issues/1245