mlua-rs / mlua

High level Lua 5.4/5.3/5.2/5.1 (including LuaJIT) and Roblox Luau bindings to Rust with async/await support
Other
1.59k stars 133 forks source link

Returning an error in a rust function causes a un-unwindable panic #431

Open cheesycod opened 1 month ago

cheesycod commented 1 month ago

I’m trying to run mlua in a tokio spawn local through a threadpool for luau

However, errors in rust function seems to cause un-unwindable panics due to my usage of panic = abort. Removing panic = abort doesn’t solve the issue but rather kills the thread spawned instead of panicking

code can be found at https://github.com/Anti-Raid/splashtail/blob/5b3310ff582e42041218aaf4b3821f3d09b8b12b/core/rust.templating/src/lang_lua/utils.rs#L42

cheesycod commented 1 month ago

@khvzak

After debugging this issue further, I believe I have found an unsoundness issue in mlua.

The following struct if used with a Tokio LocalSet leads to undefined behaviour:

pub struct ArLuaNonSend {
    /// The Lua VM
    pub vm: Rc<Lua>,
    /// The execution state of the Lua VM
    #[allow(dead_code)] // it isnt actually dead code
    pub state: Arc<ArLuaExecutionState>,
}

Notice the pub vm: Rc<Lua>,. This is the heart of the unsafety. When using this in a tokio localset (spawn_local), the end result will either be an un-unwindable panic or some sort of deadlock related to mlua. For example, running the following code is unsafe if you use ArLuaNonSend as written:

/// Executes a Lua script in the Lua VM
    pub async fn exec(
        vm: &Lua,
        template: &str,
        args: Box<dyn erased_serde::Serialize + Send>,
    ) -> Result<serde_json::Value, base_data::Error> {
        let f: LuaFunction = vm
            .load(template)
            .eval_async()
            .await?;

        log::info!("exec (done creating function f)");

        let args = vm
            .to_value(&args)?;

        let v: LuaValue = f
            .call_async(args)
            .await?;

        let v = serde_json::to_value(v)?;

        Ok(v)
}

It looks like bog-standard Rust/Lua code but this is actually unsafe and after repeated executions on my discord bot led to a deadlock.

I have found this issue to be related to the use of std::rc::Rc. If the ArLuaNonSend struct is changed to the below code without this Rc, the function works as normal:

pub struct ArLuaNonSend {
    /// The Lua VM
    pub vm: Lua,
    /// The execution state of the Lua VM
    #[allow(dead_code)] // it isnt actually dead code
    pub state: Arc<ArLuaExecutionState>,
}

While I have not tested this with other rust types such as Box, I still believe that this is a unsoundness issue of some sort in mlua. Perhaps mlua::Lua should be wrapped in a Pin?

khvzak commented 1 month ago

@cheesycod do you have a standalone example please to reproduce the issue? that I can run on my laptop

will either be an un-unwindable panic

is there any specific message from the panic? Luau relies on external unwinding for exception handling so setting panic=abort is not supported atm.

cheesycod commented 1 month ago

@cheesycod do you have a standalone example please to reproduce the issue? that I can run on my laptop

will either be an un-unwindable panic

is there any specific message from the panic? Luau relies on external unwinding for exception handling so setting panic=abort is not supported atm.

  1. The issue occurs with both panic=abort and without it. Without panic=abort, I get strange undefined behaviour such as dangling threads

Not really any standalone example since it is hard to reproduce outside of my discord bot, but well if you can create a discord application, here: https://github.com/Anti-Raid/splashtail/tree/255801d2a9ff7b39824ebd5f7e858891ad8b7e7a

The telltale sign of when this issue occurs is if you see a deadlock when editting a discord message. Then, the following is seen when run through gdb:

[Thread 0x7ffff47ef640 (LWP 391675) exited]
[Thread 0x7ffff49f6640 (LWP 391674) exited]
[Thread 0x7ffff5002640 (LWP 391671) exited]
[Thread 0x7ffff4dfe640 (LWP 391672) exited]
[Thread 0x7ffff5203640 (LWP 391670) exited]
[Thread 0x7ffff540a640 (LWP 391669) exited]
[Thread 0x7ffff560e640 (LWP 391668) exited]
[Thread 0x7ffff4bfa640 (LWP 391673) exited]
[Thread 0x7ffff580f640 (LWP 391667) exited]
[Thread 0x7ffff5a13640 (LWP 391666) exited]
khvzak commented 1 month ago

@cheesycod could you try the latest v0.10.0-beta.1 I just published? It Send + Sync when enable send feature flag and many tricks no longer needed.

I'll try to reproduce the issue in v0.9 version in the meantime.

cheesycod commented 1 month ago

@khvzak this issue isn’t so relevant in 0.10.0 thanks to your lovely change of making it sync (chefs kiss for that) so I no longer have to spawn 100 threads with a custom threadpool implementation and tokio local set

Will test it locally with threadpool and local set once I’m free though