Safe ownership for EventHandler

dherman commented 4 years ago

As @kjvalencik points out in neon-bindings/neon#551 and neon-bindings/neon#552, the memory management model for EventHandler is still not safe: it's susceptible to leaks and races.

The problem is that the we aren't coordinating the overall lifetime of the queue of pending invocations of the event handler (which all run on the main thread); we're only tracking the places in random other threads where invocations are requested. So when the last request is made, we drop the underlying queue even though there may be pending invocations.

Instead, I propose we expose the ownership model into the public API of EventHandler. That is, the user should understand EventHandler as an atomically refcounted handle to a JS event handler callback. The relevant changes to the API would be:

Scheduling an invocation of the event handler consumes self.
In order to schedule multiple invocations of the event handler, you have to explicitly .clone() it.

The types would look something like this:

struct InvocationQueue(...);          // private implementation
impl Drop for InvocationQueue { ... } // private implementation

#[derive(Clone)]
pub struct EventHandler(Arc<InvocationQueue>);

impl EventHandler {

    pub fn new<'a, C: Context<'a>, T: Value>(cx: &C, this: Handle<T>, callback: Handle<JsFunction>) -> Self;

    // CHANGE: consumes self (implementation clones the Arc<InvocationQueue> and sends to main thread)
    pub fn schedule<T, F>(self, arg_cb: F)
        where T: Value,
              F: for<'a> FnOnce(&mut EventContext<'a>) -> JsResult<'a, T>,
              F: Send + 'static;

    // CHANGE: consumes self (implementation clones the Arc<InvocationQueue> and sends to main thread)
    pub fn schedule_with<F>(self, arg_cb: F)
        where F: FnOnce(&mut EventContext, Handle<JsValue>, Handle<JsFunction>) -> NeonResult<()>,
              F: Send + 'static;

}

Notice that the implementation of the schedule* methods would need to clone the Arc<InvocationQueue> and send it to the main thread in order to ensure that it's kept alive until all the invocations have terminated. This prevents the race where the invocation queue is shut down before all the invocations have executed.

This should also make EventHandler virtually leak-proof: the only way to keep it alive indefinitely is to either keep a local computation running infinitely or to keep cloning it and passing it on to other computations infinitely. Otherwise, by default it will be shut down once all cloned instances have either dropped or scheduled and executed their invocations.

This is an example of what would look different in user code. The example from the RFC would just need one more line to explicitly clone the EventHandler on each iteration:

    let mut this = cx.this();
    let cb = cx.argument::<JsFunction>(0)?;
    let handler = EventHandler::new(cb);
    // or:      = EventHandler::bind(this, cb);
    thread::spawn(move || {
        for i in 0..100 {
            // do some work ....
            thread::sleep(Duration::from_millis(40));
            // schedule a call into javascript
            let handler = handler.clone(); // CHANGE: clone the handler before scheduling an invocation
            handler.schedule(move |cx| {
                // successful result to be passed to the event handler
                Ok(cx.number(i))
            }
        }
    });

dherman commented 4 years ago

One question worth asking is whether the name should somehow reflect the "queue" or "Arc" nature of the API. I couldn't think of any ideas that didn't feel clunky, but I'm open to ideas.

JsFunctionArc
Arcf
ArcFunction
FunctionArc
Farc
ArcEventHandler
EventHandlerArc
ArcEventHandlerQueue

These names all stink IMO 😝

dherman commented 4 years ago

That said, maybe "event handler" was too abstract of a name, and we really should consider something closer to the intuition that this is a thread-safe handle for a function.

dherman commented 4 years ago

More thinking-out-loud about naming:

It occurs to me that what I didn't like about the "thread-safe callback" name from the C++ Node plugin ecosystem is that "callback" is a contextual term, but it's used decontextualized. That's what led me to the intuition of "event handler" since it explains what the callback is for. But the other direction would be to emphasize that this is a function handle, and not talk about it being a callback in the name.

I looked at the stdlib and see "sync" as an intuition that's maybe useful here (it's the Rust terminology for "thread-safe"). Some more ideas following that intuition:

neon::sync::SyncFunction
neon::sync::Function
neon::handle::SyncFunction

Actually that middle one is not bad -- it turns out not to create a name conflict since there's no Function trait in Neon, nor is there a common standard Rust library called Function.

Trying it out for feel:

    let mut this = cx.this();

    let callback = neon::sync::Function::new(cx.argument(0)?);

    thread::spawn(move || {
        for i in 0..100 {
            // do some work ....
            thread::sleep(Duration::from_millis(40));
            // schedule a call into javascript
            callback.clone().schedule(move |cx| {
                // successful result to be passed to the event handler
                Ok(cx.number(i))
            }
        }
    });

kjvalencik commented 4 years ago

I really, really like the idea of a neon::sync module. I'm starting to think maybe this should be implemented as two distinct new primitives.

neon::sync::Peristent. Wraps Js Value (generic), can be sent across threads, cloned, and with a context can be deref'd to. Handle
neon::thread::spawn. Schedules a closure to execute on the V8 VM.

The high level threadsafe EventHandlerAPI can be built by combining these two with thetry_catch API.

dherman commented 4 years ago

I like the concept of "persistent", that seems worth playing with.

I think I see what you mean about decoupling primitives, but I'm not sure if I know how to make that work. So I think there's a few key elements to the design in this issue, let me see if I can brainstorm how to generalize it to arbitrary values:

There's a protocol that keeps a persistent handle alive only until it's been dropped or scheduled for work back on the main thread and that work has been completed.
Only the main thread is allowed to actually unpack the contents of the persistent handle; other threads only get to keep it alive.

So maybe the way to do this for general values is that when you schedule a closure to execute on the main thread, it drops the neon::sync::Persistent, passes the context and a Handle to the unwrapped JS value to the closure, and when the closure terminates it drops the underlying v8 persistent handle.

Maybe there's a way to make this a neon::sync::SyncValue trait with an associated type that we could impl for heterogeneous tuples as well, so that you can schedule a closure that takes more than one value at a time. But probably the common case would be a single value and it'd still be convenient to have a method on neon::sync::Persistent that consumes self and does the equivalent of neon::thread::spawn with the single persistent.

The name spawn doesn't seem right to me; this isn't spawning a thread, it's scheduling work on the main thread.

dherman commented 4 years ago

What that might look like in the example:

    let mut this = cx.this();

    let callback = Persistent::new(cx.argument::<JsFunction>(0)?);

    thread::spawn(move || {
        for i in 0..100 {
            // do some work ....
            thread::sleep(Duration::from_millis(40));
            // schedule a call into javascript
            callback.clone().schedule(move |cx, f| {
                let args = vec![cx.number(i)];
                f.call(cx, args)
            });
        }
    });

kjvalencik commented 4 years ago

I was thinking that schedule could be something that is limited to scheduling code on the VM and not directly give access to the persistent. Persistents get deref'd manually; this extends to multiple persistents.

let executor = neon::sync::Executor::new(&mut cx)?;
let this = cx.this().persistent(&mut cx)?;
let callback = cx.argument::<JsFunction>(0)?.persistent(&mut cx)?;

for i in 0..4 {
    let executor = executor.clone();
    let this = this.clone();
    let callback = callback.clone();

    std::thread::spawn(move || {
        executor.schedule(move |mut cx| {
            let this = this.deref(&mut cx)?;
            let callback = callback.deref(&mut cx)?;
            let res = cx.number(i);
            let args = vec![cx.undefined().upcast(), res.upcast()];

            callback.call(&mut cx, this, args);
        });
    });
}

I don't think the Executor::schedule naming is great. It's the wrapper for the N-API threadsafe callback. We could also create a single global executor in the module initialization and store it in a once_cell.

Then the user wouldn't need to manage the lifetime of it and could call neon::sync::schedule directly. Both APIs could co-exist.

This API is more flexible, but, also much more verbose/less ergonomic than the current API being discussed. However, the goal would be for the higher level API to be able to be built using only safe, public primitives.

dherman commented 4 years ago

I had a great call with @kjvalencik today. He helped me understand a few more of the constraints:

Persistent handles in N-API aren't atomically RCed, so we need to use the main thread to safely manage dropping.
The only way we could think of for doing this is to send a persistent back to the main thread and enqueue a callback on the main thread's event queue to drop the RC.
Thinking ahead to Node Workers, it's possible that a Rust thread could end up encountering persistents belonging to multiple distinct JS threads (with distinct event queues).
In order to support the Drop trait, we'll likely need a thread-safe Rust type representing a persistent handle to store which event queue it needs to send back to.
Just sending a persistent handle from one thread to another is better than implicitly downgrading it; the receiving thread can then decide how to consume (drop) the handle.
There are also use cases for persistent handles that never leave the main thread, such as Rust data structures that internally hold onto JS data (like a private struct wrapped in a JS object).

This suggests a design with two different types of persistent, ref-counted handles, analogous to Rust's Rc and Arc: one that's same-thread-only, and one that implements Send and Sync and can be shared across threads. And then we'd need a type that represents the event queue of a specific thread.

For the sake of argument, let's call these, respectively:

neon::handle::Pin<T> ~neon::handle::Persistent<T>~
neon::handle::Persistent<T> ~neon::handle::Shared<T>~
neon::event::EventQueue

Constructing a Pin<T> only needs a Handle<T>, but constructing a Persistent<T> needs both a Handle<T> and an EventQueue.

So the example might look like this:

    let queue = neon::event::EventQueue::from(&mut cx)?;

    let this = cx.this().persistent(queue)?;
    let callback = cx.argument::<JsFunction>(0)?.persistent(queue))?;

    for i in 0..100 {
        let this = this.clone();
        let callback = callback.clone();

        std::thread::spawn(move || {
            // do some work ....
            thread::sleep(Duration::from_millis(40));
            // schedule a call into JavaScript
            queue.enqueue(move |mut cx| {
                let this = this.deref(&mut cx)?;
                let callback = callback.deref(&mut cx)?;
                let res = cx.number(i);
                let args = vec![cx.undefined().upcast(), res.upcast()];
                callback.call(&mut cx, this, args);
            });
        });
    }

dherman commented 4 years ago

We can probably defer the Pin<T> type to a future RFC. It's good to think ahead to it, and any idiomatic consistencies we might want to prepare for, but it's a pretty orthogonal use case: attaching a reference to a GC'ed value to some Rust ownership structure that is otherwise invisible to the GC.

neon-bindings / rfcs

Safe ownership for EventHandler #31