neon-bindings / neon

Rust bindings for writing safe and fast native Node.js modules.
https://www.neon-bindings.com/
Apache License 2.0
8k stars 283 forks source link

Feature request: Safer SysContext via thread-locals #1002

Open jrose-signal opened 1 year ago

jrose-signal commented 1 year ago

signal-neon-futures allows using the Node event loop as an executor for Rust futures via Channels, which mostly works great. However, if we want to synchronously call back into JavaScript from within an async function, we can't: a Channel is inherently asynchronous, and a Context (correctly) has a scoped lifetime.

We're considering a workaround of putting the raw napi_env in a thread-local and using SysContext. Is that a pattern that's worth exposing in Neon? Something like

fn example(cx: FunctionContext<'_>) -> JsResult<JsUndefined> {
  cx.channel().start_future(async {
    SysContext::with_current_context_scoped(|cx| {
      let global = cx.global();
      let message = cx.string("hi");
      global.call_method_with(cx, "log").arg(message).exec(cx);
    })
  };
  Ok(cx.undefined())
}

impl<F> Wake for FutureTask<F>
where
  F: Future<Output = ()> + 'static + Send,
{
  fn wake(self: Arc<Self>) {
    let channel = self.channel.clone();
    channel.send(move |cx| {
      cx.expose_to_current_thread(|| self.poll());
      Ok(())
    });
  }
}

One thing I'm not sure of, though, is whether this breaks under re-entrancy. That is, if JS function J1 calls Rust function R1, which saves its Context and then calls Rust function R2, which calls JS function J2, which calls Rust function R3, which looks up the thread-local context…does N-API freak out? Or do undefined things? The N-API docs say

Specifically, the same napi_env that was passed in when the initial native function was called must be passed to any subsequent nested Node-API calls.

But that doesn't really clarify nesting.

jrose-signal commented 1 year ago

(As an aside, arguably even SysContext::from_raw should have a scoped variant; otherwise it's too easy to create handles with lifetimes longer than you meant them to.)

kjvalencik commented 1 year ago

If I understand correctly, since the async code is only ever polled from the event loop (via channel), it's guaranteed that after any given await point, we are back on the main thread?

If this pattern is sound, I would absolutely love to include it in Neon. This is something I wanted to include as part of Neon's async story, but couldn't quite figure out how to make it safe (I was trying to see if I could make a let mut cx = get_cx().await like API work).

Putting napi_env in a thread-local is interesting because it's basically recreating v8::Isolate::GetCurrent() which is generally discouraged in Node. I need to do some reading. Deno might be a good resource.

RE: Re-entrancy. napi_env holds some state and shouldn't be swapped out. I think this is okay as long as the callback from channel.send is the only thing that assigns the thread local. This way re-entrancy wouldn't replace it--it shouldn't be possible to execute a channel.send while executing (this would cause other unsoundness in Neon).


Aside

signal-neon-futures is really interesting. Can you help me understand why you might want to use Channel to create an executor instead of using the libuv event loop directly?

channel.send(..).await provides a similar API to SysContext::with_current_context_scoped.


Good call on SysContext scoping. I'll make sure that gets added.

jrose-signal commented 1 year ago

If I understand correctly, since the async code is only ever polled from the event loop (via channel), it's guaranteed that after any given await point, we are back on the main thread?

As long as the future doesn't escape to get polled from somewhere else, yes. (Which I think FutureTask prevents, by owning the Future and only polling it with itself as the waker.)

In theory this allows non-Send Futures as long as they are started from a Context rather than a Channel, and in fact I had that in a previous version, but we didn't actually use that in any way, so I took it out to simplify the code.


Putting napi_env in a thread-local is interesting because it's basically recreating v8::Isolate::GetCurrent() which is generally discouraged in Node. I need to do some reading. Deno might be a good resource.

The fact that they said "hey, this is the pattern to use" suggests that it's not inherently a problem in Deno, but that doesn't mean it's not specifically a problem for Node. (I'm out of my depth here, I don't know the history at all. But I don't want to depend on something that might turn out to be unsafe.)

RE: Re-entrancy. napi_env holds some state and shouldn't be swapped out. I think this is okay as long as the callback from channel.send is the only thing that assigns the thread local. This way re-entrancy wouldn't replace it--it shouldn't be possible to execute a channel.send while executing (this would cause other unsoundness in Neon).

The particular concern is

channel.send(|cx| {
  cx.expose_to_current_thread(|| {
    SysContext::with_current_context_scoped(|cx| {
      let global = cx.global();
      global.call_method_with(cx, "exposedToJS").exec(cx);
    }
  })
}

fn exposedToJS(cx: FunctionContext<'cx>) -> JsResult<JsUndefined> {
  // ignore the cx we were given and…
  SysContext::with_current_context_scoped(|cx| {
    let global = cx.global(); // oops!
  }
}

But in writing all this out I realized that this can be avoided by having SysContext::with_current_context_scoped claim the current context (either take it out and put it back on return, or mark it as in-use like RefCell does). So I think that's not going to be a problem!


channel.send(..).await provides a similar API to SysContext::with_current_context_scoped.

Only in an async context! The point of SysContext::with_current_context_scoped is to be able to implement a non-async trait method that can call into Node…when the trait itself is used in an async function. Which only makes sense if you know the Future is being polled on a JavaScript thread. Which is the answer to your other question…

Can you help me understand why you might want to use Channel to create an executor instead of using the libuv event loop directly?

If we always use Channel::send to talk to JS asynchronously, then yeah, it shouldn't matter. But until recently Neon didn't expose the raw napi_env (or the libuv event loop handle directly), so it wasn't a convenient option regardless.

dherman commented 12 months ago

This is an interesting use case, and I agree that the scoped closure variant of SysContext::from_raw makes sense.

My concern about using TLS would be that everyone would have to pay the runtime cost of the Neon runtime continuously updating the TLS storage, even if they're not making use of the with_current_context_scoped feature.

jrose-signal commented 12 months ago

Yeah, I considered having that be implicit but came to the same conclusion. That's what expose_to_current_thread is for, so that it's pay-for-what-you-use only.

kjvalencik commented 12 months ago

@jrose-signal Thanks for the really awesome advice to add a scoped version of SysContext::from_raw. That's really good advice to make the API less unsafe.

As far as adding the expose_to_current_thread / with_current_context_scoped pair, I don't think these are a good fit for Neon at this time. The feature appears niche and I'm hesitant to add it when it appears to have been deliberately not included in Node-API. IIUC, Node has moved towards avoiding GetCurrentContext internally.

I am interested in potentially having Neon provide a futures executor, but I would probably favor one polled by the libuv threadpool over a single threaded executor on the JS main thread.


@dherman and I took a look at signal-neon-futures and your proposed API. It appears safe provided the enforcements discussed previously in this thread (e.g., that napi_env is removed from the thread local).

One thing I might recommend in the signal-neon-futures implementation is to loop in the future polling to get more work done per tick.

jrose-signal commented 12 months ago

One thing I might recommend in the signal-neon-futures implementation is to loop in the future polling to get more work done per tick.

I don't think this actually does anything! Either the future is polled to completion, or it's blocked. (Notice that if you submit multiple top-level Futures to run on the same Channel, they don't know anything about each other, so we can't work on independent futures in the same microtask. But that's probably a good thing for starvation reasons anyway.)