parcel-bundler / parcel

The zero configuration build tool for the web. 📦🚀
https://parceljs.org
MIT License

Alsh added multiple Nodejs workers and workers RPC #9787

Closed: alshdavid closed this 2 weeks ago

alshdavid commented 3 weeks ago
devongovett commented 3 weeks ago

tbh it seems like an open question to me on whether we even need multiple JS workers for the first version. If we do most work in Rust and JS plugins are fairly rare, we could just run JS plugins on the main thread and not even bother with workers. It does add a bit of complexity, and until we have an end-to-end build running, we won't know if it is worth it performance-wise. Might be good to start simple and measure first to see if we need it?

alshdavid commented 3 weeks ago

> tbh it seems like an open question to me on whether we even need multiple JS workers for the first version. If we do most work in Rust and JS plugins are fairly rare, we could just run JS plugins on the main thread and not even bother with workers.

Haha, fair point. We initially didn't think we needed them, but realized it would be better to ensure the main thread is never blocked on plugin work, because that can lead to shutdown issues. Additionally, workers will have some state, so it's a good idea to sandbox plugins to a single build invocation to avoid leaking state between multiple builds that are run programmatically. Workers are the easiest way to achieve that.

e.g.

const parcel = new Parcel()

await parcel.build()
await parcel.build() // avoid sharing worker state between these two builds

So with that in mind, we think that we will always need at least 1 worker per build. Supporting multiple workers is really just a freebie of the abstraction I'm introducing and isn't drastically more complex than supporting 1 worker.
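
A minimal sketch of the per-build sandboxing idea, using Node's built-in `worker_threads` module. The `build()` function and the `calls` counter are hypothetical stand-ins for illustration (they are not Parcel's API or the PR's code); the point is only that a fresh worker per build starts from clean state.

```javascript
const { Worker } = require('node:worker_threads');

// Code that runs inside each worker. `calls` is module-level state that
// would leak between builds if the same worker were reused.
const workerSource = `
  const { parentPort } = require('node:worker_threads');
  let calls = 0;
  parentPort.on('message', () => parentPort.postMessage(++calls));
`;

// Hypothetical build function (an assumption, not Parcel's API):
// it spawns one fresh worker per invocation and tears it down afterwards.
function build() {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    worker.once('error', reject);
    worker.once('message', (calls) => {
      // Terminate so the worker's state dies together with this build.
      worker.terminate().then(() => resolve(calls));
    });
    worker.postMessage('run-plugin');
  });
}
```

Calling `build()` twice in a row should resolve to `1` both times in this sketch, because the second build never sees the first worker's counter.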

> It does add a bit of complexity, and until we have an end-to-end build running, we won't know if it is worth it performance-wise. Might be good to start simple and measure first to see if we need it?

At the moment, Atlassian's use case still relies heavily on lots of JavaScript plugins, so I ran a benchmark on a Jira-sized project in Mach to determine the performance difference between multiple workers and running plugins only on the main thread. I saw a consistent halving of build times with every worker added up until 16 workers (on a 16 core machine).

Adding a single JS resolver plugin added ~30 seconds to the build, and that overhead halved with each worker added.

devongovett commented 3 weeks ago

I wonder if there is a way to avoid starting up workers for builds that don't have any JS plugins. They do have a fairly significant startup cost if I remember correctly. Could we start them lazily?

Also does what you said mean that you are restarting the whole worker on every build and not persisting them like we do in v2 today? I wonder how much of a performance impact that has as well.

alshdavid commented 3 weeks ago

> They do have a fairly significant startup cost if I remember correctly.

It seems to take ~20ms to launch 8 of them in parallel, but when they are launched sequentially it takes 100ms.
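
One way to measure this kind of startup cost is sketched below, assuming plain `worker_threads` workers with an empty script. `startWorker` and `timeStartup` are hypothetical helper names, and the workers in the PR are set up differently (through the Rust RPC layer), so the numbers from this sketch would not necessarily match the ones quoted above.

```javascript
const { Worker } = require('node:worker_threads');
const { performance } = require('node:perf_hooks');

// Resolve once the worker reports that it has started executing.
function startWorker() {
  return new Promise((resolve, reject) => {
    const worker = new Worker('void 0;', { eval: true });
    worker.once('online', () => resolve(worker));
    worker.once('error', reject);
  });
}

// Time how long it takes to bring `count` workers online,
// either all at once (parallel) or one after another (sequential).
async function timeStartup(count, parallel) {
  const begin = performance.now();
  let workers = [];
  if (parallel) {
    workers = await Promise.all(Array.from({ length: count }, () => startWorker()));
  } else {
    for (let i = 0; i < count; i++) {
      workers.push(await startWorker());
    }
  }
  const elapsedMs = performance.now() - begin;
  // Clean up so the process can exit.
  await Promise.all(workers.map((worker) => worker.terminate()));
  return elapsedMs;
}
```

For example, `timeStartup(8, true)` and `timeStartup(8, false)` can be compared on the same machine to see the gap between parallel and sequential launch.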

> Could we start them lazily?

Values are sent to each worker over a per-worker channel that is initialized before the worker starts, so they are buffered until the worker is ready. I guess that means we don't actually need to wait for a worker to start before beginning bundling.
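
The buffering behaviour can be sketched roughly as follows. `BufferedChannel` is a hypothetical class written for illustration; the PR implements its channel in Rust, so this shows only the general idea of queuing messages until a receiver is ready.

```javascript
// Hypothetical per-worker channel: it exists before the worker does, so
// messages sent early are queued and delivered once a receiver attaches.
class BufferedChannel {
  constructor() {
    this.queue = [];
    this.receiver = null;
  }

  // Deliver immediately if the worker is ready, otherwise buffer.
  send(message) {
    if (this.receiver) {
      this.receiver(message);
    } else {
      this.queue.push(message);
    }
  }

  // Called when the worker is ready: flush the backlog in order,
  // then deliver all later messages directly.
  attach(receiver) {
    this.receiver = receiver;
    for (const message of this.queue.splice(0)) {
      receiver(message);
    }
  }
}
```

With this shape, the sender can start emitting work right away and the worker picks up the backlog in its original order once it comes online.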

> Also does what you said mean that you are restarting the whole worker on every build and not persisting them like we do in v2 today? I wonder how much of a performance impact that has as well.

Yes, this is also a quirk of napi and libuv. The workers have a callback used to send events from Rust to the worker to do things like call plugins. That callback cannot be unrefed (otherwise the worker will shut down immediately).

If I start the workers and persist them, Nodejs will assume a long-lived async action is running and the Nodejs process will not shut down. So the workers need to live and die along with their respective command (build, watch, etc.).
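
The ref/unref behaviour can be illustrated with an ordinary Node.js timer, which is also backed by a libuv handle. This is only an analogy and does not use the napi threadsafe function from the PR: a ref'd handle keeps the event loop alive, while an unref'd one lets it exit.

```javascript
// A ref'd handle keeps the event loop (and therefore the process) alive.
const handle = setInterval(() => {}, 60_000);
const refBefore = handle.hasRef(); // true: the process would not exit on its own

// After unref(), this handle alone no longer keeps the process running.
handle.unref();
const refAfter = handle.hasRef(); // false

// Clean up so this example always terminates.
clearInterval(handle);
```

The trade-off described above follows the same pattern: keeping the callback ref'd keeps the process alive, while unref'ing it lets the worker exit immediately.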

> I wonder if there is a way to avoid starting up workers for builds that don't have any JS plugins.

I do this in Mach: before the build starts, I look at the plugin config to see if any plugins require the Nodejs RPC layer and only init workers if they are needed for a build. I expect we will do this in Parcel as well.
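
A rough sketch of that up-front check is shown below. The config shape, the plugin names, and the `isNativePlugin` predicate are all assumptions made for illustration; they are not Mach's or Parcel's actual config format.

```javascript
// Hypothetical check: returns true if any configured plugin is not
// implemented natively and therefore needs the Nodejs RPC layer.
function needsNodeWorkers(pluginConfig, isNativePlugin) {
  return Object.values(pluginConfig)
    .flat()
    .some((pluginName) => !isNativePlugin(pluginName));
}

// Example with made-up plugin names.
const nativePlugins = new Set(['native-resolver', 'native-js-transformer']);
const isNative = (name) => nativePlugins.has(name);

const allNativeConfig = {
  resolvers: ['native-resolver'],
  transformers: ['native-js-transformer'],
};

const mixedConfig = {
  resolvers: ['native-resolver', 'custom-js-resolver'],
  transformers: ['native-js-transformer'],
};

const startForAllNative = needsNodeWorkers(allNativeConfig, isNative); // false
const startForMixed = needsNodeWorkers(mixedConfig, isNative); // true
```

Note that a check like this only looks at what is configured, not at which plugins actually run during a build.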

devongovett commented 3 weeks ago

> I look at the plugin config to see if any plugins require the Nodejs RPC layer and only init workers if they are needed for a build.

Not sure that will be possible with the default Parcel config though. It has lots of plugins defined in it that might not actually be used during a build. For example, the CoffeeScript transformer: if you don't have any .coffee files in your project it'll never run, but it is still in the config if you need it.

alshdavid commented 3 weeks ago

Hmm, true. That's a good point.

I designed the RpcHost trait to support lazy init by letting the caller run RpcHost.start(). That could be triggered by the Rpc plugins on the first invocation of their working methods (like RpcTransformer.transform()). That would defer the bootstrapping of the workers until the first time they are actually needed.
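
The lazy-init idea can be sketched in JavaScript as follows. The real RpcHost is a Rust trait, so this is only an illustration of the pattern: `LazyRpcHost`, `spawnWorkers`, and the worker object shape are hypothetical, and only the names `start()` and `transform()` come from the comment above.

```javascript
// Hypothetical host: start() is idempotent and only spawns workers once.
class LazyRpcHost {
  constructor(spawnWorkers) {
    this.spawnWorkers = spawnWorkers;
    this.workers = null;
  }

  start() {
    if (this.workers === null) {
      this.workers = this.spawnWorkers();
    }
    return this.workers;
  }
}

// Hypothetical plugin wrapper: the first call to transform() triggers
// host.start(), so workers are never spawned if no JS plugin runs.
class RpcTransformer {
  constructor(host) {
    this.host = host;
  }

  transform(asset) {
    const workers = this.host.start();
    return workers[0].transform(asset);
  }
}
```

In this shape, a build that never reaches a JS plugin never pays the worker startup cost, even if the plugin is listed in the config.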

alshdavid commented 3 weeks ago

I've updated the PR to lazily initialize the workers and removed the threadsafe function channel wrapper from RpcHostNodejs.

mattcompiles commented 2 weeks ago

@alshdavid What I'm missing here is what is needed to add a new Rust -> JS communication point. Which files do I need to modify? A small breakdown of the steps to add a new one would be cool.