parcel-bundler / parcel

The zero configuration build tool for the web. 📦🚀
https://parceljs.org
MIT License
43.5k stars 2.27k forks

Feature: Faster Than Light (FTL) reloads #4143

Closed bergkvist closed 2 years ago

bergkvist commented 4 years ago

Faster Than Light (FTL) reloads

This feature request is inspired by fuse-box v4.

🤔 Expected Behavior

The HMR reload can change from 2-3 seconds to 5-10 milliseconds.

😯 Current Behavior

Right now, parcel spends the same amount of time (2-3 seconds) on the HMR update regardless of what type of file is changed.

πŸ’ Possible Solution

Instead of rebundling the entire tree of files for every file change - the 5 most recently worked on modules can be injected into the client directly.

Docs from fuse-box repo: https://github.com/fuse-box/fuse-box/blob/master/docs/cache.md

fuse-box implementation of FTL: https://github.com/fuse-box/fuse-box/blob/master/src/FTL/FasterThanLightReload.ts

🔦 Context

Imagine changing a stylesheet and seeing the change applied before you are able to blink. On a 60Hz monitor, this essentially means the update might happen as soon as the next monitor refresh!

I believe parcel would deliver a massively improved developer experience if it was able to implement this feature.

Considering fuse-box has managed to do it - it shows that this should be possible.

💻 Examples

React demo with fuse-box v4: https://github.com/fuse-box/react-example

devongovett commented 4 years ago

This is already how it works in Parcel. Only the changed file is recompiled and sent directly to the browser via a websocket without waiting for bundling.

bergkvist commented 4 years ago

@devongovett I just ran some benchmarks. parcel (2.0.0-alpha.3.2) is able to recompile/send an scss file to the client in 167-182ms on my machine. This is pretty good, but still 15-33x slower than fuse-box (4.0.0-next.165) at 5-12ms.
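For reference, the slowdown factor can be recomputed from the four timings quoted above. The sketch below assumes the "15-33x" figure pairs the slowest samples with each other and the fastest samples with each other; that pairing is an inference from the numbers, not something stated in the comment. The variable and function names are made up for illustration.

```javascript
// Timings reported in the comment above, in milliseconds.
const parcelMs = { min: 167, max: 182 }
const fuseBoxMs = { min: 5, max: 12 }

// Compare the slower tool against the faster one in several pairings.
function slowdown(slow, fast) {
  return {
    narrowest: slow.min / fast.max, // fastest slow sample vs slowest fast sample
    widest: slow.max / fast.min,    // slowest slow sample vs fastest fast sample
    quoted: [slow.max / fast.max, slow.min / fast.min], // assumed source of "15-33x"
  }
}

const r = slowdown(parcelMs, fuseBoxMs)
console.log(r.narrowest.toFixed(1), r.widest.toFixed(1)) // 13.9 36.4
console.log(r.quoted.map(x => x.toFixed(1)).join(' '))   // 15.2 33.4
```

Either way the gap is more than an order of magnitude, so the conclusion does not depend on which pairing is used.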

I imagine there might be some room for performance improvements somewhere.

Benchmarks (changing src/style.scss)

Here, I'm only changing the background-color between different colors. I included 3 samples of parcel and fuse-box both run on the same machine.

Example (parcel):

<!-- index.html -->
<html>
  <head></head>
  <body>
    <div id="App"></div>
    <script src="main.js"></script>
  </body>
</html>

// main.js
import ReactDOM from 'react-dom'
import React from 'react'
import './style.scss'

const App = () => (<div>Hello world</div>)

ReactDOM.render(<App />, document.getElementById('App'))

// style.scss
body {
  background-color: green;
}

[Images: 3 benchmark samples for parcel]

Example (fuse-box)

<html>
  <head>
    $css
  </head>
  <body>
    <div id="App"></div>
    $bundles
  </body>
</html>

// main.js
import ReactDOM from 'react-dom'
import React from 'react'
import './style.scss'

const App = () => (<div>Hello world</div>)

ReactDOM.render(<App />, document.getElementById('App'))

// style.scss
body {
  background-color: green;
}

// fuse.js
const { fusebox } = require('fuse-box')

fusebox({
  target: 'browser',
  entry: 'src/main.js',
  webIndex: {
    template: 'src/index.html',
  },
  cache: true,
  devServer: true,
}).runDev()

[Images: 3 benchmark samples for fuse-box]

bergkvist commented 4 years ago

To help out with this, here are some of the places in the fuse-box source code related to FTL:

In fuse-box, FTL is not another name for HMR. Rather, it is something of a separate feature, a special case of HMR. (An HMR event that does not fulfill the criteria for entering FTL mode in fuse-box generally takes around 1 second.)

Relevant files/functions:

src/FTL/FasterThanLightReload.ts: generateFTLJavaScript(modules), fasterThanLight(props), attachFTL(ctx)

src/core/assemble_context.ts: setFTLModule(module), setFTLGeneratedContent(str), getFTLGeneratedContent(), getFTLModules(), flush()

The /__ftl server/injected script:

src/dev-server/devServer.ts: createExpressApp(ctx, props, extra?)

src/web-index/webIndex.ts: createWebIndex(ctx)

Other places where FTL is "mentioned":

src/config/config.ts: createConfig(props)

src/fuse-log/FuseBoxLogAdapter.ts: fuseHeader(props)

bergkvist commented 4 years ago

Does parcel always write cache to disk whenever a file changes?

DeMoorJasper commented 4 years ago

@bergkvist yes

bergkvist commented 4 years ago

@DeMoorJasper So perhaps there would be some performance gains if the most recently changed files were just kept in memory, and not persisted to the cache-folder on disk?

Could this be the main reason that fuse-box manages to be an order of magnitude faster?
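Keeping only the most recently changed files in memory could look roughly like the sketch below. `RecentModuleCache`, its methods, and the file names are hypothetical and purely illustrative; the limit of 5 is borrowed from the original proposal. This is not how parcel or fuse-box implement their caches.

```javascript
// Hypothetical in-memory cache that remembers only the N most recently
// changed files. A Map keeps insertion order, so the first key is the oldest.
class RecentModuleCache {
  constructor(limit = 5) {
    this.limit = limit
    this.entries = new Map()
  }

  set(filePath, compiled) {
    this.entries.delete(filePath) // re-inserting moves the file to the newest slot
    this.entries.set(filePath, compiled)
    if (this.entries.size > this.limit) {
      const oldest = this.entries.keys().next().value
      this.entries.delete(oldest) // evict the least recently changed file
    }
  }

  get(filePath) {
    return this.entries.get(filePath)
  }

  paths() {
    return [...this.entries.keys()]
  }
}

const cache = new RecentModuleCache(5)
for (const name of ['a.js', 'b.js', 'c.js', 'd.js', 'e.js', 'f.js']) {
  cache.set(name, `/* compiled ${name} */`)
}
console.log(cache.paths()) // a.js has been evicted; b.js through f.js remain
```

A cache like this lives inside a single thread, so it does not address the question of sharing data between workers.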

DeMoorJasper commented 4 years ago

@bergkvist it's pretty hard to do that as parcel is multithreaded and Node does not allow memory sharing between processes/threads.

It's apparently impossible or very hard to achieve. Potentially we could hack around it by creating some kind of memory store in C that preferably does not require serialisation and keeps everything in memory while persisting to storage in the background.

bergkvist commented 4 years ago

@DeMoorJasper Alright, so the reason for storing the cache to disk in the first place is that the threads can't share memory? (As well as for making subsequent dev-builds faster)

Is there a one-to-one relationship between files and threads? Or are multiple threads used for parsing/processing a single file?

Would it be possible to make let's say the 5 last changed files all be bundled on the same thread? That way, you could perhaps avoid having to share memory between threads?

Yeah, some kind of centralized memory-store in C could be a solution - to avoid the IO-bottleneck of writing cache to disk. Does persisting to storage in the background asynchronously run into the risk of a race condition on rapid subsequent file changes?

bergkvist commented 4 years ago

I just discovered there is a thing called SharedArrayBuffer, which allows for sharing memory in worker threads (but not child_process/cluster). Maybe this could be of use.
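A minimal sketch of what this looks like with plain `worker_threads`, without any native addon. The worker source is passed inline with `eval: true` only so the example fits in one file; `runSharedCounter` is a made-up name. Note that a SharedArrayBuffer holds raw bytes, so anything richer than numbers would still need to be encoded into it.

```javascript
const { Worker } = require('worker_threads')

// Worker code, passed inline so the sketch is a single file.
const workerSource = `
  const { workerData, parentPort } = require('worker_threads')
  const view = new Int32Array(workerData.shared)
  Atomics.add(view, 0, 1) // writes directly into the shared memory
  parentPort.postMessage('done')
`

// Starts `workerCount` workers that each increment one shared integer,
// then resolves with the final value as seen from the main thread.
function runSharedCounter(workerCount) {
  const shared = new SharedArrayBuffer(4) // room for one 32-bit integer
  const view = new Int32Array(shared)
  const finished = []
  for (let i = 0; i < workerCount; i++) {
    finished.push(new Promise((resolve, reject) => {
      // A SharedArrayBuffer in workerData is shared, not copied.
      const worker = new Worker(workerSource, { eval: true, workerData: { shared } })
      worker.once('message', resolve)
      worker.once('error', reject)
    }))
  }
  return Promise.all(finished).then(() => Atomics.load(view, 0))
}

runSharedCounter(2).then(count => console.log(count)) // 2
```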

DeMoorJasper commented 4 years ago

@bergkvist we only started using the cache for sharing data in parcel 2. In parcel 1 we sent everything over IPC (which is a bit slower).

Transformation of assets is pretty much one-to-one, although on subsequent builds an asset can be transformed in a different worker.

SharedArrayBuffer seems pretty interesting, although I'm not entirely sure to what extent it works for us.

bergkvist commented 4 years ago

@DeMoorJasper If the transformation of assets is one-to-one with threads, then I imagine there isn't going to be any huge gain from using multiple threads for HMR reloads, since you'd usually only change one file at a time. Am I right in assuming this?

Obviously, on the initial run (when the .parcel-cache/ is being created) - multithreading/processing would help a lot with performance.

I remember there being some kind of cache in parcel 1 as well. I think the folder was just called ".cache"? I guess this was not used for sharing anything during a single run, but only for improving subsequent run times?

DeMoorJasper commented 4 years ago

You're correct it was only for improving subsequent build times.

About single threading on HMR reloads: I doubt it would make that big of a difference. There's a lot of optimising we can do that would probably have a larger impact than this.

For instance, for quicker HMR rebuilds we could potentially send an HMR update before the build has completely finished. Right now we wait for buildSuccess, while we could actually do it on buildProgress for some updates, although this will cause issues whenever HMR needs to do things like completely reload the page.

Do feel free to experiment with the ArrayBuffer; it might speed up builds significantly, although our main issue is serialisation, which unfortunately won't get fixed by it.

bergkvist commented 4 years ago

Just found this library node-shared-cache (GitHub: https://github.com/kyriosli/node-shared-cache), which seems to implement some kind of binary serialisation for javascript objects that is more efficient/performant than JSON, and which even allows for circular references (in contrast to JSON).

node-shared-cache does not use SharedArrayBuffer, but rather uses C++ to implement a shared memory cache (which means it requires node-gyp for compilation). There are some benchmarks showing how much slower it is than normal JavaScript object access in the GitHub README.
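To illustrate the circular-reference point without relying on node-shared-cache's API (which is not shown here), the sketch below uses Node's built-in `v8.serialize` and `v8.deserialize` instead. These are a different binary serialiser, swapped in only because they ship with Node. The `asset` object is a made-up example.

```javascript
const v8 = require('v8')

// Round-trip a value through Node's built-in binary serialiser.
function roundTrip(value) {
  return v8.deserialize(v8.serialize(value))
}

// Hypothetical asset graph with a circular reference.
const asset = { id: 'main.js', deps: [] }
asset.deps.push({ id: 'style.scss', parent: asset })

// JSON.stringify throws a TypeError on circular structures.
let jsonFailed = false
try {
  JSON.stringify(asset)
} catch (err) {
  jsonFailed = err instanceof TypeError
}

// The binary format preserves the cycle in the copy.
const copy = roundTrip(asset)
console.log(jsonFailed, copy.deps[0].parent === copy) // true true
```

The round trip still copies the data, so it does not remove the serialisation cost discussed in this thread.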

Could you give me an example of when you could send an HMR update before build completion? Does this cause a race condition? (if the build is slower than expected for any reason)

DeMoorJasper commented 4 years ago

@bergkvist awesome, seems interesting, the benchmarks are less impressive than I expected though. A 2x faster serialisation can also be achieved with a more optimised json serialiser.

The HMR example was a bad idea (as I described, it'll only work in some cases and would act weird whenever it needs a finished bundle). What I meant is that once all assets are processed we can send an HMR update with all the new assets. The only thing we're missing is the bundles, so it should in theory work for JavaScript and would remove the time required for bundling and packaging, which is a significant portion of build times. It should be relatively safe to implement this, but it would add more unexpected behaviour and complexity to HMR. It seems like an interesting experiment, but I doubt we'd be able to reliably detect whether HMR will ask for any finished bundles or not, as the user can ask for that in their custom HMR code...
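The trade-off can be shown with a toy model. Only the event names `buildProgress` and `buildSuccess` come from this thread; the payload shape, the `phase` field, and the `simulateBuild` function are assumptions made up for illustration and are not Parcel's reporter API.

```javascript
const { EventEmitter } = require('events')

// Toy model: returns the message types a browser client would receive, in order.
// `sendEarly` sends the update once assets are processed (hypothetical behaviour);
// `needsFullReload` models HMR code that ends up requiring a finished bundle.
function simulateBuild({ sendEarly, needsFullReload }) {
  const build = new EventEmitter()
  const sent = []

  build.on('buildProgress', ({ phase, assets }) => {
    // Assumed: assets are done once the (hypothetical) 'bundling' phase starts.
    if (sendEarly && phase === 'bundling') sent.push({ type: 'update', assets })
  })

  build.on('buildSuccess', ({ assets }) => {
    if (needsFullReload) sent.push({ type: 'reload' })
    else if (!sendEarly) sent.push({ type: 'update', assets })
  })

  build.emit('buildProgress', { phase: 'bundling', assets: ['style.scss'] })
  build.emit('buildSuccess', { assets: ['style.scss'] })
  return sent.map(message => message.type)
}

console.log(simulateBuild({ sendEarly: false, needsFullReload: false })) // [ 'update' ]
console.log(simulateBuild({ sendEarly: true, needsFullReload: true }))   // [ 'update', 'reload' ]
```

In the second case the client receives an early update followed by a reload, which is the kind of unexpected behaviour described above.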

bergkvist commented 4 years ago

@DeMoorJasper Assuming I want to experiment with this, where in the code should I go looking?

Would it be possible to give me a brief overview of the relevant files?

For the last couple of weeks I've been learning about N-API (node-addon-api for C++) and v8 - so I could use this to potentially write a centralized in-memory cache in C++.

Turns out avoiding serialization/copying in memory is difficult, due to how v8 works.

It might be possible to completely avoid JSON-like serialization - but the data would still have to be iterated through and cloned within c++.

To understand how this could be done - I'd need a better overview of how the worker threads are created, and what they are producing.

bergkvist commented 4 years ago

A very minimal example demonstrating global caching of a number using node-addon-api with worker_threads I just created:

// index.js
const { Worker, isMainThread } = require('worker_threads')
const os = require('os')
process.dlopen(
  module, 
  require.resolve('./build/Release/cache.node'),
  os.constants.dlopen.RTLD_NOW
)

if (isMainThread) {
  exports.set(42)
  new Worker('./index.js')
  new Worker('./index.js')
} else {
  console.log(exports.get())
  // 42 is printed out twice!
}

The C++ code below is compiled using node-gyp

// cache.cc
#include <napi.h>

double i = 0;

Napi::Number get(const Napi::CallbackInfo &info) {
  return Napi::Number::New(info.Env(), i);
}

void set(const Napi::CallbackInfo &info) {
  i = info[0].As<Napi::Number>().DoubleValue();
}

Napi::Object bind_exports(Napi::Env env, Napi::Object exports) {
  exports.Set("get", Napi::Function::New(env, get, "get"));
  exports.Set("set", Napi::Function::New(env, set, "set"));
  return exports;
}

NODE_API_MODULE(NODE_GYP_MODULE_NAME, bind_exports)

And I used the following config for node-gyp:

# binding.gyp
{
  "targets": [{
    "target_name": "cache",
    "sources": ["cache.cc"],
    "include_dirs": [
      "<!@(node -p \"require('node-addon-api').include\")",
    ],
    'cflags_cc!': ['-fno-exceptions']
  }]
}

bergkvist commented 4 years ago

Something interesting I noticed when playing around with this. The worker threads seem to spend a lot of time starting up!

If a hot reload triggers the creation of a worker thread, this could very well explain what takes up most of the time.

/* 
 Here, we are using the code from the previous comment.
 The master thread records the timestamp and stores it in global cache.
 Then a thread is started, which reads the time from the global cache 
 and computes time passed. 
*/

const { Worker, isMainThread } = require('worker_threads')
const { performance } = require('perf_hooks')
const os = require('os')
process.dlopen(
  module,
  require.resolve('./build/Release/cache.node'),
  os.constants.dlopen.RTLD_NOW
)

if (isMainThread) {
  exports.set(performance.now())
  new Worker('./index.js')
} else {
  const t0 = exports.get()
  console.log(performance.now() - t0)
  // Anything in the range 50ms - 135ms is logged here
}

Once the worker is already running, communication is significantly faster:

/* 
 The idea here is to wait for the worker thread to start before checking 
 communication delay. This is easily accomplished by computing the time it takes
 the worker to send a message to the master.
*/

const { Worker, isMainThread, parentPort } = require('worker_threads')
const { performance } = require('perf_hooks')
const os = require('os')
process.dlopen(
  module,
  require.resolve('./build/Release/cache.node'),
  os.constants.dlopen.RTLD_NOW
)

if (isMainThread) {
  const worker = new Worker('./index.js')
  worker.on('message', () => {
    const t0 = exports.get()
    console.log(performance.now() - t0)
    // Anything in the range 0.2ms - 0.4ms is logged here
  })
} else {
  exports.set(performance.now())
  parentPort.postMessage('')
}

mischnic commented 4 years ago

> If a hot reload triggers the creation of a worker thread, this could very well explain what takes up most of the time.

No, they are all (essentially) started when Parcel starts up itself.
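The difference between paying the startup cost per change and reusing a worker started up front can be reproduced without the native addon. The sketch below is a standalone measurement with made-up names (`echoSource`, `roundTrip`, `measure`); it is not Parcel's worker farm, and the timings will vary by machine.

```javascript
const { Worker } = require('worker_threads')
const { performance } = require('perf_hooks')

// Inline echo worker: sends every message straight back.
const echoSource = `
  const { parentPort } = require('worker_threads')
  parentPort.on('message', msg => parentPort.postMessage(msg))
`

// Resolves with the time (ms) for one message to go to the worker and back.
function roundTrip(worker, payload) {
  return new Promise((resolve, reject) => {
    const t0 = performance.now()
    worker.once('message', () => resolve(performance.now() - t0))
    worker.once('error', reject)
    worker.postMessage(payload)
  })
}

async function measure() {
  const tStart = performance.now()
  const worker = new Worker(echoSource, { eval: true })
  await roundTrip(worker, 'ping') // includes thread startup
  const startupAndFirst = performance.now() - tStart
  const warm = await roundTrip(worker, 'ping') // worker is already running
  await worker.terminate()
  return { startupAndFirst, warm }
}

measure().then(({ startupAndFirst, warm }) => {
  console.log(`cold: ${startupAndFirst.toFixed(2)}ms, warm: ${warm.toFixed(2)}ms`)
})
```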


Apart from general performance optimizations, I think emitting assets to the HMR runtime in the browser before the build is finished would help with this. Maybe the BundleGraph contains enough information to determine whether this is safe to do (if an asset would load a bundle that isn't written to disk yet).

DeMoorJasper commented 4 years ago

@mischnic I've also mentioned this in this thread, but it would be hard to implement safely as we don't know whether HMR will refresh the page. If it does, it would either refresh twice, or refresh just once and the HMR update would have failed.


mmcgahan commented 2 years ago

Similar experience here: large app migrating from an old create-react-app setup, where making a trivial change in any module results in a 6-7 second build (2021 MBP w/ M1 Pro). The HMR output that ultimately gets sent over the websocket looks as expected (small, specific), but the console output spends most of that build time on ⠙ Building {{the changed file}}.ts..., and then a couple of seconds on ⠙ Packaging index.[hash].js before anything gets sent over the websocket. I would naively expect the built module to be sent over the websocket without waiting for the 'packaging' step?

I generated the profile, and it looks like it spends a lot of time walking the dependency graph, but again I'm not sure if that is expected

[Screenshot of the profile]

devongovett commented 2 years ago

We are working on a rewrite of our bundling algorithm in #6975 which should help with performance there.