Improve Persistent Caching for multiple configurations

slorber commented 3 years ago

Feature request

First, I don't know if all this even makes sense. I just wanted to present my use case and get some feedback.

I work on the Docusaurus framework. We are migrating to Webpack 5 and are happy to leverage the benefits of the new persistent cache.

This is not a critical feature I absolutely need, I am just wondering if, in addition to the great benefits of this new caching, there's still room for even better improvements.

I thought about asking this to @alexander-akait in our PR (as he proposed to help), but I thought it would be better to discuss this in a dedicated issue.

What is the expected behavior?

Improve incremental build performance when you build multiple Webpack configurations in a row, all of them being quite similar and using some shared code.

What is motivation or use case for adding/changing the behavior?

The Docusaurus build system is using a client (browser) config and a server (SSR) config.

When using the i18n feature, we loop over a list of locales and create one distinct SPA per locale (we'll likely explore module federation later).

The process looks a bit like that:

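import webpack, { Configuration } from "webpack";

function compile(configs: Configuration[]): Promise<void> {
  return new Promise((resolve, reject) => {
    const compiler = webpack(configs);
    compiler.run((err, stats) => {
      // Always close the compiler so pending work (such as writing the
      // persistent cache) gets a chance to complete.
      compiler.close(errClose => {
        if (err || errClose) {
          reject(err || errClose);
        } else if (stats?.hasErrors()) {
          reject(new Error("Build failed with compilation errors"));
        } else {
          resolve();
        }
      });
    });
  });
}

const locales = ["en", "fr", "de"];

// sequential (on purpose)
for (const locale of locales) {
  await compile([
    getConfig({ server: true, locale }),
    getConfig({ server: false, locale })
  ]);
}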

The build times I see for a single locale are already great:

cold cache: 100s
warm cache: 25s

But if I build multiple locales with a cold cache, I end up with 100s * number of locales, so 300s for 3 locales for example. I am wondering whether Webpack could build "en" and then reuse some part of the cache to speed up the build of "fr" and "de".

We have multiple i18n deployment strategies (subdomain fr.domain.io vs subpath domain.io/fr). By default, each locale has a different output.path (like dist/fr) and output.publicPath (like /fr/), but it's also possible to build the site for the subdomain strategy.

My impression is that the current caching system will bail out of using the cache if any config has changed between 2 runs (config is provided through CLI args, I'm not editing buildDependencies __filename).

That's why by default I need to use a distinct cache name per locale to be able to make the incremental build work:

cache: {
   type: 'filesystem',
   name: `${isServer ? "server" : "client"}-${mode}-${locale}`,
}
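As a minimal sketch of that naming scheme, the cache options could come from a small helper. `getCacheOptions`, `BuildTarget` and `FilesystemCacheOptions` are hypothetical names for illustration, not Docusaurus or Webpack APIs; only the `type` and `name` fields mirror the config above.

```typescript
// Hypothetical helper (not a Docusaurus or Webpack API): derives the
// persistent-cache options for one build target.
interface BuildTarget {
  isServer: boolean;
  mode: "development" | "production";
  // Omit the locale to get one cache name shared by all locales.
  locale?: string;
}

interface FilesystemCacheOptions {
  type: "filesystem";
  name: string;
}

function getCacheOptions({ isServer, mode, locale }: BuildTarget): FilesystemCacheOptions {
  const parts: string[] = [isServer ? "server" : "client", mode];
  if (locale !== undefined) {
    parts.push(locale);
  }
  return { type: "filesystem", name: parts.join("-") };
}
```

Leaving `locale` out gives a name that every locale shares, which only makes sense when the configs are otherwise compatible.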

I've tried the non-default i18n deployment strategy (a subdomain deployment, for which path: 'dist' and publicPath: '/' are the same for all locales) with a shared cache for all locales: name: `${isServer ? "server" : "client"}-${mode}`.

Some tests I've done:

clear cache
build --locale en => 100s
build --locale fr => 50s

The second build is significantly faster because the cache created for "en" has been successfully leveraged for "fr". But "en" and "fr" are not strictly equivalent code (they differ in translated texts and mdx docs), so quite expectedly, building "fr" after "en" is not as fast as building twice for the same locale:

clear cache
build --locale en => 100s
build --locale en => 25s

Now let's try something else:

clear cache
build --locale en => 100s
build --locale fr => 50s
build --locale en => 50s

What we can see is that the last build is not as fast as if we built twice in "en" in a row. This lets me think that the persistent cache only preserves the cache entries of the very last build.
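The reasoning behind that guess can be illustrated with a toy model. This is only a sketch of the hypothesis, not how Webpack's persistent cache is implemented; the module lists and the `simulate` function are made up for illustration.

```typescript
// Toy model only: NOT Webpack's actual cache implementation.
// A build needs a set of module ids; each id absent from the cache is a miss.
type ModuleId = string;

// Hypothetical module sets: shared code plus locale-specific content.
const sharedModules: ModuleId[] = ["react", "theme", "router"];
const modulesFor = (locale: string): ModuleId[] => [
  ...sharedModules,
  `i18n/${locale}/doc.mdx`,
  `i18n/${locale}/strings.json`,
];

// Runs the builds in order and returns the number of cache misses per build.
// keepOldEntries = false: the cache only holds the last build's entries.
// keepOldEntries = true: entries from earlier builds are retained.
function simulate(locales: string[], keepOldEntries: boolean): number[] {
  let cache = new Set<ModuleId>();
  return locales.map((locale) => {
    const needed = modulesFor(locale);
    const misses = needed.filter((id) => !cache.has(id)).length;
    if (!keepOldEntries) {
      cache = new Set<ModuleId>();
    }
    needed.forEach((id) => cache.add(id));
    return misses;
  });
}
```

In this model, `simulate(["en", "fr", "en"], false)` gives 5, 2 and 2 misses, the same shape as the 100s / 50s / 50s timings, while passing `true` gives 5, 2 and 0.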

How should this be implemented in your opinion?

Make it possible to keep entries of older builds in the cache? I'd like to have these perfs:

clear cache
build --locale en => 100s
build --locale fr => 50s
build --locale en => 25s (-25s reduction!)

Make it possible to share a cache across multiple builds using different but slightly similar config (ie for different locales, only output.path, output.publicPath and some i18n.json and myDoc.mdx are different, but much of code remains shared between all locales)

Are you willing to work on this yourself?

I don't feel skilled enough to work on this.

Note: I've seen the build-performance/#multiple-compilations info, but thought it may be stale docs. Should we still use cache-loader for such a use case? Do we need parallel-webpack when webpack can compile multiple configs at once? (Also, its last release was 1 year ago.)

Note: our server/client config are quite different, the goal for our usecase is more to share the cache on the "locales axis" rather than the "client/server axis".

alexander-akait commented 3 years ago

Each name (name: `${isServer ? "server" : "client"}-${mode}-${locale}`) creates its own cache file.

> This lets me think that the persistent cache only preserves the cache entries of the very last build.

Yep.

I think what you are describing here is a shared cache between different compilers.

It makes sense. Maybe we need to solve https://github.com/webpack/webpack/issues/10400 first.

/cc @sokra

sokra commented 3 years ago

In general, the persistent cache can only be used when the config is equal, so it's not possible to share the cache between different configs. If you still want to do that, we need to carefully validate whether your configs are compatible. What are the config differences between two locales?

I recently added a system that allows invalidating modules based on external factors. So, e.g., you can vary values of the DefinePlugin without invalidating the whole cache. You may need something similar for your locales.

Another thing you would need is parallel access to the build cache. Currently it's implemented as 1. read cache, 2. build, 3. store cache. When building all locales in parallel, you probably need a cache service that all builds read cache entries from and write them to, preferably within a worker_thread, so that all builds can run in different worker_threads and access the cache via a common worker_thread.
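To make the cache-service idea a bit more concrete, here is a conceptual sketch. It is an in-process stand-in: a real version would live in its own worker_thread and be reached through message passing. `CacheService` and `buildModules` are hypothetical names, not Webpack APIs.

```typescript
// Conceptual sketch only: none of these names are Webpack APIs.
// Builds read and write individual entries through the service instead of
// doing "read cache, build, store cache" as one snapshot per build.
class CacheService {
  private entries = new Map<string, string>();
  hits = 0;
  misses = 0;

  async get(key: string): Promise<string | undefined> {
    const value = this.entries.get(key);
    if (value === undefined) {
      this.misses++;
    } else {
      this.hits++;
    }
    return value;
  }

  async set(key: string, value: string): Promise<void> {
    this.entries.set(key, value);
  }
}

// Hypothetical build step: reuses a cached result per module, or computes
// and stores it, so entries written by one build are visible to the next.
async function buildModules(service: CacheService, moduleIds: string[]): Promise<string[]> {
  const outputs: string[] = [];
  for (const id of moduleIds) {
    let output = await service.get(id);
    if (output === undefined) {
      output = `compiled:${id}`; // placeholder for real compilation work
      await service.set(id, output);
    }
    outputs.push(output);
  }
  return outputs;
}
```

The asynchronous `get`/`set` interface is chosen so that the same calls could later be backed by messages to a worker instead of a local `Map`.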

slorber commented 3 years ago

Thanks.

Apart from the output path and publicPath, I don't think there is any significant difference between the configs of 2 locales, but I will have to double check.

Oh there's a mistake in my sample code: we build locale SPAs in series, not in parallel, so I guess it might reduce the complexity to make this work.

sokra commented 3 years ago

If that were the only difference, all builds would generate the same result. How does the locale influence the build?

slorber commented 3 years ago

Between 2 locales, the possible differences I can think of are:

Webpack config:

output.path
output.publicPath
resolve.roots (not yet implemented)

We have some code generation that will likely trigger different import() calls, and the locale is in the import path.

export default {
-  '0042d5e9': () => import(/* webpackChunkName: '0042d5e9' */ "@site/i18n/en/myMarkdownDoc.mdx"), 
+  '0042d5e9': () => import(/* webpackChunkName: '0042d5e9' */ "@site/i18n/fr/myMarkdownDoc.mdx"), 
 }
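For illustration, a pre-build generation step of this kind could look roughly like the sketch below. `generateRegistry` and `DocEntry` are hypothetical names, not the actual Docusaurus implementation; the point is only that the generated source text differs per locale while the chunk name stays the same.

```typescript
// Hypothetical generator (not the actual Docusaurus implementation): emits
// a registry module whose dynamic import paths embed the locale.
interface DocEntry {
  chunkName: string; // e.g. "0042d5e9"
  fileName: string; // e.g. "myMarkdownDoc.mdx"
}

function generateRegistry(locale: string, docs: DocEntry[]): string {
  const lines = docs.map(
    ({ chunkName, fileName }) =>
      `  '${chunkName}': () => import(/* webpackChunkName: '${chunkName}' */ "@site/i18n/${locale}/${fileName}"),`
  );
  return ["export default {", ...lines, "};"].join("\n");
}
```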

There are imports where the path is the same but the file content is different per locale

import i18n from "@generated/i18n"

Something probably worth mentioning: we'll eventually implement (not 100% sure, not high priority) a Babel plugin to inline translation strings directly into the React JSX code at build time:

- <Translate>Hello World</Translate>
+ <>Hello World</>
+ <>Bonjour le monde</>

FYI I tried to build 2 locales with a shared cache:

yarn build --locale en => 100s
yarn build --locale fr => 50s

Looking at the fr build in the browser, it looks like it works fine at first glance.

sokra commented 3 years ago

> resolve.roots (not yet implemented)

The resolver cache would ignore changes to the base resolving options. But it would work if you put your resolve.roots config into the module.rules. Resolve options specified here will become part of the cache key.
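A rough sketch of what such a rule could look like is shown below. `localeRule`, the `test` pattern and the directory layout are assumptions for illustration; only the idea of putting `resolve.roots` on a rule comes from the comment above.

```typescript
// Sketch only: `localeRule`, the `test` pattern and the directory layout
// are assumptions, not Docusaurus code.
interface LocaleRule {
  test: RegExp;
  resolve: { roots: string[] };
}

function localeRule(siteDir: string, locale: string): LocaleRule {
  return {
    // Assumed: apply the rule to application source files only.
    test: /\.(jsx?|tsx?|mdx?)$/,
    // Rule-level resolve options which, per the comment above, become part
    // of the cache key (unlike the base resolve options).
    resolve: { roots: [`${siteDir}/i18n/${locale}`] },
  };
}

// The resulting objects would be added to `module.rules` in the config.
const rules: LocaleRule[] = [localeRule("/path/to/site", "fr")];
```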

> We have some code generation

How is this attached to webpack? As a loader or a Babel plugin? Or is it a preprocessing step that runs before webpack?

> Babel plugin

That's the tricky part. You probably want to assign the affected modules to a different cache key, so each locale has its own module cache. It might be difficult to figure out which modules are affected, since we need to know before processing the module (to load it from the right cache).

Maybe you need to assign all application code to a different cache key, but I guess that would be fine since node_modules cache and non-code modules can be still cached.

slorber commented 3 years ago

The code generation happens before webpack kicks in. We generate code that is imported statically by webpack. Afaik we don't have any code generation in loaders.

About the babel part, this is just something I'd like to explore but honestly I'd rather improve build times than provide this micro optimization to the app output. But I'm curious how I could assign a specific module to a different cache key, how can this be configured?

sokra commented 3 years ago

> how I could assign a specific module to a different cache key, how can this be configured?

It's not possible right now, but won't be too difficult to add.

You would set cacheName in module.rules

slorber commented 3 years ago

That would be really great if we could do that :)

Being able to assign modules to different caches
Having a way to customize the cache entry expiration

slorber commented 3 years ago

I just saw the maxGenerations option. That might help improve the perf for my use case; I will test it.

webpack-bot commented 3 years ago

This issue had no activity for at least three months.

It's subject to automatic issue closing if there is no activity in the next 15 days.

slorber commented 3 years ago

I still think it would be a useful perf improvement, but there's no hurry on my side. I just want to keep this issue open.

webpack-bot commented 2 years ago

Issue was closed because of inactivity.

If you think this is still a valid issue, please file a new issue with additional information.

webpack / webpack

Improve Persistent Caching for multiple configurations #13034
