nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

[Bug] Re-imports in ESM are 20-25 times slower than re-requires in CommonJS #52369

Open vitaliylag opened 5 months ago

vitaliylag commented 5 months ago

Version

21.7.2

Platform

Microsoft Windows NT 10.0.22631.0 x64

Subsystem

Windows 11

What steps will reproduce the bug?

To benchmark the performance of imports and requires you can use this script:

  1. Create a new directory in any location and create a "genModules.mjs" file in it with this content:

```js
import fsp from 'node:fs/promises';

const MODULE_COUNT = 1000,
      TEST_COUNT   = 5;

for (const fastMode of [false, true]) {
    const wrapperName = fastMode ? 'wrapper_fast' : 'wrapper_slow',
          dirBase_esm = fastMode ? 'esm_fast' : 'esm_slow',
          dirBase_cjs = fastMode ? 'cjs_fast' : 'cjs_slow',
          ext_esm     = fastMode ? '.mjs'     : '.js',
          ext_cjs     = fastMode ? '.cjs'     : '.js';

    //1. Generate code of wrappers (these files import main files few times)

    const [wrapperCode_esm, wrapperCode_cjs] = (() => {
        const esm = [],
              cjs = [];

        for (let i = 1; i <= TEST_COUNT; i++) {
            esm.push('globalThis.startTimeMs = performance.now();');
            esm.push(`await import('./${dirBase_esm}/main${i}${ext_esm}');`);
            esm.push('');

            cjs.push('globalThis.startTimeMs = performance.now();');
            cjs.push(`require('./${dirBase_cjs}/main${i}${ext_cjs}');`);
            cjs.push('');
        }

        return [
            esm.join('\n'),
            cjs.join('\n'),
        ];
    })();

    //2. Generate code of main files (these files contain many imports)

    const [mainCode_esm, mainCode_cjs] = (() => {
        const esm  = [],
              cjs  = [],
              vars = [];

        for (let i = 1; i <= MODULE_COUNT; i++) {
            esm.push(`import m${i} from './modules/module${i}${ext_esm}';`);
            cjs.push(`const m${i} = require('./modules/module${i}${ext_cjs}');`);
            vars.push(`m${i}`);
        }

        const lastLine = `\nconsole.log(((performance.now() - globalThis.startTimeMs) / ${MODULE_COUNT}).toFixed(3) + ' ms, sum = ' + (${vars.join('+')}));\n`;
        esm.push(lastLine);
        cjs.push(lastLine);

        return [
            esm.join('\n'),
            cjs.join('\n'),
        ];
    })();

    //3. Create dirs

    for (const dir of [
        `${dirBase_esm}`,
        `${dirBase_esm}/modules`,
        `${dirBase_cjs}`,
        `${dirBase_cjs}/modules`,
    ]) {
        try {
            await fsp.mkdir(dir);
        } catch {}
    }

    //4. Create files

    //Wrappers
    await fsp.writeFile(`${wrapperName}.mjs`, wrapperCode_esm);
    await fsp.writeFile(`${wrapperName}.cjs`, wrapperCode_cjs);

    //package.json
    await fsp.writeFile(`${dirBase_esm}/package.json`, `{"type": "module"}`);
    await fsp.writeFile(`${dirBase_cjs}/package.json`, `{"type": "commonjs"}`);

    //Few main files
    for (let i = 1; i <= TEST_COUNT; i++) {
        await fsp.writeFile(`${dirBase_esm}/main${i}${ext_esm}`, mainCode_esm);
        await fsp.writeFile(`${dirBase_cjs}/main${i}${ext_cjs}`, mainCode_cjs);
    }

    //Many small modules
    for (let i = 1; i <= MODULE_COUNT; i++) {
        await fsp.writeFile(`${dirBase_esm}/modules/module${i}${ext_esm}`, `export default ${i};\n`);
        await fsp.writeFile(`${dirBase_cjs}/modules/module${i}${ext_cjs}`, `module.exports = ${i};\n`);
    }
}
```
  2. Run `node genModules.mjs` - this command will create many small modules and a few wrapper files so you can benchmark the performance of ESM imports and CJS requires.

  3. Use these commands to benchmark the performance of imports and requires:

    • `node wrapper_slow.mjs` - .js extension, imports (ESM)
    • `node wrapper_slow.cjs` - .js extension, requires (CJS)
    • `node wrapper_fast.mjs` - .mjs extension, imports (ESM)
    • `node wrapper_fast.cjs` - .cjs extension, requires (CJS)


How often does it reproduce?

Always, even on different systems.

What is the expected behavior?

Re-importing the same ES module should be as fast as re-requiring the same CJS module.

What do you see instead?

My results:

As you can see, re-importing the same ES module is ~15-25 times slower than re-requiring the same CJS module.

It really slows Node.js down. For example, even a simple script can include 50 modules, each of them imported ~10 times. So we have 50 * 0.3 ms = 15 ms for unique includes and 50 * 9 * 0.1 ms = 45 ms for re-includes, i.e. 60 ms total just for includes. And it's important to understand that there may be thousands of modules in some cases.
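For reference, that estimate as a runnable snippet (the per-import timings are the approximate figures from the results above):

```js
// Back-of-the-envelope estimate: 50 modules, each included ~10 times
// (1 unique load + 9 cached re-includes), using the timings measured above.
const uniqueMs   = 50 * 0.3;      // first-time imports at ~0.3 ms each -> 15 ms
const repeatedMs = 50 * 9 * 0.1;  // cached re-imports at ~0.1 ms each  -> 45 ms
console.log(uniqueMs + repeatedMs); // ~60 ms spent purely on module loading
```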

aduh95 commented 5 months ago

I don't know how relevant it is to compare a sync and an async API; in particular, I'm unconvinced by the "It really slows Node.js down" conclusion - the runtime can do other stuff during the async operation.
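A minimal sketch of the sync-vs-async difference being discussed (hypothetical module paths; createRequire is only used so both calls can live in one ESM file):

```js
import { createRequire } from 'node:module';
const require = createRequire(import.meta.url);

// CJS: a plain synchronous call; the module object is returned immediately,
// but nothing else can run while require() executes.
const viaRequire = require('./some-cached-module.cjs');

// ESM: even a fully cached dynamic import() returns a promise, so the namespace
// only arrives on a later microtask tick; the event loop is free to run other
// queued work in the meantime.
const viaImport = (await import('./some-cached-module.mjs')).default;
```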

vitaliylag commented 5 months ago
  1. Here is main.js code:
    import m1 from './modules/module1.js';
    import m2 from './modules/module2.js';
    import m3 from './modules/module3.js';
    import m4 from './modules/module4.js';
    import m5 from './modules/module5.js';
    import m6 from './modules/module6.js';
    import m7 from './modules/module7.js';
    import m8 from './modules/module8.js';
    import m9 from './modules/module9.js';
    import m10 from './modules/module10.js';
    import m11 from './modules/module11.js';
    import m12 from './modules/module12.js';
    import m13 from './modules/module13.js';
    import m14 from './modules/module14.js';
    import m15 from './modules/module15.js';
    import m16 from './modules/module16.js';
    import m17 from './modules/module17.js';
    import m18 from './modules/module18.js';
    import m19 from './modules/module19.js';
    import m20 from './modules/module20.js';
    ...
    import m1000 from './modules/module1000.js';

This bug report is about caching when importing the same module multiple times from different modules. We have "main1.js" with these imports and "main2.js" with the same imports. First we import "main1.js", then "main2.js". When we import "main2.js", these imports should be fast because we have already imported these modules before. But that does not happen.

require() does not have this problem: these 1000 repeated requires are done in about ~4 ms. But import takes ~100 ms to import these 1000 modules even if we have already imported them before. I guess there is some bug with import's cache.
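A condensed sketch of that scenario, with one shared module instead of 1000 (hypothetical file names):

```js
// wrapper.mjs - assumes main1.mjs and main2.mjs both contain the single line
//   import shared from './shared.mjs';
// and shared.mjs contains
//   export default 42;

globalThis.startTimeMs = performance.now();
await import('./main1.mjs');   // first load: shared.mjs is resolved, fetched and cached
console.log('main1:', (performance.now() - globalThis.startTimeMs).toFixed(3), 'ms');

globalThis.startTimeMs = performance.now();
await import('./main2.mjs');   // shared.mjs is already in the module map, so this
                               // should be nearly free - in practice it is not
console.log('main2:', (performance.now() - globalThis.startTimeMs).toFixed(3), 'ms');
```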

It really slows Node.js down

I'm talking about startup time (how fast the first line of the code is executed).

can do other stuff during the async operation

No. If these modules are required, we cannot execute any code before the needed imports are done (even if some operations inside these imports are asynchronous).

async operation

Actually there is one more bug not mentioned in this report. These imports may be async, but no operations are executed in parallel: the next import cannot be started, even partially, before the previous import is done. With require() that's not possible because require() is just a function. But with import, for example, we could read a few files simultaneously and speed imports up. Yet in the test we can see that a unique import is even a little slower than a unique require.
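For what it's worth, a minimal sketch of the kind of parallelism being described, using dynamic import() with hypothetical module paths (whether this actually pays off is exactly what the later comments dispute):

```js
// Both loads are started before either finishes, so in principle their file
// reads could overlap; a synchronous require() chain can never do this.
const [nsA, nsB] = await Promise.all([
  import('./a.mjs'),
  import('./b.mjs'),
]);
console.log(nsA.default, nsB.default);
```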

climba03003 commented 4 months ago

I believe the conclusion in https://github.com/nodejs/node/issues/44186 is still true.

aduh95 commented 4 months ago
      for (let i = 1; i <= TEST_COUNT; i++) {
          esm.push('globalThis.startTimeMs = performance.now();');
          esm.push(`await import('./${dirBase_esm}/main${i}${ext_esm}');`);
          esm.push('');

          cjs.push('globalThis.startTimeMs = performance.now();');
          cjs.push(`require('./${dirBase_cjs}/main${i}${ext_cjs}');`);
          cjs.push('');
      }

Here you compare await import() with require. You should use await import() for both wrappers to get a useful time comparison, otherwise you're comparing a sync operation to an async one.

vitaliylag commented 4 months ago

@aduh95 The wrapper code is not that meaningful here, but OK, I'll make new versions of the wrappers and post benchmark results here.

vitaliylag commented 4 months ago

@aduh95 Here is the new version of "genModules.mjs":

```js
import fsp from 'node:fs/promises';

const MODULE_COUNT = 1000,
      TEST_COUNT   = 5;

for (const fastMode of [false, true]) {
    const wrapperBase_esm = fastMode ? 'wrapper_fast_esm.mjs' : 'wrapper_slow_esm.mjs',
          wrapperBase_cjs = fastMode ? 'wrapper_fast_cjs.mjs' : 'wrapper_slow_cjs.mjs',
          dirBase_esm     = fastMode ? 'fast_esm'             : 'slow_esm',
          dirBase_cjs     = fastMode ? 'fast_cjs'             : 'slow_cjs',
          moduleExt_esm   = fastMode ? '.mjs'                 : '.js',
          moduleExt_cjs   = fastMode ? '.cjs'                 : '.js';

    //1. Generate code of wrappers (these files import main files few times)

    const genWrapperCode = (dirBase, mainExt) => {
        const lines = [];

        for (let i = 1; i <= TEST_COUNT; i++) {
            lines.push('globalThis.startTimeMs = performance.now();');
            lines.push(`await import('./${dirBase}/main${i}${mainExt}');`);
            lines.push('');
        }

        return lines.join('\n');
    };

    const wrapperCode_esm = genWrapperCode(dirBase_esm, '.mjs'),
          wrapperCode_cjs = genWrapperCode(dirBase_cjs, '.cjs');

    //2. Generate code of main files (these files contain many imports)

    const [mainCode_esm, mainCode_cjs] = (() => {
        const esm  = [],
              cjs  = [],
              vars = [];

        for (let i = 1; i <= MODULE_COUNT; i++) {
            esm.push(`import m${i} from './modules/module${i}${moduleExt_esm}';`);
            cjs.push(`const m${i} = require('./modules/module${i}${moduleExt_cjs}');`);
            vars.push(`m${i}`);
        }

        const lastLine = `\nconsole.log(((performance.now() - globalThis.startTimeMs) / ${MODULE_COUNT}).toFixed(3) + ' ms, sum = ' + (${vars.join('+')}));\n`;
        esm.push(lastLine);
        cjs.push(lastLine);

        return [
            esm.join('\n'),
            cjs.join('\n'),
        ];
    })();

    //3. Create dirs

    for (const dir of [
        `${dirBase_esm}`,
        `${dirBase_esm}/modules`,
        `${dirBase_cjs}`,
        `${dirBase_cjs}/modules`,
    ]) {
        try {
            await fsp.mkdir(dir);
        } catch {}
    }

    //4. Create files

    //Wrappers
    await fsp.writeFile(wrapperBase_esm, wrapperCode_esm);
    await fsp.writeFile(wrapperBase_cjs, wrapperCode_cjs);

    //package.json
    await fsp.writeFile(`${dirBase_esm}/package.json`, `{"type": "module"}`);
    await fsp.writeFile(`${dirBase_cjs}/package.json`, `{"type": "commonjs"}`);

    //Few main files
    for (let i = 1; i <= TEST_COUNT; i++) {
        await fsp.writeFile(`${dirBase_esm}/main${i}.mjs`, mainCode_esm);
        await fsp.writeFile(`${dirBase_cjs}/main${i}.cjs`, mainCode_cjs);
    }

    //Many small modules
    for (let i = 1; i <= MODULE_COUNT; i++) {
        await fsp.writeFile(`${dirBase_esm}/modules/module${i}${moduleExt_esm}`, `export default ${i};\n`);
        await fsp.writeFile(`${dirBase_cjs}/modules/module${i}${moduleExt_cjs}`, `module.exports = ${i};\n`);
    }
}
```

Results (almost the same):

Copy of generated code (if you don't want to launch genModules.mjs)

* wrapper_slow_esm.mjs:
```js
globalThis.startTimeMs = performance.now();
await import('./slow_esm/main1.mjs');

globalThis.startTimeMs = performance.now();
await import('./slow_esm/main2.mjs');

globalThis.startTimeMs = performance.now();
await import('./slow_esm/main3.mjs');

globalThis.startTimeMs = performance.now();
await import('./slow_esm/main4.mjs');

globalThis.startTimeMs = performance.now();
await import('./slow_esm/main5.mjs');
```

* wrapper_slow_cjs.mjs:
```js
globalThis.startTimeMs = performance.now();
await import('./slow_cjs/main1.cjs');

globalThis.startTimeMs = performance.now();
await import('./slow_cjs/main2.cjs');

globalThis.startTimeMs = performance.now();
await import('./slow_cjs/main3.cjs');

globalThis.startTimeMs = performance.now();
await import('./slow_cjs/main4.cjs');

globalThis.startTimeMs = performance.now();
await import('./slow_cjs/main5.cjs');
```

joyeecheung commented 4 months ago

Something I've noticed in my require(esm) PR was that the current ESM loader (at least when it's hitting the paths for import, not the paths I carved out for require(esm)) is indeed (non?) surprisingly slow. require(esm) is actually about 1.2x faster than import esm when I was testing ~30 high-impact ESM-only packages: https://github.com/nodejs/node/pull/51977#issuecomment-1995209126 (I wasn't doing repeated loads; I was just loading all of them from scratch once to make sure they can be loaded by require(esm) like import esm, and happened to measure how long each version took to load all the packages once).

But in "import" for example we can read few files simultaneously so we can speed up imports.

This is already the case, but as long as the ESM loader's resolution & linking are implemented in JavaScript (currently the case), this is probably doing more harm than good for performance. For one, the async readFile used by the ESM loader is already 2x slower than readFileSync. Then, the parallelism is currently achieved via Promise.all, which adds extra overhead. I suspect that's part of why require(esm) is faster than import esm: it just does readFileSync sequentially and does not do any additional async/promise-full operations - none of that is really worth the overhead as long as the ESM loader is implemented in JS. It might be worth parallelising things if we fully implement resolution and linking of the ESM loader in C++; then we could spawn lighter-weight native threads to do this instead of the slow readFile + async functions everywhere + Promise.all. But that'll take quite a bit of refactoring, so I don't think it will happen anytime soon.
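A rough way to observe the readFile vs readFileSync gap mentioned above (a standalone sketch, not the loader's actual code; the file path and iteration count are arbitrary):

```js
import { readFileSync } from 'node:fs';
import { readFile } from 'node:fs/promises';

const FILE = './package.json'; // any small file on disk
const N = 1000;

let t = performance.now();
for (let i = 0; i < N; i++) readFileSync(FILE, 'utf8');
console.log('readFileSync:', (performance.now() - t).toFixed(1), 'ms');

t = performance.now();
for (let i = 0; i < N; i++) await readFile(FILE, 'utf8');
console.log('readFile:    ', (performance.now() - t).toFixed(1), 'ms');
```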

vitaliylag commented 4 months ago

I suspect that's part of why require(esm) is faster than import esm

If we are talking about unique import/require there was almost no difference in speed between them in my tests.

additional async/promise-full operations - none of that is really worth the overhead

Import/require currently takes ~0.3 ms per module, which is really a lot. If most of that time is spent reading the file, it makes sense to do it in parallel. But actually I don't think that much time is spent reading the files. So we need to see what takes the most time, and then we can decide on the best way to speed it up.

is actually about 1.2x faster

This bug report is actually about repeated imports of the same file from different modules, not unique imports. In require it's 20 times faster, probably because require uses the cache.

joyeecheung commented 4 months ago

If we are talking about unique import/require there was almost no difference in speed between them in my tests.

I was talking about require(esm) (a new feature implemented in https://github.com/nodejs/node/pull/51977) on real-world ESM packages, not the require(cjs) benchmarked here on simple one-liners, though the 20% difference I was seeing says something about the asynchronous paths in the ESM loader.

joyeecheung commented 4 months ago

In require it's 20 times faster, probably because require uses the cache.

They should both hit the cache, not for performance but for correctness (both repeated require and import must return objects that are reference-equal to previous results). The difference should primarily come from module request resolution, which is currently asynchronous and parallelized via Promise.all for import esm (and I think that's doing more harm than good for performance). File I/O typically doesn't show up in profiles of module loading on a reasonably modern machine; at least it doesn't for CJS.

vitaliylag commented 4 months ago

They should both hit the cache

If they both hit the cache, import behaves really strangely. A repeated import of a module with the ".js" extension is 30 ms slower than with the ".mjs" extension. This means import's cache usage is limited.

Also, if it were fully cached, a re-import would not take so much time (0.1 ms). So it probably uses the cache to avoid re-running the same module, but not for the other operations.

joyeecheung commented 4 months ago

Yeah, I think that's what's happening: all the operations leading up to "oh, it's cached" were taking the time in ESM due to the async overhead. Caching doesn't gain you much if resolution is already slowed down by the asynchronicity (notice that there is even a resolve cache, so the main difference is the asynchronicity). require(esm) is faster than import esm because even during the first load, when the actual loading takes a bit of time since nothing is in the cache yet, the async resolution in import esm already makes the whole thing significantly slower than the synchronous, sequential resolution done by require(esm). One might imagine that doing things asynchronously or introducing parallelism would speed things up, but when done in Node.js JS land, they just slow things down. And once the module is cached and resolution performance dominates the profile, the slowness is amplified (the resolution logic of require(esm) isn't too different from import esm, it's just that everything is done sequentially and synchronously).
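A toy illustration of that point (not the loader's code): even when the cache always hits, routing every lookup through an async function adds per-call promise/microtask overhead that a plain synchronous lookup does not pay.

```js
const cache = new Map([['key', 'value']]);

const lookupSync  = (k) => cache.get(k);
const lookupAsync = async (k) => cache.get(k); // same work, wrapped in a promise

const N = 1_000_000;

let t = performance.now();
for (let i = 0; i < N; i++) lookupSync('key');
console.log('sync cache hits: ', (performance.now() - t).toFixed(1), 'ms');

t = performance.now();
for (let i = 0; i < N; i++) await lookupAsync('key');
console.log('async cache hits:', (performance.now() - t).toFixed(1), 'ms');
```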

vitaliylag commented 4 months ago

I wrote a wrapper for dynamic import with my own cache:

```js
import Path from 'node:path';

const fastImport_cache_singleLevel = Object.create(null),
      fastImport_cache_multiLevel  = Object.create(null);

globalThis.fastImport = async function fastImport(path, parentMeta) {
    if (typeof path !== 'string' || !path)             throw new TypeError('path must be not empty string');
    if (typeof parentMeta !== 'object' || !parentMeta) throw new TypeError('parentMeta must be an object');

    const {dirname} = parentMeta;

    if (path.includes('://') || !dirname) {
        return import(parentMeta.resolve(path));  //Need to implement the cache usage here
    }else
    if (path[0] === '.') {
        path = Path.join(dirname, path);
        path = Path.normalize(path);
        path = 'file://' + path;
        return fastImport_body(path);
    }else{
        const curLevelCache = fastImport_cache_multiLevel[dirname] ?? (fastImport_cache_multiLevel[dirname] = Object.create(null)),
              cachedNS      = curLevelCache[path];

        if (cachedNS)
            return cachedNS;

        return fastImport_body(parentMeta.resolve(path)).then(ns => curLevelCache[path] = ns);
    }
};

function fastImport_body(resolvedPath) {
    return fastImport_cache_singleLevel[resolvedPath]  ??  import(resolvedPath).then(ns => fastImport_cache_singleLevel[resolvedPath] = ns);
}
```
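For context, a hypothetical usage sketch (assuming the wrapper above has already been loaded so that globalThis.fastImport exists, and a Node.js version that provides import.meta.dirname and import.meta.resolve):

```js
// caller.mjs - hypothetical consumer of the wrapper above
const relative = await fastImport('./modules/module1.mjs', import.meta); // relative-path branch
const builtin  = await fastImport('node:path', import.meta);             // bare-specifier branch
console.log(relative.default, typeof builtin.default.join);
```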

Results:

This proves that the performance of the import cache can be improved.

The code that uses fastImport():

vitaliylag commented 4 months ago

@joyeecheung As you can see in the results above (with my own cache), the reason is not asynchronicity but the cache performance itself.

Asynchronicity does slow imports down a little, but not 20 times - only a few times.

joyeecheung commented 4 months ago

As you can see in the results above (with my own cache), the reason is not asynchronicity but the cache performance itself.

Your implementation is exactly replacing the resolution logic with your own synchronous code, by the way. The ESM loader wraps all those path.resolve calls in convoluted async functions and tries to return the cached objects asynchronously in parallel; your implementation doesn't, it just does path.resolve() synchronously before using the result as a cache key, and manipulates the cache synchronously (and it's a lot simpler, without all the other hoops that the ESM loader jumps through before computing resolvedPath).

vitaliylag commented 4 months ago

So what's the conclusion? Should we remove those "convoluted async functions" from the ESM loader?

without all the other hoops that the ESM loader jumps through before computing resolvedPath

Is my code incorrect? If so, is it incorrect only when a custom loader is used or some hooks are added, or is it always incorrect?

joyeecheung commented 4 months ago

Should we remove those "convoluted async functions" from the ESM loader?

I would say the ESM loader should really be refactored to be conditionally synchronous - not just for performance, but also because that's basically how ESM is designed to be, both in the web specs and in ECMA262, and this conditional synchronicity is being further enforced by proposals like https://github.com/tc39/proposal-defer-import-eval. Another part is that the ESM loader should also be refactored to be mostly in C++ - again, not just for performance, but also because it can improve error printing: when it throws, it can directly point at what's wrong in user code instead of at some internal JS code that users can't understand.

joyeecheung commented 4 months ago

Is my code incorrect? If so, is it incorrect only when a custom loader is used or some hooks are added, or is it always incorrect?

It does the job for the happy, fast path. The ESM loader adds a lot more to the resolution path, even if it's not entirely necessary for the simplest use case. Ideally the ESM loader would know how to skip those hoops on a fast happy path; in practice it doesn't, because of technical debt.

bluwy commented 4 months ago

FWIW, on the topic of the performance of repeated (dynamic) imports of the same module: I was investigating this and made a repro: https://github.com/bluwy/node-repeated-dynamic-imports

In summary, manually caching the dynamic imports seems to be nearly 6x faster than letting Node cache them itself. Presumably, from the discussion above, it's because Node also performs path resolution, which is also cached, but perhaps there's overhead in between.
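The manual caching referred to there boils down to a pattern like this (a minimal sketch, not the code from the linked repro):

```js
// Cache the import() promise keyed by specifier, so repeated dynamic imports of
// the same module skip Node's resolution machinery after the first call.
// Note: relative specifiers resolve against the calling module, so this simple
// version is only safe when all callers share the same base URL.
const importCache = new Map();

function cachedImport(specifier) {
  let promise = importCache.get(specifier);
  if (!promise) {
    promise = import(specifier);
    importCache.set(specifier, promise);
  }
  return promise;
}

// usage: const ns = await cachedImport('./some-module.mjs');
```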

I also made https://github.com/bluwy/rollup-plugin-cache-dynamic-import as an experiment to see how this works generically, and I found that while it works well for long-running programs, those that are not (i.e. with more unique dynamic imports) do not benefit from it and become slightly slower instead (~0.09% slower in a local app).