nodejs / node

Invalidate cache when using import #49442

Open jonathantneal opened 5 years ago

jonathantneal commented 5 years ago

How do I invalidate the cache of import?

I have a function that installs missing modules when an import fails, but the import statement seems to preserve the failure while the script is still running.

import('some-module').catch(
  // this catch will only be reached the first time the script is run because resolveMissingModule will successfully install the module
  () => resolveMissingModule('some-module').then(
    // again, this will only be reached once, but it will fail, because the import seems to have cached the previous failure
    () => import('some-module')
  )
)

The only information I found regarding import caching was this documentation, which does not tell me where the “separate cache” used by import can be found.

No require.cache

require.cache is not used by import. It has a separate cache. — https://nodejs.org/api/esm.html#esm_no_require_cache
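
For context, here is a rough sketch (not the poster's actual code) of the query-string workaround that comes up later in this thread. It assumes resolveMissingModule is the helper from the snippet above, that the specifier resolves to a file once the package is installed, and that import.meta.resolve is available (recent Node versions):

async function importWithRetry(specifier) {
  try {
    return await import(specifier);
  } catch {
    // Install the missing package, then import a *different* URL so the
    // cached failure for the original specifier is never consulted.
    await resolveMissingModule(specifier);
    const url = import.meta.resolve(specifier);
    return import(`${url}?retry=${Date.now()}`);
  }
}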

marxangels commented 1 year ago

It seems that using the vm module is the only comfortable choice.

dword-design commented 1 year ago

@marxangels How can I use the vm module to invalidate the cache?

marxangels commented 1 year ago

@marxangels How can I use the vm module to invalidate the cache?

Run all the code in a vm context so the module cache is under your control.
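
For illustration, a minimal sketch of that idea (run with --experimental-vm-modules; this is not marxangels's code, and it skips dependency linking entirely):

import { readFile } from "node:fs/promises";
import { SourceTextModule, createContext } from "node:vm";

const moduleCache = new Map(); // href -> evaluated SourceTextModule
const context = createContext({ console });

async function loadFresh(href) {
  if (moduleCache.has(href)) return moduleCache.get(href).namespace;
  const source = await readFile(new URL(href), "utf8");
  const mod = new SourceTextModule(source, { identifier: href, context });
  // A real linker would resolve and load `specifier` relative to `href`;
  // this sketch only supports dependency-free modules.
  await mod.link((specifier) => {
    throw new Error(`imports not handled in this sketch: ${specifier}`);
  });
  await mod.evaluate();
  moduleCache.set(href, mod);
  return mod.namespace;
}

// Invalidating is now just moduleCache.delete(href); the next loadFresh(href)
// compiles and evaluates the file again.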

lehni commented 1 year ago

According to @cspotcode in the discussion linked below, there are also some concerns about using the vm module for this kind of scenario:

https://github.com/mochajs/mocha/pull/4855#issuecomment-1077818595

marxangels commented 1 year ago

A simple module-level hot-reload for my express web application with less than 200 lines of code.

express-hot-reload.js if someone needs it.

marxangels commented 1 year ago

This comment was marked as off-topic??? The demo shows how to use the vm module and control your own module cache.

Never thought it was so difficult for coders to communicate... well, let's make it clearer.

It is obviously not difficult for the Node.js core team to provide such a cache-delete interface. So why don't they?

You can't solve the dependency problem between modules at every level in a general way; specific application scenarios require specific control. Using the vm module is the only choice.

End! Fck! Bye Bye!

aral commented 1 year ago

A simple module-level hot-reload for my express web application with less than 200 lines of code.

express-hot-reload.js if someone needs it.

Thanks for taking the time to share this @marxangels. It is very much on-topic and I, personally, appreciate it.

simlu commented 1 year ago

The amount of time I have spent total on this is just saddening.

If only cache invalidation was exposed - I still don't understand why it isn't :cry:


A simple module-level hot-reload for my express web application with less than 200 lines of code.

express-hot-reload.js if someone needs it.

I don't fully understand how this works, and it gives me errors that I don't understand either.

From the line await module.evaluate(); I'm getting:

/user/project/node_modules/@hapi/hoek/lib/error.js:23
            Error.captureStackTrace(this, exports.assert);
                  ^

Error: regex must be a RegExp
    at new module.exports (/user/project/node_modules/@hapi/hoek/lib/error.js:23:19)
    at module.exports (/user/project/node_modules/@hapi/hoek/lib/assert.js:21:11)
    at internals.Base.method (/user/project/node_modules/joi/lib/types/string.js:531:17)
    at /user/project/src/resources/formats.js:15:21
    at SourceTextModule.evaluate (node:internal/vm/module:226:23)
    at SrcModule (file:///user/project/test/express-hot-reload.js:91:16)
    at async linker (file:///user/project/test/express-hot-reload.js:146:20)
    at async ModuleWrap.<anonymous> (node:internal/vm/module:315:24)
    at async Promise.all (index 9)
    at async SourceTextModule.<computed> (node:internal/vm/module:333:11)

Are there any resources available that detail how this all works?

GeoffreyBooth commented 1 year ago

Are there any resources available that detail how this all works?

The only way I know of to achieve this currently is cache busting. This article discusses it: https://dev.to/giltayar/mock-all-you-want-supporting-es-modules-in-the-testdouble-js-mocking-library-3gh1

If there’s a way to achieve it via vm, that would be great; if someone wants to open a PR to implement import { unload } from 'module' and/or import { replace } from 'module' (for replacing/reloading individual modules), I would be happy to review it. This is absolutely a problem that the Node core devs would love to solve, but we’ve been flummoxed by V8 not providing an easy API for module unloading. If someone can find another way, or add the missing feature to V8, we would greatly appreciate it.

thescientist13 commented 1 year ago

Just wanted to chime in with my own use case(s) here as another advocate for this feature, as part of a website builder tool I work on.

Initially I explored this because I wanted to use dynamic import to support file-based routing, where I could export a page component. I needed this kind of cache busting for local development so that when I change my code and a live reload is triggered in the browser, I can see my changes with the new content.

I first tried the query string technique but for some reason couldn't get it to work back then. Since I was rendering Web Components on the server side, and thus had a little shim for some of the DOM / browser features I needed, I opted for a Worker thread because it felt safer to shield all that away from the rest of the global runtime.

import path from 'node:path';
import { Worker } from 'node:worker_threads';

const routeModuleLocation = path.join(pagesDir, routeFilename);
const routeWorkerUrl = `...`;

const html = await new Promise((resolve, reject) => {
  const worker = new Worker(routeWorkerUrl);

  worker.on('message', (result) => {
    const { html } = result;

    resolve(html);
  });
  worker.on('error', reject);
  worker.on('exit', (code) => {
    if (code !== 0) {
      reject(new Error(`Worker stopped with exit code ${code}`));
    }
  });

  worker.postMessage({
    ...
  });
});

I've just encountered another scenario for the same file-based routing approach, this time for API endpoints instead of pages, which just export a handler function and return a Response object (think serverless and edge functions). Again, for local development I would like to be able to edit my code and see the changes on live reload. For this, a Worker seemed like overkill, and this time I was able to get the simpler cache-busting technique to work (hurrah!).

let href = new URL(apiRoute, `file://${apisDir}`).href; // apiRoute -> /api/greeting.js

if (isDevMode) {
  href = `${href}?t=${Date.now()}`;
}

const { handler } = await import(href);
const req = new Request(new URL(`https://localhost:1984${apiRoute}`));
const res = await handler(req);

// ...

So mostly all to say:

  1. Thanks to everyone who has helped provide workarounds, I'm very grateful for it! 😊
  2. Would love to see a way to do this in Node.js / ESM itself 🤞

Thanks everyone and appreciate all the hard work! ✌️

dan-mk commented 1 year ago

My current workaround is the following: create a temporary copy of the file with a hashed name, import it and delete it. It's a dirty solution, but it worked for me and maybe it will help somebody else.

import fs from "node:fs";
import path from "node:path";

export async function importFresh(modulePath) {
  const filepath = path.resolve(modulePath);
  const fileContent = await fs.promises.readFile(filepath, "utf8");
  const ext = path.extname(filepath);
  const extRegex = new RegExp(`\\${ext}$`);
  const newFilepath = `${filepath.replace(extRegex, "")}${Date.now()}${ext}`;

  await fs.promises.writeFile(newFilepath, fileContent);
  const module = await import(newFilepath);
  fs.unlink(newFilepath, () => {});

  return module;
}

EDIT: I was using the method .trimEnd(".ts") before, but trimEnd doesn't even accept arguments. As a result, it was generating files named as example.ts123456789.ts. It was still working, but I've fixed that and also changed the code to accept any extension, so that .js files will also work.

gamersindo1223 commented 12 months ago

Is there any Estimated Time when this feature will be released?

GeoffreyBooth commented 12 months ago

Is there any Estimated Time when this feature will be released?

Pull requests are welcome!

laverdet commented 11 months ago

You can evict modules by exposing internals:

import { createRequire } from "node:module";
// `log.mjs` just contains `console.log("wow");`
import "./log.mjs";

const evictModule = function() {
    try {
      const require = createRequire(import.meta.url);
      const loader = require("internal/process/esm_loader");
      const { loadCache } = loader.esmLoader;
      if (loadCache) {
        return url => {
          if (loadCache.has(url)) {
            loadCache.delete(url);
            return true;
          } else {
            return false;
          }
        };
      }
    } catch {}
}();

console.log(evictModule?.(import.meta.resolve("./log.mjs")));
await import("./log.mjs");
$ node -v      
v20.8.0

$ node --expose-internals main.mjs
wow
true
wow

node does keep references stored in another map that's more deeply buried so the evicted modules don't actually get garbage collected. But if your intention is to re-import then it's worth a shot.

On the topic of ESM hot reloading I made a pretty sophisticated HMR --loader here: https://github.com/braidnetworks/dynohot

It's loosely compatible with esm-hmr, Vite, and Webpack APIs. There are probably differences in execution order because we're supporting top-level await and promise-returning accept, dispose, & prune handlers. My team has been using it for a couple of months now with really good results.

halfmatthalfcat commented 11 months ago

You can evict modules by exposing internals: [...]

So is this essentially deleting a pointer to the real module loaded in V8? What's actually stored in that map?

laverdet commented 11 months ago

So is this essentially deleting a pointer to the real module loaded in V8? What's actually stored in that map?

That's not really how V8 works. You can delete a handle to a value, and then it's V8's job to garbage collect the "pointer" at some point when it feels the vibes are good. Regardless, this is all plain old JavaScript. The require in my example just pulls in this internal module: https://github.com/nodejs/node/blob/1dc0667aa6096f10c5f95471dfe27e78db1dafd5/lib/internal/process/esm_loader.js

We just happened to luck out that they've internally exported the loader "cache" [I think cache is not the best name for this map because the contents of the record affect correctness, not performance]. It's been a while since I looked at this code and it changed recently [053511f7eca7cf50233abb10e7d88588aea6fc93]. And like I said, this is not the only reference nodejs holds to the module, so V8 will not garbage collect a module deleted in this manner; once imported, it's in the heap forever. This is not a limitation of V8, it's just a consequence of nodejs's implementation, and is not a hard thing to change. We know it's possible to collect stale modules because vm and isolated-vm do it.

Implementing this feature in a blessed way in nodejs is not difficult but doing so may be in direct violation of the specification:

If this operation is called multiple times with the same (referrer, specifier) pair and it performs FinishLoadingImportedModule(referrer, specifier, payload, result) where result is a normal completion, then it must perform FinishLoadingImportedModule(referrer, specifier, payload, result) with the same result each time.

That might be ok because nodejs has lots of power tools that alter the fabric of reality.

What I don't like about delete require.cache[key] and my sample above is that they provide the capability for any module to remove any other module. That was ok during the complete anarchy of CommonJS but ESM should probably hold itself to a higher standard. I haven't thought about this much at all, and this is an idea off the top of my head, but someone might consider implementing something like import.meta.releaseSelf() which would allow a module to release itself from the module graph. In the case of an error during dynamic import you could define the release function as a property on the error object itself [and all consumers of an errored dynamic module would need to release]. That would isolate the powerful capability to known sites. Or maybe it belongs as an import attribute i.e. import('./maybe.mjs', { with: { weak: true } }), and is only allowed on dynamic imports.
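
To make the error-attached idea concrete, a purely hypothetical usage sketch (neither import.meta.releaseSelf() nor error.release exist in Node; the names are only illustrative):

try {
  await import("./optional-plugin.mjs");
} catch (error) {
  // Hypothetical API: the host attaches a release function to the error, so
  // only consumers of the errored module get the power to evict it.
  error.release?.();
  // ...fix whatever caused the failure, then import again for a fresh attempt.
}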

GeoffreyBooth commented 11 months ago

The loadCache you’re referring to exists to be spec compliant to the part of the spec you quoted. It’s also more efficient to load a module from memory rather than from disk on subsequent times that it’s imported, but the spec compliance was the primary motivator. This cache is separate from the modules loaded into V8; those are contained within V8 in its own memory.

It’s been a long time since I looked into this, but my understanding from way back when was that there is no way to delete or replace an ES module once it’s been loaded into V8. There exist some debugging protocol methods to do things along those lines but they haven’t been extended to ESM (at least as of a few years ago; it would be great if I’m wrong about this now). Hence the existing methods of hot module reload, which involve wrappers around modules and query strings; they get the job done (see Vite for a great example) but slowly use more and more memory the longer your dev server runs, because the replaced older versions of each module never get deleted from memory.

If you can find a way to purge old ES modules from V8, that would be a wonderful discovery. I don’t think we would be concerned with making such an API available to users; the entire module customization hooks API is available to users, and it allows whatever spec violations the users want (it’s not like running CoffeeScript is spec compliant, but those are the types of use cases that the hooks enable). Node aims to be spec compliant by default, but not to block users from customizing it to behave in noncompliant ways if desired.

laverdet commented 11 months ago

It’s been a long time since I looked into this, but my understanding from way back when was that there is no way to delete or replace an ES module once it’s been loaded into V8.

I have a decent bit of experience here since I was very much in the weeds on this while working on isolated-vm. v8 doesn't treat modules much differently than plain Object allocations. They are HeapObject instances which can be garbage collected like anything else. Otherwise Chrome would need to continually allocate new isolates on the same origin as you browse different pages on a website. I'd love to see evidence to the contrary but Eternal module handles would be antithetical to the v8 design philosophy.

nodejs demonstrates this collectability, today, with the vm module:

import { memoryUsage } from "node:process";
import { SourceTextModule } from "node:vm";

for (let ii = 0; ; ++ii) {
    const module = new SourceTextModule(`
        // allocate and materialize 1mb uint8 array
        export const uint8 = new Uint8Array(1024 * 1024);
        for (let ii = 0; ii < uint8.length; ii += 4096) {
            uint8[ii] = 1;
        }`, { identifier: "file:///module0" });
    await module.link(() => {});
    await module.evaluate();
    // quickly stabilizes around 40mb. other statistics are stable as well.
    console.log(ii, memoryUsage().heapTotal >> 20);
}
node --experimental-vm-modules test.mjs 
0 3
1 3
// [...]
122570 37
122571 37
// [...]
184255 38
184256 38

Actually, and this is surprising, my evictModule example does garbage collect the evicted modules. This wasn't true the last time I looked into it (v20.1.0) so recent changes to nodejs have removed the deeper reference. Sample code:

chunk.mjs:

// allocate and materialize 1mb uint8 array
export const uint8 = new Uint8Array(1024 * 1024);
for (let ii = 0; ii < uint8.length; ii += 4096) {
    uint8[ii] = 1;
}

test.mjs

import { createRequire } from "node:module";

const evictModule = function() {
    try {
        const require = createRequire(import.meta.url);
        const loader = require("internal/process/esm_loader");
        const { loadCache } = loader.esmLoader;
        if (loadCache) {
            return url => {
                if (loadCache.has(url)) {
                    loadCache.delete(url);
                    return true;
                } else {
                    return false;
                }
            };
        }
    } catch {}
}();

const register = new FinalizationRegistry(name => {
    console.log("collected", name);
});
for (let ii = 0; ; ++ii) {
    const { uint8 } = await import("./chunk.mjs");
    register.register(uint8, ii);
    evictModule?.(import.meta.resolve("./chunk.mjs"));
}

Results:

-> % node --expose-internals test.mjs
collected 63
collected 62
collected 61
collected 60
// [and so on]

So the question of whether or not this is possible has an answer: it is possible.


I don’t think we would be concerned with making such an API available to users; the entire module customization hooks API is available to users

The only way to implement this functionality as a loader would be to create the entire module graph within vm. This is a non-starter because it is not currently possible to invoke the loader chain programmatically.

Anyway, I would clearly be interested in having a feature like this, but I also can't come up with something that's safe. I think an import attribute sounds interesting, but if we had an attribute which skips the loader cache then you'd run into weird situations where a module that imports itself would actually get a namespace object belonging to a different instance of the same module.

devsnek commented 11 months ago

@laverdet I believe if you make another module which statically imports your generated modules (even without importing anything from them; just import "foo" will do), it will never release them, even though nothing directly references any of their resources. If that's not the case anymore, it's possible there's a path forward here.

laverdet commented 11 months ago

@devsnek I'm sorry, it's just not true. v8 does not leak module handles, full stop. I have a great deal of respect for the nodejs team and everything they've done for the community but this condition does not exist in v8. Dynamic loading and unloading of code was just previously not a design goal of nodejs, and that is ok.

My vm example from above used to leak, but the leak was node's fault. It was actually just fixed in v20.8.0 by @joyeecheung -- see: https://github.com/nodejs/node/commit/b0ce78a75b & https://github.com/nodejs/node/commit/4e578f8ab1

As for whether or not this was ever a condition in v8 I can go back to 2018 [v8 6.8.275] under Docker and isolated-vm. In this example a single isolate continually compiles, links, and evaluates a graph of 3 modules and stays under 2mb heap size (the array buffers are externally allocated).

Dockerfile:

FROM node:10
RUN npm install isolated-vm@4.3.0
COPY isolated.mjs .
ENTRYPOINT node --experimental-modules isolated.mjs

isolated.mjs

import ivm from "isolated-vm";

const isolate = new ivm.Isolate({ memoryLimit: 128 });
console.log("running", process.arch, process.versions);

for (let ii = 0; ; ++ii) {
    const main = isolate.compileModuleSync(
        `import { uint16 } from "chunk16";
    import { random, uint8 } from "chunk8";
    respond.applySync(undefined, [ random, uint8.length, uint16.length ]);`);

    const module8 = isolate.compileModuleSync(
        `import { uint16 } from "chunk16";
        export { uint16 };
        // verify that a new module is being run each time
        export const random = ${Math.random()};
        // allocate and materialize 1mb uint8 array
        export const uint8 = new Uint8Array(1024 * 1024);
        for (let ii = 0; ii < uint8.length; ii += 4096) {
            uint8[ii] = 1;
        }`);

    const module16 = isolate.compileModuleSync(
        `import { uint8 } from "chunk8";
        export { uint8 };
        // allocate and materialize 2mb uint16 array
        export const uint16 = new Uint16Array(1024 * 1024);
        for (let ii = 0; ii < uint16.length; ii += 4096) {
            uint16[ii] = 1;
        }`);

    const context = isolate.createContextSync();
    context.global.setSync("respond", new ivm.Reference((...args) =>
        console.log("observed", ...args)));
    main.instantiateSync(context, specifier => {
        switch (specifier) {
            case "chunk8": return module8;
            case "chunk16": return module16;
            default: throw new Error();
        }
    });
    main.evaluateSync(context);

    // These functions simply release the underlying `Persistent<T>` v8 handles. They're not an exotic
    // hack.
    context.release();
    main.release();
    module8.release();
    module16.release();
    console.log(ii, `${isolate.getHeapStatisticsSync().used_heap_size >> 20}mb`);
}
$ docker build --platform amd64 -t modules-test .
// [...]
$ docker run -t modules-test
(node:8) ExperimentalWarning: The ESM module loader is experimental.
running x64 { http_parser: '2.9.4',
  node: '10.24.1',
  v8: '6.8.275.32-node.59',
  uv: '1.34.2',
  zlib: '1.2.11',
  brotli: '1.0.7',
  ares: '1.15.0',
  modules: '64',
  nghttp2: '1.41.0',
  napi: '7',
  openssl: '1.1.1k',
  icu: '64.2',
  unicode: '12.1',
  cldr: '35.1',
  tz: '2019c' }
observed 0.6598370276400909 1048576 1048576
0 '1mb'
observed 0.43183456388024477 1048576 1048576
1 '1mb'
observed 0.012569878657472833 1048576 1048576
2 '1mb'
// [...]
observed 0.6746212970768821 1048576 1048576
44808 '1mb'
observed 0.5481371445716845 1048576 1048576
44809 '1mb'

devsnek commented 11 months ago

Sorry I should have been clearer. If you want to collect the entire graph it works fine. But the point of hot reloading is generally that you only want to replace specific modules within the graph.

laverdet commented 11 months ago

Sorry I should have been clearer. If you want to collect the entire graph it works fine. But the point of hot reloading is generally that you only want to replace specific modules within the graph.

This is what dynohot does actually. Imports are rewritten to point to a single "module controller" [import { symbol } from "./ref"; becomes import controller from "hot:module?specifier=./ref";]. The controller maintains handles to simulated module instances and prunes them as they go stale. Module bodies are rewritten to a generator function so that they can be reevaluated over and over without leaking data. Without using --expose-internals only the module code is leaked (not module resources), once per source version (not per evaluation). If you expose internals you can get away without leaking anything at all.

chunk.mjs

export const uint8 = new Uint8Array(1024 * 1024);
export const random = Math.random();
for (let ii = 0; ii < uint8.length; ii += 4096) {
    uint8[ii] = 1;
}
setTimeout(() => import.meta.hot.invalidate(), 1);

main.mjs

import { random, uint8 } from "./chunk.mjs";
import { memoryUsage } from "node:process";

process.stdin.resume(); // stay alive
let ii = 0;
import.meta.hot.accept("./chunk.mjs", () => {
    console.log(++ii, random, uint8.length, memoryUsage());
});
-> % node --loader dynohot main.mjs
(node:5400) ExperimentalWarning: Custom ESM Loaders is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
1 0.7925818173943733 1048576 {
  rss: 101040128,
  heapTotal: 7290880,
  heapUsed: 5859712,
  external: 2777815,
  arrayBuffers: 2108070
}
[hot] Loaded 0 new modules, reevaluated 1 existing module in 3ms.
2 0.2771700818620366 1048576 {
  rss: 102531072,
  heapTotal: 10174464,
  heapUsed: 5455080,
  external: 2773955,
  arrayBuffers: 2107670
}
[hot] Loaded 0 new modules, reevaluated 1 existing module in 2ms.
// [...]
[hot] Loaded 0 new modules, reevaluated 1 existing module in 1ms.
689 0.6166603291700468 1048576 {
  rss: 247431168,
  heapTotal: 8077312,
  heapUsed: 5929784,
  external: 70927643,
  arrayBuffers: 70265058
}
[hot] Loaded 0 new modules, reevaluated 1 existing module in 1ms.
690 0.15742998982874745 1048576 {
  rss: 247087104,
  heapTotal: 8077312,
  heapUsed: 5005952,
  external: 3818779,
  arrayBuffers: 3156194
}
// gc just ran, memory back to baseline
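
For intuition, here is a rough illustration of the generator rewrite described above; it is not dynohot's actual output, and the names are made up:

// Conceptually, the module body becomes a re-runnable generator...
function* moduleBody() {
  const uint8 = new Uint8Array(1024 * 1024);
  const random = Math.random();
  yield { uint8, random }; // the module controller reads the exports from here
}

// ...and the controller keeps only the latest instance, so when it
// reevaluates, the previous exports become unreachable and can be collected.
let current = moduleBody().next().value;
function reevaluate() {
  current = moduleBody().next().value;
}
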
aral commented 11 months ago

Hey all,

Given the astounding amount of effort that folks are putting into realising this, would it be worthwhile, perhaps, to re-evaluate implementing this as a core Node.js feature?

Pave the cow paths and all that… :)

All the best, Aral

jackmoxley commented 10 months ago

Better to expose it with a warning, than not to expose it and risk breaking lots of npm modules reliant on hacks, and potentially creating vulnerabilities.

GeoffreyBooth commented 10 months ago

If you expose internals you can get away without leaking anything at all.

@laverdet what do you need from Node internals to avoid memory leaks?

laverdet commented 10 months ago

@GeoffreyBooth the hack I'm using, and the feature most people are asking for here (a replacement for delete require.cache[key]), is pretty unsafe. I'm not sure I'd even want to see it in core nodejs. The yield preamble in dynohot (described here https://github.com/braidnetworks/dynohot#transformation) means we only have to leak module source code once per file save, and this is only during development. I haven't run into any issues with the leak even running a server for multiple days while developing. My server always needs to be restarted for some other reason besides memory. I had some more thoughts in the last paragraph here: https://github.com/nodejs/node/issues/49442#issuecomment-1740995593

The closest thing to a long-term solution I've come up with is an import attribute which can only be used on dynamic modules: something like import('./maybe.mjs', { with: { weak: true } }). This would maintain its own copy of loadCache so that when the reference to the module is lost the memory is reclaimed. When you get into the implementation details, though, this approach raises many more questions. I think a lot more academic thinking needs to go into this, see: the compartments proposal.

It would also be worth considering a version of my evictModule hack exported from node:module with a hostile name of something like unsafe_unstable_evictModuleByResolvedURL. Such a function would be perpetually marked as 1 - Experimental in the documentation and should probably unconditionally print a warning to the console. I think experimental power tools like this can be a good stepping stone until Mark Miller completes his life's work and gives us a divine answer to these questions.

👉🏻 In the mean time, if the nodejs team is open to it, I'm happy to submit a PR with the unstable function I described here.

I'm sure a lot of lawless module authors would be eager to unleash a new era of footguns on the community.

GeoffreyBooth commented 10 months ago

Well, please don’t sell it too hard 😄

We don’t want perpetually experimental things. But I don’t know why “evict” would need to be; is it expected to have unavoidable breaking changes frequently? Or were you suggesting the experimental status just because it’s strongly discouraged?

We have a few flags that are already for particular use cases and are strongly discouraged in general, especially for production; --expose-internals is probably the most prominent of these. We could add another, like --allow-unsafe-module-replacement or whatever, but I think it would only be worth doing so if the flag provided a meaningfully better UX/DX than what is currently possible. Looking at https://github.com/braidnetworks/dynohot/blob/fb822d2022f9d71d9d7ab5377d5b5d55ddcb26a8/runtime/utility.ts#L62-L83, though, I don’t think this is avoiding the memory leak; loadCache is used for spec compliance when two import() expressions have the same resolution and the underlying resource has been deleted between loads, and per spec it needs to continue to load successfully every time. There’s another cache within V8 for all the ES modules that have ever been loaded and evaluated, and as far as I know there’s still no API for removing or updating those. That’s the API we’ve been waiting for for years, that would truly solve this issue. (And if it’s finally been added, please let us know and provide a link; or feel free to work with the V8 team and submit a PR.)

laverdet commented 10 months ago

There’s another cache within V8 for all the ES modules that have ever been loaded and evaluated, and as far as I know there’s still no API for removing or updating those.

This is a common misconception but it is not true and I don't believe it has ever been true. I've refuted it using isolated-vm here https://github.com/nodejs/node/issues/49442#issuecomment-1741839325 and here https://github.com/nodejs/loaders/issues/157#issuecomment-1687044349. Module records in v8 are no more special than a plain object. Since the fix in nodejs v20.8.0 you can also prove this using SourceTextModule of node:vm.

Or were you suggesting the experimental status just because it’s strongly discouraged?

Yes, I would discourage its use. require.register and require.cache contributed to the total anarchy we've seen under CommonJS that the ecosystem is still feeling to this day. The evict function explicitly breaks the invariants promised under the specification. I'm clearly ok with reality-altering features (see fibers) but I do want to make sure I communicate the nature of what I've proposed.

GeoffreyBooth commented 10 months ago

This is a common misconception but it is not true and I don’t believe it has ever been true.

Then are you using such an API, and if not, why not? Wouldn’t replacing modules loaded into V8 solve memory leaks?

laverdet commented 10 months ago

Then are you using such an API, and if not, why not? Wouldn’t replacing modules loaded into V8 solve memory leaks?

I'm not sure I understand. There is no v8 API for this. Simply, once there are no more handles to a module it is garbage collected, in the same way a JSON object or any other value would be garbage collected. Since the module is in node's loadCache forever it can never be collected. There's no replacing a module, in the same way you can't modify the source code of a function after it's been created.

mcollina commented 10 months ago

We have an implementation of the "hack" (with the leak) in https://github.com/platformatic/platformatic/blob/main/packages/runtime/lib/loader.mjs. Works great.

Note that we don't do full HMR. Basically we reload the app, and we have special code to not close the HTTP server and live restart a Fastify application. This is mostly side effect free.

Note that quite a lot of people are working towards a world where this is possible. My NodeContext PR stalled because of memory leaks in instantiating Node.js core objects, but the team is busy fixing those.

GeoffreyBooth commented 10 months ago

If the leak is only caused by loadCache / “module map”, then I think it should be fine to have an API like import { clearCache } from 'node:module' that allows you to delete a particular entry (clearCache(moduleAbsoluteURL)) or even all entries; that’s just like clearing the cache in a browser. I think the only consequence would be that a subsequent import might produce a different result rather than getting the previously cached result, but that’s also similar to a browser cache having been cleared. I know the spec says that subsequent imports must return the same result, but I assume that that means “in general usage,” not if the user has specifically instructed the runtime to not return a cached result.
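
For illustration, a hypothetical usage sketch of such an API (clearCache does not exist in node:module; ./config.mjs is a made-up path):

import { clearCache } from 'node:module'; // hypothetical export

const url = import.meta.resolve('./config.mjs');
clearCache(url); // or clearCache() to drop every entry
const fresh = await import('./config.mjs'); // would re-read the file from disk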

laverdet commented 10 months ago

that’s just like clearing the cache in a browser

It's not a good comparison, since the browser cache doesn't affect correctness. I mentioned that here, that I think "cache" is a poor choice of name for this since it is a matter of correctness and not performance.

I think the only consequence would be that a subsequent import might produce a different result rather than getting the previously cached result, but that’s also similar to a browser cache having been cleared.

Clearing the browser cache of a running web page will not affect the result of import(). It has no observable effect, except on timing.

I know the spec says that subsequent imports must return the same result, but I assume that that means “in general usage,” not if the user has specifically instructed the runtime to not return a cached result.

The language in the specification is clear, and our proposed function is a violation of the invariants expected of the host. As a matter of personal style I am totally ok violating any rule for any reason, but I do want to make sure we're on the same page that this is a dangerous and powerful violation of the specification.

GeoffreyBooth commented 10 months ago

I do want to make sure we're on the same page that this is a dangerous and powerful violation of the specification.

I mean, the whole point of the module customization hooks is to allow the user to violate spec however they please (to whatever extent Node can achieve). It's not like importing CoffeeScript is spec compliant. If we're worried about dependencies using this API to mischievous ends, we could gate it behind a flag or create a permission for it; but I'm not sure what the risk is.

laverdet commented 10 months ago

It's not like importing CoffeeScript is spec compliant.

It is, though. CoffeeScript would be a Cyclic Module Record (a module which participates in the specified cyclic resolution algorithm). Cyclic Module Record is described in the specification as "abstract".

Source Text Module Record, which is the ES Modules that we all know, is a concrete implementation of that abstract interface. So a CoffeeScript module would be a separate but valid implementation of a Cyclic Module Record.

If you go up one level there is a plain Module Record, from which Cyclic Module Record is implemented. This provides the means for modules which don't participate in the cyclic resolution algorithm (wasm, json, or the whole node: scheme). None of CoffeeScript, WASM, or TypeScript is specified in ECMA-262, but they are still spec compliant.

Anyway, you are right that the loaders API provides the means to break specification, since the result of resolve is neither pure nor is it memoized. This breaks the invariants of HostLoadImportedModule: "If this operation is called multiple times with the same (referrer, specifier) pair [then it must resolve] with the same result each time". I suppose the difference here is that we are proposing a globally-importable utility which could be used outside of a loader.

but I'm not sure what the risk is

The risks are impossible to enumerate since we're reneging on a presumed invariant. Like, what happens if a user attempts to evict node:fs? Idk, will it segfault?

Anyway the risks are probably mostly hypothetical in nature. I'll try and open a PR soon to continue the discussion.

laverdet commented 8 months ago

I'm abandoning the PR at #50618. After more reflection I think this is going to harm the ecosystem more than it will do any good. From what I can tell there are 3-4 different use-cases mentioned in this issue which can be solved in other ways:

Q: "How do I retry a module whose file content was created after a failed import" A: Just reimport it with a cache bust: await import("failed-module?retry=1")

Q: "How do I reload a module and all its dependencies" A: Use a loader which carries forward the cache bust: https://github.com/nodejs/node/pull/50618#issuecomment-1894603753

Q: "How do I add module support to unit test frameworks" A: Use vm.SourceTextModule

Q: "How do I hot reload modules during development" A: Use dynohot

pygy commented 8 months ago

@laverdet Does this look fine to you (this is a reworked version of https://github.com/nodejs/node/pull/50618#issuecomment-1894603753)?

https://github.com/pygy/esm-reload

laverdet commented 8 months ago

@pygy The loader code looks correct and does what it says it does.

import * as mDev from './my-module.js?dev'
process.env.NODE_ENV='production'
import * as mProd from './my-module.js?prod'

This counter-example from the documentation doesn't make sense though. Imports are not executed in an imperative manner, so the body of ./my-module.js?prod will actually run before process.env.NODE_ENV='production' is evaluated. In fact you can't even guarantee that ./my-module.js?dev will run before ./my-module.js?prod without also looking at the rest of the module graph.

pygy commented 8 months ago

Good catch, thanks; those should have been dynamic imports.

Fixed, and I added an example with dependencies for extra clarity:


With dependencies

Suppose these files:

// foo.js
export {x} from "./bar.js"

// bar.js
export const x = {}

We can then do

import "esm-reload"

const foo1 = await import("./foo.js?instance=1")
const bar1 = await import("./bar.js?instance=1")

const foo2 = await import("./foo.js?instance=2")
const bar2 = await import("./bar.js?instance=2")

assert.equal(foo1.x, bar1.x)
assert.equal(foo2.x, bar1.x)

assert.notEqual(bar1.x, bar2.x)

Edit again: https://www.npmjs.com/package/esm-reload

github-actions[bot] commented 2 months ago

There has been no activity on this feature request for 5 months. To help maintain relevant open issues, please add the https://github.com/nodejs/node/labels/never-stale label or close this issue if it should be closed. If not, the issue will be automatically closed 6 months after the last non-automated comment. For more information on how the project manages feature requests, please consult the feature request management document.

simlu commented 2 months ago

Should probably close this as "wont do"?

lzxb commented 1 month ago

My example

import fs from 'node:fs';
import { isBuiltin } from 'node:module';
import path from 'node:path';
import { fileURLToPath } from 'node:url';
import {
    createContext,
    type Module,
    type ModuleLinker,
    SourceTextModule,
    SyntheticModule
} from 'node:vm';

const ROOT_MODULE = '__root_module__';

const link: ModuleLinker = async (specifier: string, referrer: Module) => {
    // Node.js native module
    const isNative = isBuiltin(specifier);
    // node_modules
    const isNodeModules =
        !isNative && !specifier.startsWith('./') && !specifier.startsWith('/');
    if (isNative || isNodeModules) {
        const nodeModule = await import(specifier);
        const keys = Object.keys(nodeModule);
        const module = new SyntheticModule(
            keys,
            function () {
                keys.forEach((key) => {
                    this.setExport(key, nodeModule[key]);
                });
            },
            {
                identifier: specifier,
                context: referrer.context
            }
        );
        await module.link(link);
        await module.evaluate();
        return module;
    } else {
        const dir =
            referrer.identifier === ROOT_MODULE
                ? import.meta.dirname
                : path.dirname(referrer.identifier);
        const filename = path.resolve(dir, specifier);
        const text = fs.readFileSync(filename, 'utf-8');
        const module = new SourceTextModule(text, {
            initializeImportMeta,
            identifier: specifier,
            context: referrer.context,
            // @ts-expect-error
            importModuleDynamically: link
        });
        await module.link(link);
        await module.evaluate();

        return module;
    }
};

export async function importEsm(identifier: string): Promise<any> {
    const context = createContext({
        console,
        process,
        [ROOT_MODULE]: {}
    });
    const module = new SourceTextModule(
        `import * as root from '${identifier}';
        ${ROOT_MODULE} = root;`,
        {
            identifier: ROOT_MODULE,
            context
        }
    );
    await module.link(link);
    await module.evaluate();
    return context[ROOT_MODULE];
}

function initializeImportMeta(meta: ImportMeta, module: SourceTextModule) {
    // Resolve the module identifier to a URL, then derive the filesystem
    // path and directory from it.
    meta.url = import.meta.resolve(module.identifier, import.meta.url);
    meta.filename = fileURLToPath(meta.url);
    meta.dirname = path.dirname(meta.filename);
    meta.resolve = import.meta.resolve;
}

Use it

const module = await importEsm('filename');

RomainLanz commented 1 month ago

If you are interested, we created Hot Hook to hot reload node imports during development.

https://adonisjs.com/blog/hmr-in-adonisjs
https://docs.adonisjs.com/guides/concepts/hot-module-replacement
https://github.com/julien-R44/hot-hook