nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

Invalidate cache when using import #49442

Open jonathantneal opened 5 years ago

jonathantneal commented 5 years ago

How do I invalidate the cache of import?

I have a function that installs missing modules when an import fails, but the import statement seems to preserve the failure while the script is still running.

import('some-module').catch(
  // this catch will only be reached the first time the script is run because resolveMissingModule will successfully install the module
  () => resolveMissingModule('some-module').then(
    // again, this will only be reached once, but it will fail, because the import seems to have cached the previous failure
    () => import('some-module')
  )
)

The only information I found in regards to import caching was this documentation, which does not tell me where the “separate cache” used by import can be found.

No require.cache

require.cache is not used by import. It has a separate cache. — https://nodejs.org/api/esm.html#esm_no_require_cache

devsnek commented 5 years ago

the import cache is purposely unexposed. adding a query has been the generally accepted ecosystem practice to re-import something.

however, a failure to import something will not fill the cache.

this trivial program works fine for me (assuming nope.mjs does not exist):

import fs from 'fs';

import('./nope.mjs')
  .catch(() => fs.writeFileSync('./nope.mjs', '')) // create an empty module; writeFileSync needs a data argument
  .then(() => import('./nope.mjs'))
  .then(console.log);
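
For reference, the query-string practice mentioned above can be sketched roughly as follows. This is a minimal illustration under assumptions, not an official API: `importFresh`, the `v` parameter, and the temporary file names are hypothetical, and every distinct query creates another module instance that stays in the cache.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { pathToFileURL } = require('node:url');

let version = 0; // hypothetical counter, used only to make each URL unique

// Re-import a file by appending a unique query string to its URL.
// Earlier instances are NOT evicted; they remain in the ESM cache.
async function importFresh(file) {
  const url = pathToFileURL(file);
  url.searchParams.set('v', String(++version));
  return import(url.href);
}

// Usage: write a module, import it, rewrite it, import it again.
async function demo() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'esm-query-'));
  const file = path.join(dir, 'value.mjs');

  fs.writeFileSync(file, 'export default 1;');
  const first = (await importFresh(file)).default;

  fs.writeFileSync(file, 'export default 2;');
  const second = (await importFresh(file)).default;

  return { first, second };
}
```

Calling `demo()` should observe both versions of the file, because each call imports a different URL.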
jonathantneal commented 5 years ago

@devsnek, hmm, might this be limited to imports that use node_modules? This similarly trivial program fails for me the first time, but not the second.

import child_process from 'child_process';

import('color-names')
  .catch(() => child_process.execSync('npm install --no-save color-names'))
  .then(() => import('color-names'))
  .then(console.log);
bmeck commented 5 years ago

Note that the JS spec requires imports to be deterministic/idempotent on a source text. Exposure of a cache would not allow you to fix the code above.
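
To make the idempotence point concrete, here is a small sketch (file names are hypothetical, chosen only for the demonstration) showing that importing the same URL twice yields the same module namespace, even if the file changed on disk in between:

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { pathToFileURL } = require('node:url');

// Import the same URL twice, changing the file on disk in between.
async function importTwice() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'esm-idempotent-'));
  const file = path.join(dir, 'value.mjs');
  const href = pathToFileURL(file).href;

  fs.writeFileSync(file, 'export default 1;');
  const a = await import(href);

  fs.writeFileSync(file, 'export default 2;'); // this change is never observed
  const b = await import(href);

  // Both imports resolve to the very same namespace object.
  return { sameNamespace: a === b, value: b.default };
}
```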

devsnek commented 5 years ago

if its just happening with node_modules it could be https://github.com/nodejs/node/issues/26926

MylesBorins commented 5 years ago

can this be closed?

jkrems commented 5 years ago

I think a use case like this would hopefully be implemented as a loader. Do we already track this as a use case in that context?

bmeck commented 5 years ago

@jkrems we have old documents with that as a feature, but no success criteria examples.

giltayar commented 4 years ago

FYI, I'm implementing ESM support in Mocha (https://github.com/mochajs/mocha/pull/4038), and cannot currently implement "watch mode", whereby Mocha watches the test files, and reruns them when they change. So "watch mode" in Mocha, in the first iteration, will probably not support ESM, which is a bummer.

While we could use cache busting query parameters, that would mean that we are always increasing memory usage, and old and never-to-be-used versions of the file will continue staying in memory due to the cache holding on to them.

And I'm not sure a loader would help here, as the loader also has no access to the cache.

guybedford commented 4 years ago

An API for unloading modules certainly makes sense.

Usually with a direct registry API there is the tracing issue. An API that handles dependency removal can be useful.

A simple API might be something like -

import { unload } from 'module';

unload(import.meta.url); // returns true

Where the unload function would remove that module including all its dependencies from the registry. If in a cycle the whole cycle would be removed.

A subsequent module load would refresh all the loads anew.

Another problem to work out is what happens if modules in the tree are still in progress. I’d be tempted to say it should fail for that case and only work when all modules have either errored or completed.

We still have memory leak concerns, as V8 still doesn’t lend itself easily to module GC. But Node.js can lead the way here as it should. It will be an ongoing process to get there, but the API can come first.
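
The proposed semantics can be modeled on a toy registry. This is only a sketch under assumptions: the `registry` Map, its `status` and `dependencies` fields, and this `unload` function are all hypothetical stand-ins, since Node.js exposes no such API or cache.

```javascript
// Toy model only: `registry` stands in for the internal ESM cache, which
// Node.js does not expose. Each entry records a status and its dependencies.
function unload(registry, url) {
  const toRemove = new Set();
  const stack = [url];

  // Collect the module and everything reachable from it. The visited set
  // makes cycles terminate, so a whole cycle is collected together.
  while (stack.length > 0) {
    const current = stack.pop();
    if (toRemove.has(current) || !registry.has(current)) continue;
    toRemove.add(current);
    stack.push(...registry.get(current).dependencies);
  }

  // Refuse to unload while any affected module is still in progress.
  for (const key of toRemove) {
    if (registry.get(key).status === 'loading') return false;
  }

  for (const key of toRemove) registry.delete(key);
  return toRemove.size > 0;
}

const registry = new Map([
  ['file:///a.mjs', { status: 'evaluated', dependencies: ['file:///b.mjs'] }],
  ['file:///b.mjs', { status: 'evaluated', dependencies: ['file:///a.mjs'] }], // cycle with a.mjs
  ['file:///c.mjs', { status: 'errored', dependencies: [] }],
]);

const removed = unload(registry, 'file:///a.mjs');
```

Note that this model drops dependencies even if other modules still import them; a real design would have to decide how to treat shared dependencies.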

The main questions then seem to be:

devsnek commented 4 years ago

I'm really not a fan of the idea of our module cache being anything except insert-only. CJS cache modification is bad already, and CJS modules don't even form graphs.

Additionally, other runtimes (like browsers) will never expose this functionality, so some alternative system will have to be used for them regardless of what node does, in which case it seems like that system could just be used for node.

bmeck commented 4 years ago

@giltayar have you looked into using Workers or other solutions to have a module cache that you can destroy (such as by killing the Worker)?
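
A rough sketch of the Worker idea, assuming a hypothetical `runner.mjs` entry point that imports a target module and posts back its default export. Each Worker gets its own module cache, which goes away with the thread:

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { pathToFileURL } = require('node:url');
const { Worker } = require('node:worker_threads');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'esm-worker-'));

// Hypothetical worker entry point: imports the target and reports its default export.
const workerFile = path.join(dir, 'runner.mjs');
fs.writeFileSync(workerFile, `
import { parentPort, workerData } from 'node:worker_threads';
const mod = await import(workerData.url);
parentPort.postMessage(mod.default);
`);

// Load `file` in a brand-new Worker, so nothing is cached between calls.
function loadInWorker(file) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerFile, {
      workerData: { url: pathToFileURL(file).href },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

async function demo() {
  const file = path.join(dir, 'value.mjs');
  fs.writeFileSync(file, 'export default 1;');
  const first = await loadInWorker(file);
  fs.writeFileSync(file, 'export default 2;');
  const second = await loadInWorker(file);
  return { first, second };
}
```

The trade-off is that the loaded code runs in the worker, so only serializable results can be passed back to the main thread.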

giltayar commented 4 years ago

@bmeck - interesting. That would mean that the tests themselves run in Workers. While I am theoretically familiar with workers, I haven't yet had any experience with them: is all code that runs in the main process compatible with running inside a worker? In other words, compatibility-wise, would all test code that works today in the "main process" work inside workers?

I wouldn't want Mocha to have a version (even a semver-major breaking one) where developers will need to tweak their code because now it's running inside a worker. I'm guessing that there's a vast amount of that code running inside Mocha, and any incompatibility would be a deal breaker.

devsnek commented 4 years ago

there are differences between workers and the main thread, mostly surrounding the functions on process, like process.exit() in a worker doesn't end the process, just the thread. There's a good list here: https://nodejs.org/api/worker_threads.html#worker_threads_class_worker

giltayar commented 4 years ago

Looking at the list, I can see process.chdir() is not available, which is probably a deal breaker in many tests (unit tests probably don't use process.chdir(), but Mocha is used for all sorts of tests), as is breaking some native add-ons (although I'm not sure how big of a problem this is in the real world).

I would hesitate to say this, as my only contribution to Mocha currently is this pull request, but I would guess that the owners would veto this. Or maybe allow this only if we add a --run-in-workers option. In any case, without looking too much at the code, this is probably a significant investment to implement for supporting ES Modules, as this is not a simple refactor, but rather an architectural change in how Mocha works.

giltayar commented 4 years ago

If it wasn't apparent from the above, I believe I would still prefer a "module unloading" API, unless the working group is adamant and official about not having one, of course. Which would probably mean going the "subprocess"/"worker" route.

devsnek commented 4 years ago

i admittedly don't know much about mocha... is using a separate process not doable either?
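
For comparison, the separate-process option might look like the sketch below. `runFresh` and the file names are hypothetical; the point is only that a new Node.js process starts with an empty module cache.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { execFileSync } = require('node:child_process');

// Run a script in a brand-new Node.js process and return what it printed.
function runFresh(file) {
  return execFileSync(process.execPath, [file], { encoding: 'utf8' }).trim();
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'esm-process-'));
const file = path.join(dir, 'print.mjs');

fs.writeFileSync(file, 'console.log(1);');
const first = runFresh(file);

fs.writeFileSync(file, 'console.log(2);');
const second = runFresh(file); // sees the rewritten file
```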

giltayar commented 4 years ago

I'll go back to the Mocha contributors team with this.

boneskull commented 4 years ago

Hi, I work on Mocha!! I am trying to see how we can move @giltayar's PR forward.

There are actually two situations in which "module unloading" is needed in Mocha:

  1. In "watch" mode with CJS scripts, when Mocha detects a file must be reloaded, it is deleted from require.cache and re-required, then tests are re-run. Mocha is not the only tool that does this sort of cache-busting.
  2. When developers are writing tests with Mocha (and many other test frameworks), they may want to use module-level mocking: they essentially replace one module with another phony one (I'm going to tag @theKashey here because he knows more about this). Or even pretend that a module does not exist at all. It is then very important to Mocha that users can consume these sorts of mocking frameworks to write their test code.
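
The CJS cache-busting described in the first item is roughly the following pattern. This is a minimal sketch rather than Mocha's actual code; `requireFresh` and the temporary file are hypothetical.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Drop a single CommonJS module from the cache and load it again.
function requireFresh(file) {
  delete require.cache[require.resolve(file)];
  return require(file);
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-reload-'));
const file = path.join(dir, 'value.cjs');

fs.writeFileSync(file, 'module.exports = 1;');
const first = require(file);

fs.writeFileSync(file, 'module.exports = 2;');
const cached = require(file);        // still served from require.cache
const reloaded = requireFresh(file); // re-evaluated from disk
```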

In the first case, it's possible, though probably at a performance cost, Mocha could leverage workers to handle ESM. I don't know enough about workers to say whether this will provide a sufficient environment for the test cases, but it feels like a misuse of the workers feature. At minimum it seems like a lot of added complexity.

In the second case, I can't see how using workers would be feasible. Test authors need to be able to mock modules on-the-fly and reference them directly from test cases, using mocking frameworks.


I don't know why this sort of behavior was omitted from the official specification. If the reasons involve "browser security", well, it further reinforces that browsers are a hostile environment for testing. I do know that this behavior is a very real need for many, from library and tooling authors down to developers working on production code.

We do need an "unload module" API; until such a thing lands, tools will be limited, implementations will be difficult (if possible), and end users will be frustrated when their tests written in ESM don't work. I will also be frustrated, because those frustrated users will complain in Mocha's issue tracker!

I'm happy to talk in further detail about use-cases, but I'm eager to put an eventual API description in the more-capable hands of people like @guybedford.

@devsnek Given that enabling it also enables tooling, I'm curious why you feel locking this sort of thing down is a better direction?

cc @nodejs/tooling

P.S. I will be at the collab summit, and the tooling group will be hosting a collaboration session; maybe this can be a topic of discussion, or vice-versa if there's a modules group meeting...?

theKashey commented 4 years ago

Do you need an unloading API for watch mode? Yes, you need it to update the changed module code.

However, is it enough to handle watch mode? No: if the idea is to use the changed module, you have to find the parents between you (a test) and the changed module, and wipe them to perform a proper reinitialization.

So the ability to invalidate a cache entry is not enough; for the mocking task we also have to know the cache graph, so that we can traverse it and understand which work should be done.
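
With CommonJS, the graph is available today through `require.cache` and each module's `children`, so the "wipe the parents" step can be sketched as below. `invalidateWithParents` and the file names are hypothetical; the ESM cache exposes neither the entries nor the graph.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Delete `file` from require.cache together with every cached module that
// directly or transitively required it, found by scanning `children`.
function invalidateWithParents(file) {
  const pending = [require.resolve(file)];
  const removed = new Set();

  while (pending.length > 0) {
    const id = pending.pop();
    if (removed.has(id)) continue;
    removed.add(id);
    for (const [key, mod] of Object.entries(require.cache)) {
      if (mod === require.main) continue; // the entry script is left alone
      if (mod.children.some((child) => child.id === id)) pending.push(key);
    }
  }

  for (const id of removed) delete require.cache[id];
  return removed;
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-graph-'));
const leaf = path.join(dir, 'leaf.cjs');
const parent = path.join(dir, 'parent.cjs');

fs.writeFileSync(leaf, 'module.exports = 1;');
fs.writeFileSync(parent, "module.exports = require('./leaf.cjs') + 10;");
const before = require(parent); // 11

fs.writeFileSync(leaf, 'module.exports = 2;');
invalidateWithParents(leaf);
const after = require(parent); // 12, because parent.cjs was re-evaluated too
```

Deleting only `leaf.cjs` would not be enough here: `parent.cjs` would still be cached with the old value.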

An API for unloading modules certainly makes sense.

In my opinion, this feature is something missing for proper code splitting. There are already 100 MB bundles, separated into hundreds of pieces that you will never load simultaneously. But if you do, there is no way to unload them. Eventually, the page or application would just crash.

giltayar commented 4 years ago

@boneskull - the second case you mentioned, I believe can and should be handled by module "loaders", which are a formal way to do "require hooks" for ESM. These will enable testing frameworks (like sinon and others) to manipulate how ES modules are loaded, and, for example, exchange other modules for theirs.

The spec and implementation for that are actively being discussed and worked on by the modules working group (see https://github.com/nodejs/modules/issues/351).

jonerer commented 4 years ago

I also need this. I'm making a template rendering engine. When generating the compiled template, I read from a custom format and output to a .js file (a standard ES Module). In order to use the file, I just import it. Upon file changes, I would like to re-write the file, clear the import cache and then re-import it.

devsnek commented 4 years ago

These all sound like use cases for V8's LiveEdit debug api (https://chromedevtools.github.io/devtools-protocol/v8/Debugger#method-setScriptSource). You can call into it using https://nodejs.org/api/inspector.html. cc @giltayar @boneskull

georges-gomes commented 4 years ago

+1 for unloading ES Modules. It's hard to make Hot Module Reload otherwise. Not for production but for development tools. And using a ?query=x doesn't seem to work on file node 13.11.0 at least. Thanks

georges-gomes commented 4 years ago

@devsnek Can you provide a little example or pseudo-code on usage of setScriptSource? I have been researching for an hour without progress. Thanks

georges-gomes commented 4 years ago

@devsnek ok I progressed, I will post my findings back

devsnek commented 4 years ago

@georges-gomes you can subscribe to the Debugger.scriptParsed event to track the script id, and then when you need to modify the script you can call Debugger.setScriptSource.

jonerer commented 4 years ago

@georges-gomes If you are successful, I would be very grateful if you could post a short description on how you could use setScriptSource to solve this problem. On a blog post or something.

georges-gomes commented 4 years ago

@lulzmachine here is a working prototype https://gist.github.com/georges-gomes/6dc743addb90d2e7c5739bba00cf95ea

Unfinished but working. I have seen a few unexpected issues but let's see how far we can get with this. Thanks @devsnek 👍

georges-gomes commented 4 years ago

@devsnek I get segmentation fault if I start using import in the new loaded script. I'm not sure setScriptSource supports ES Modules

georges-gomes commented 4 years ago

The current issues I have:

bmeck commented 4 years ago

It seems v8 is fixing some bugs with Module and LiveEdit (setScriptSource) still : https://bugs.chromium.org/p/v8/issues/detail?id=10341&q=setScriptSource&can=2

bmeck commented 4 years ago

I'd also clarify, setScriptSource does not evaluate the outer most scope of a source text when it is applied. LiveEdit takes place by replacing frames that are entered after it is called.

georges-gomes commented 4 years ago

@bmeck that's probably why import is not happening.

TimDaub commented 3 years ago

the import cache is purposely unexposed.

Why?

this trivial program works fine for me (assuming nope.mjs does not exist):

Fair enough. For me, however, the following poses a problem:

const { writeFileSync } = require("fs");
const assert = require("assert");

(async () => {
  // A relative specifier is needed; a bare "abc.js" would be resolved as a package name.
  const filename = "./abc.js";
  const num = 123;
  const content = `module.exports = ${num}`;

  writeFileSync(filename, content);
  assert((await import(filename)).default === num); // true

  const newNum = 456;
  const newContent = `module.exports = ${newNum}`;
  writeFileSync(filename, newContent);

  assert((await import(filename)).default === newNum); // false because of cache
})();

With require, it was easy to invalidate its cache. How would I implement the above with import?

ljharb commented 3 years ago

At the moment, i don’t believe you can.

WebReflection commented 3 years ago

maybe late ... but the only reason I have {"type": "commonjs"} in all my test/ folders is because of code coverage which is impossible to have it 100% without cache invalidation (polyfills, different versions of nodejs, different envs, etc.)

accordingly, while I think cache invalidation would be bad in production in general, having a way to hot-reload modules, hence invalidate these, has a proven, long history, of usefulness.

if node only could expose any way to, at least, invalidate relative imports, as opposite of well known modules, it'd be great.

node --allow-import-invalidate test.js
// test.js
import('../thing.js').then(module => {
  // do something with module
  import.invalidate('../thing.js');
  // change something in the env
  import('../thing.js').then(module => {
    // do something else with the new module
  });
});
TimDaub commented 3 years ago

accordingly, while I think cache invalidation would be bad in production in general, having a way to hot-reload modules, hence invalidate these, has a proven, long history, of usefulness.

This is exactly my problem too. I only need to have a fresh require invocation for each test.

maybe late ... but the only reason I have {"type": "commonjs"} in all my test/ folders is because of code coverage

What exactly are you referring to with {"type": "commonjs"}? Docs?

ljharb commented 3 years ago

@TimDaub it’s the default. It’s only needed if a parent package.json specifies type module (which does one thing: makes .js files be treated as ESM instead of CJS)

bmeck commented 3 years ago

The constraints the spec places on ESM make invalidation a large topic that is still open at TC39. Snowpack is in talks about module reloading (not cache invalidation) in this area. Slides were made from talks following a Realms call on the topic. For now, even if we expose the cache, it likely won't do what you want given how ESM is specced.

WebReflection commented 3 years ago

I don't expect import.invalidate to ever land on the Web and I personally don't want that to ever happen, which is why I've emphasized "node only". Cache invalidation is bad on CJS too imho, but it's handy for development reasons (and never for production, in my experience).

As node is used as coverage tool, including its c8 helper, having no way to improve ESM modules code coverage, if not by running the same test multiple times with different versions of node, something that won't likely sum up coverage within its exported data, seems a big limitation.

I personally develop, and publish, dual modules, which is why I can use the CJS version of my modules within the test folder and invalidate these whenever I need, if I need, but as we're moving forward, I'd like to stop being forced to publish dual modules because I can't code-cover their cross-env/browser/node behavior.

As summary: does this need to involve TC39, instead of being a technical decision made in node, for node only?

bmeck commented 3 years ago

@WebReflection with the mandates from https://tc39.es/ecma262/#sec-hostresolveimportedmodule and other host hooks, yes it does need TC39 to loosen those somehow or work around the issue

WebReflection commented 3 years ago

@bmeck but couldn't a special flag enforce ignoring this step?

Each time this operation is called with a specific referencingScriptOrModule, specifier pair as arguments it must return the same Module Record instance if it completes normally.

Something like this:

node --expose-dynamic-import-invalidation-at-your-own-risk-and-with-performance-issues

would work ... literally any way would work, as long as there's a work-around, otherwise dual modules it is to me, as that worked well to date.

bmeck commented 3 years ago

@WebReflection it would require altering the VM (V8) to allow this, V8 generally is fragile enough around modules (see long outstanding https://bugs.chromium.org/p/v8/issues/detail?id=10284 ). I don't think this would be simpler than import.meta.hot that was talked about and a simple signaling mechanism.

WebReflection commented 3 years ago

@bmeck well, if import.meta.hot solves this, I'll happily wait. It wasn't mentioned in this thread, and it's the first time I read about it. If there's any link around this topic, I'd love to read it and try to figure out if that solves the current limitation, thanks.

devsnek commented 3 years ago

afaict, everyone who wants this functionality actually wants HMR. Maybe it would be more productive to bug a V8 product manager about HMR than to bug node about breaking cache invariants we don't control.

WebReflection commented 3 years ago

@devsnek we're having a conversation and it's been productive to me, as I've learned about import.meta.hot which I didn't know. As I still think HMR should not land on the Web, I was hoping node could've done something to help having HMR in development mode, but if that's not the case, then this issue could, as well, be closed.

katywings commented 3 years ago

Anyone here tried to just rename the lib folder of your project to lib1, lib2, lib3, counting up, each time a file changes? This might be a workaround 😅🙈

GeoffreyBooth commented 3 years ago

@WebReflection I think the issue is deeper than Node (others can correct me). Even if Node invalidates its cache, V8 won’t let it replace the ES module that’s already been loaded in V8. At least, that’s how things stand at the moment with V8, as far as I know. There was hope that a DevTools protocol, Debugger.setScriptSource if I’m remembering correctly, would let us tell V8 to change the contents of a loaded module; but that turned out not to work out.

guybedford commented 3 years ago

Tbh a require('module').globalCache Map being exposed might not be the end of the world and surely doesn't need TC39. The hard part is not exposing the private module wrap interface and wanting to provide dependency graph metadata for clearing ancestors.

ljharb commented 3 years ago

Wouldn't a global module map be incompatible with import maps support, since each module potentially has a contextually scoped module map?