nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

Invalidate cache when using import #49442

Open jonathantneal opened 5 years ago

jonathantneal commented 5 years ago

How do I invalidate the cache of import?

I have a function that installs missing modules when an import fails, but the import() call seems to cache the failure for as long as the script keeps running.

import('some-module').catch(
  // this catch will only be reached the first time the script is run because resolveMissingModule will successfully install the module
  () => resolveMissingModule('some-module').then(
    // again, this will only be reached once, but it will fail, because the import seems to have cached the previous failure
    () => import('some-module')
  )
)

The only information I found regarding import caching was this documentation, which does not say where the “separate cache” used by import can be found.

No require.cache

require.cache is not used by import. It has a separate cache. — https://nodejs.org/api/esm.html#esm_no_require_cache

guybedford commented 3 years ago

The import map is just a specific resolver implementation, so no different to the existing hooks we already have.

bmeck commented 3 years ago

@guybedford currently it is unsafe to populate the same URL twice in V8 in the same context, per the issue linked above; it usually segfaults.

guybedford commented 3 years ago

Right, it sounds like a fix for that is a good first step then indeed.

bmeck commented 3 years ago

@guybedford per the issue above, there is an existing changeset that fixes it, but it is no longer assigned, and several attempts to bump it have been made in various locations. Even if we do change it, GC of modules is currently completely disabled. Per Snowpack, the workflows being looked at are around signaling and manual alteration via import.meta.hot, not around exposing the Map of URL => instance. Also, URL => instance isn't stable, since asserts are part of the cache key and other things might be added to it. Exposing the cache key is likely unstable enough that we don't want to do so.

guybedford commented 3 years ago

around signaling and manual alteration via import.meta.hot

Will this support named exports mutations? How will export * be handled? Doesn't this just get into the same problems of dynamic modules?

Working with V8 to make the cache key stable is important, yes, including the GC issues. Node.js and Deno are definitely the drivers of this work. If a V8 fork or custom patches are needed, then even that makes sense, as having control of the module system is important to a JS platform!

bmeck commented 3 years ago

Will this support named exports mutations? How will export * be handled? Doesn't this just get into the same problems of dynamic modules?

No, they completely reload with new absolute cache keys in their implementation. export * causes strong linkage that requires that whole subsection of the graph to reload if it reloads. Per https://github.com/snowpackjs/esm-hmr the idea is to manually replace locals at the boundary locations of the replacement. I made some slides to visualize their approach after that call with TC39; @JoviDeCroock could likely speak to the details better than I can.

guybedford commented 3 years ago

A two-pronged approach seems fine, yes. But then the full-reload scenario relies exactly on what is being discussed here, from the V8 bugs to the Node.js API per the last comments.

bmeck commented 3 years ago

@guybedford what prevents full reload from following the same workflow? If the entire graph is strongly connected it would still work I believe.

guybedford commented 3 years ago

@bmeck not sure I understand, do you mean always relying on the import.meta.hot changing local specifiers?

bmeck commented 3 years ago

@guybedford yes, and if the entire graph is reloaded (full reload) there isn't a need for changing local bindings.

guybedford commented 3 years ago

@bmeck right, but the point is that you don't want to refresh the entire graph. I think we do need the ability to refresh subgraphs, as having only the options of a full reload (as in, restarting Node.js) or local bindings seems arbitrarily restricting.

developit commented 3 years ago

I scanned and haven't seen this mentioned anywhere yet - given that the cache can be bypassed by passing a querystring parameter:

let c = 0;
function importFresh(mod) {
  return import(`${mod}?v=${++c}`);
}

... couldn't the issue here be reframed to "it is not possible to update a module's exports after it has been imported"?

A workaround like this is what I've been noodling on:

// convert exports to non-const bindings
export var Foo = class Foo {};
export var bar = 42;

import.meta.hot.accept(async ({ module }) => {
  /* These don't work: */
  // Object.assign(await import(import.meta.url), module);
  // Object.defineProperties(await import(import.meta.url), Object.getOwnPropertyDescriptors(module));

  ({ Foo, bar } = module);
});
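
For context on why the two commented-out attempts fail: module namespace objects reject property writes and redefinitions, so both Object.assign and Object.defineProperties throw. A minimal check (assuming top-level await is available):

const ns = await import(import.meta.url);
try {
  ns.bar = 123; // module namespace objects reject [[Set]], so this throws
} catch (err) {
  console.error(err); // TypeError
}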

JoviDeCroock commented 3 years ago

Well, the exports themselves alone aren't sufficient; imagine a scenario where the following happens:

const x = 8;
export const getFoo = () => 1 + x;

Changing getFoo isn't sufficient for this scenario; we would need to replace the entire module in this case.

developit commented 3 years ago

Replacing the export to point to the new module instance should be sufficient - getFoo becomes a reference to the new module's exported getFoo, which has its own copy of x. For preserving/modifying state, that seems like a concern that would be external to module cache invalidation.

There is a case where this is fully broken, though: when an updated version of a module adds exports that were not previously defined.
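
A rough sketch of that broken case, assuming the same import.meta.hot.accept shape as above (not a built-in Node.js API):

// v1 of the module: only getFoo exists, so only getFoo has a rebindable local
export var getFoo = () => 1;

import.meta.hot.accept(({ module }) => {
  ({ getFoo } = module); // fine: the existing binding now points at the new module's export
  // If v2 adds `export const getBar = ...`, there is no pre-existing local `getBar`
  // binding here to reassign, so existing importers of this namespace never see it.
});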

GeoffreyBooth commented 3 years ago

couldn’t the issue here be reframed to “it is not possible to update a module’s exports after it has been imported”?

There’s a loader built by @giltayar around this principle, which appends unique timestamps to specifiers so that pseudo-HMR can work. The issue is that the no-longer-needed earlier versions of such modules are never removed from memory, and so over the course of hours while you’re working (and hot-reloading the same module over and over) Node will eventually run out of memory and crash.

bmeck commented 3 years ago

The issue is that the no-longer-needed earlier versions of such modules are never removed from memory, and so over the course of hours while you’re working (and hot-reloading the same module over and over and over) Node will eventually run out of memory and crash.

We can't really make an assumption that HMR won't leak. Even in CJS it leaks.

GeoffreyBooth commented 3 years ago

We can't really make an assumption that HMR won't leak.

Fair, but at least that's accidental. The query string approach guarantees eventually running out of memory.

WebReflection commented 3 years ago

quick one about the whole graph:

The hard part is not exposing the private module wrap interface and wanting to provide dependency graph metadata for clearing ancestors.

To have at least parity with CJS: when I delete require.cache[require.resolve('../path')] in CJS, it doesn't invalidate that module's own required modules, neither relative paths nor installed packages.

Accordingly, I don't think invalidating the whole subtree is needed, or even desired; at the least, I could deal with the cache being exposed the way CJS exposes it.
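
A concrete illustration of that CJS behaviour, using a hypothetical './config.js' path:

delete require.cache[require.resolve('./config.js')]; // drop only this entry
const fresh = require('./config.js'); // re-evaluates config.js itself...
// ...while anything config.js requires is still served from the existing cache entries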

TimDaub commented 3 years ago

quoting from @developit's suggestion:

couldn't the issue here be reframed to "it is not possible to update a module's exports after it has been imported"?

While I think it's a valid point to ask for a simplification of the matter, I still disagree with doing so. I think the point here is that Node users are used to a cache dictionary at require.cache whose entries can simply be removed with the delete keyword, so unsurprisingly they expect the same functionality when the module system is upgraded (ESM import). Disregarding all the discussions around standards etc., the most reasonable way of fixing the issue, to me, would hence be to introduce a cache dictionary on import and allow the user to delete entries. Surely a lot of thought was already put into require.cache when it was implemented in Node.

While adding a query string may be a valid workaround to re-import modules, it is logically something different from clearing a specific part of a cache. As others have noted, it can additionally lead to memory leaks.

IMO, any functionality that goes beyond that (e.g. fancy hot module replacement) is a specific technical vision and should be handled separately.

WebReflection commented 3 years ago

introduce a cache dictionary on import and allow the user to delete entries

Strong agree with you, but require.cache was likely born at a time when Map was not a thing, so I personally wouldn't mind if the import cache were exposed as a Map, just to have it aligned with the more modern JS it represents.

import.meta.cache.delete('../path.js');
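
A sketch of how that might look in use, purely hypothetical since no such API exists today and both the Map shape and the re-import behaviour are assumptions:

const url = new URL('../path.js', import.meta.url).href;
import.meta.cache.delete(url);            // hypothetical: drop the cached module instance
const fresh = await import('../path.js'); // would then re-evaluate the module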

devsnek commented 3 years ago

fwiw require.cache was cited during the design of ESM as one of the motivations for an immutable cache. @TimDaub require.cache was more or less added at the whim of one of the early designers of node. I'm also curious if your specific use case is HMR.

guybedford commented 3 years ago

If digging up past arguments here, then it's worth noting that a mutable registry map was always an original design goal for ESM loaders - https://whatwg.github.io/loader/#registry-constructor.

bmeck commented 3 years ago

@guybedford correct, but that was not pursued to completion due to various issues and some of those original participants are on the calls mentioned above.

vsnthdev commented 3 years ago

So, there's no way to implement hot reloading during development while using ESM modules? 😥

I tried adding a query parameter with a random string to work around the cache, but I can't do it for multiple files, only for the index file, since I'm trying to build a library that provides hot module reloading. Any help would be greatly appreciated 🙂

bmeck commented 3 years ago

@vasanthdeveloper you can look at https://github.com/snowpackjs/esm-hmr and/or https://dev.to/giltayar/mock-all-you-want-supporting-es-modules-in-the-testdouble-js-mocking-library-3gh1 for different approaches to do this. Note that loaders remain unstable though.

JoviDeCroock commented 3 years ago

We have gotten esm-hmr to work in the browser, although this is a pretty "unconventional" approach in the sense that the original Module will actually stay in the browser.

As you can see in esm-hmr, on first serve we attempt to create a moduleGraph Map which lists each module with its dependencies, its dependents, and whether or not it has a module.hot.accept in its code.

When an update happens to a module that accepts updates, we fetch said module with a ?mtime=x query parameter appended. This means we get the new module in memory, but we can't just inject it into the currently in-use module, so frameworks write code to hot-replace these; Prefresh is one example. This code has logic for a specific framework, in this case Preact, to hot-replace a component.

When a child that doesn't accept its own updates changes, we bubble up and treat parents that do accept updates as boundaries for updating those children. When we encounter such a boundary, we have to deeply rewrite the imports along the path leading up to said child with the same ?mtime=x technique.
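
Roughly, the bookkeeping described above could look like the sketch below; the shape of the entries is an assumption, not the actual esm-hmr implementation.

// Hypothetical sketch of the module graph: one entry per URL, tracking
// dependencies, dependents, and whether the module accepts HMR updates.
const moduleGraph = new Map();

function register(url, dependencies, acceptsHmr) {
  const entry = moduleGraph.get(url) ?? { dependencies: new Set(), dependents: new Set(), acceptsHmr: false };
  entry.acceptsHmr = acceptsHmr;
  for (const dep of dependencies) {
    entry.dependencies.add(dep);
    const depEntry = moduleGraph.get(dep) ?? { dependencies: new Set(), dependents: new Set(), acceptsHmr: false };
    depEntry.dependents.add(url);
    moduleGraph.set(dep, depEntry);
  }
  moduleGraph.set(url, entry);
}

// On change, bubble up through dependents until modules that accept updates
// are found; those become the boundaries that get re-imported.
function findBoundaries(url, seen = new Set()) {
  if (seen.has(url)) return [];
  seen.add(url);
  const entry = moduleGraph.get(url);
  if (!entry) return [];
  if (entry.acceptsHmr) return [url];
  return [...entry.dependents].flatMap((parent) => findBoundaries(parent, seen));
}

// Re-import a boundary with a cache-busting query parameter.
function reimport(url) {
  return import(`${url}?mtime=${Date.now()}`);
}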

The tricky part presents itself when subsequent updates happen as explained here

When we update Counter.jsx, the child of Internal.jsx, we implicitly import Counter.jsx?mtime=Date.now(); this makes our babel functions re-run and register the new type for Counter.jsx. For in-place updates this works great, but when we then update Internal.jsx we'll be importing the old variant of Counter.jsx, since during esm-hmr we have no way to update this ModuleRecord in place; essentially the Counter.jsx?mtime=Date.now() is orphaned and disposed of instantly.

Internal.jsx updates with the old reference, without checking whether a new one exists (byte-cache), which makes it render with the initial code rather than the new code.

To sum up, this is the current less-than-ideal approach we are taking in the browser to circumvent this caching issue. I know this is the Node.js repository, but it could be a useful bit of information.

vsnthdev commented 3 years ago

I bypassed ESM caching by making my own loader which appends a random string as a query at the end of each file being imported. Here's the code I made for testing 👇

import { URL } from 'url'

export const resolve = async (specifier, context, defaultResolve) => {
    const result = defaultResolve(specifier, context, defaultResolve)
    const child = new URL(result.url)

    if (
        child.protocol === 'nodejs:' ||
        child.protocol === 'node:' ||
        child.pathname.includes('/node_modules/')
    ) {
        return result
    }

    return {
        url: child.href + '?id=' + Math.random().toString(36).substring(3),
    }
}

To see if this leaks memory, I made a chain of JS files that import each other, continuously changed their contents 1000+ times, and monitored the RAM usage of node; it all seems normal to me. 🤷‍♂️ The ESM cache seems to be cleared automatically as soon as no other file imports that particular file.

Before starting node, I add my loader using the following command 👇

node --no-warnings --experimental-loader ./hot.js src/index.js

Thanks to @bmeck for guiding me 😅

guybedford commented 3 years ago

@vasanthdeveloper module management overhead is bytes, not megabytes :) They are certainly not cleared. Try allocating a 1MB string in each module. But yes, practically we may be able to go far on just not dealing with the GC problems, until that hits a wall of course.
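
For instance, a hypothetical module like the one below, regenerated on every reload, would make the retention easy to see in process.memoryUsage():

// heavy.js (hypothetical): each query-string re-import retains another ~1 MB copy
export const payload = 'x'.repeat(1024 * 1024);
export const loadedAt = Date.now();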

vsnthdev commented 3 years ago

I'm gonna try adding a 1MB string to those JavaScript files. But since it's only for development, I think it's tolerable.

aral commented 3 years ago

Also using the query string method for the in-progress ESM version of JavaScript Database (JSDB), and it’s not tenable due to the memory leak (my append-only JavaScript data files can be hundreds of megabytes in size).

I’d love to see Node pave the cowpaths on this and support the query-string method of ESM cache-busting by garbage collecting the previous version of the module when it detects the practice. (I haven’t peeked into the code so this is probably way easier said than done.)

bmeck commented 3 years ago

@aral V8 won't GC the module once it is linked, please note that Node does not modify V8 generally and so the issue of allowing GC of modules is likely better on their issue tracker.

chyzwar commented 3 years ago

Can we raise the issue with V8 to provide a C++ API for hot reloading modules?

mattxyzeth commented 3 years ago

@giltayar have you looked into using Workers or other solutions to have a module cache that you can destroy (such as by killing the Worker)?

+100000000 for using Workers. I created a small worker script that receives the module path and some arguments for the module (as an object) in the workerData object. The worker uses a dynamic import to load the module and executes it using the arguments. It then just returns the result to the parent. Since the worker executes in its own module context, there is no need to invalidate the cache. Works perfectly. ❤️

Node.js 14.x lts

import { Worker } from 'worker_threads'

const worker = new Worker('./module_executor.js', {
  workerData: { modulePath: './path/to/module.js', args: { arg1: 1, arg2: 2 } }
})

worker.once('message', result => {
  console.log(result)
})

worker.once('error', error => {
  console.error(error)
})

// module_executor.js (the file loaded by the Worker above)
import { workerData, parentPort } from 'worker_threads'

async function executeModule({ modulePath, args }) {
  const { default: mod } = await import(modulePath)
  // const { mod } = await import(modulePath) // depending on how you exported

  let result;
  if (mod.constructor.name === 'AsyncFunction') {
    result = await mod(args)
  } else {
    result = mod(args)
  }

  parentPort.postMessage(result)
}

executeModule(workerData)

simlu commented 2 years ago

@vsnthdev No idea why your solution uses so little memory... but it actually works!

PS: I'm currently in the process of migrating lambda-tdd and node-tdd from require to import.

vsnthdev commented 2 years ago

@simlu Thank you 😊

But please refrain from using this in production.

I used to use this technique to primarily have HMR support during development 🙂

simlu commented 2 years ago

@vsnthdev Oh I would never. Just for lots and lots of test suites =)

dword-design commented 2 years ago

Still only invalidates the cache for the imported module, not for the indirect dependencies. So still no switching to ESM 😟.

lehni commented 2 years ago

@vsnthdev I've used your method to implement cache blasting in mocha --watch mode for ESM imports, and it works!

https://github.com/mochajs/mocha/issues/4374#issuecomment-1073140194

vsnthdev commented 2 years ago

@vsnthdev I've used your method to implement cache blasting in mocha --watch mode for ESM imports, and it works!

mochajs/mocha#4374 (comment)

I am glad it helped you 😊

LukasBombach commented 2 years ago

This might just be the dumbest thing to do (and will probably do bad things to node's cache system), but you can add random parameters to your imports to get an uncached version

const myModule = await import(`./myFile?cachebust=${Date.now()}`)

Needless to say, this isn't for a prod system.

simlu commented 2 years ago

Here is the hot-reload file we use to strike a balance between cache invalidation and cache reuse, so we can run hundreds of tests without hitting memory limits while still invalidating the necessary files.

https://github.com/blackflux/robo-config-plugin/blob/master/test/projects/assorted/%40npm-opensource/test/hot.js

We invalidate by environment variable and by comment. This requires adding strategic comments to the code to invalidate the necessary files.

For inspiration or for use as is. This took some fine-tuning. Do not use in production.

renhiyama commented 2 years ago

Ah yes, the reason I was wondering how @remix-run's dev mode had a memory leak; turns out it's because of cache busting... Btw, may I know the reason why cache removal is not coming to Node.js' ESM? I can't even tell whether Node.js focuses on browser specs like Deno or on server-side stuff... Why am I talking about Deno here? Because they have import maps, and here we can't even get cache unloading. Like, it's not that dangerous? Pretty sure it's going to be used in dev mode and not in production. Like come on, don't tell me hackers will hack via this method! There are literally a lot of ways to easily hack a PC if hackers can get their hands on it!

renhiyama commented 2 years ago

Can't we just keep it as simple as it was before? Workers need a lot more lines and complexity compared to deleting the require cache 😄

zackschuster commented 2 years ago

@renhiyama essentially, v8 is in control of the module cache & node has no way to interact with it. i think there was some discussion around getting v8 to add an api, but it's quite an uphill trek & it's not likely to be undertaken vigorously when cache-busting can be achieved in userland via the workaround provided above.

renhiyama commented 2 years ago

@zackschuster but cache busting creates a memory leak since the old ones are not wiped out, unless you use workers? (Or maybe even with workers; I haven't tested workers so far since it looks more complex than clearing the require cache.)

timiyay commented 2 years ago

Does the experimental ESM Loader Hooks API, released this week in Node.js 18.6.0, provide any new avenues for solving this problem? https://github.com/nodejs/node/releases/tag/v18.6.0 https://dev.to/jakobjingleheimer/custom-esm-loaders-who-what-when-where-why-how-4i1o

zackschuster commented 2 years ago

but cache busting creates a memory leak since the old ones are not wiped out

hot reloading in a browser has the same issue, incidentally.

unless you use workers? (Or maybe even with workers; I haven't tested workers so far since it looks more complex than clearing the require cache.)

i think workers have ties to the FinalizationRegistry that help ensure they get cleaned up, but don't quote me on that.

renhiyama commented 2 years ago

I like how Chrome doesn't even try a single thing to lower the RAM usage, not even providing alternative methods so memory leaks don't happen...

rektide commented 2 years ago

This cowpath should be paved. The original intent of shipping a Module Registry for ECMAScript Modules was a good one, one that recognized a valid and very widely faced need. Trying to un-ship this capability denies us what we need. Give the users what they want.

Does the experimental ESM Loader Hooks API, released this week in Node.js 18.6.0, provide any new avenues for solving this problem? https://github.com/nodejs/node/releases/tag/v18.6.0 https://dev.to/jakobjingleheimer/custom-esm-loaders-who-what-when-where-why-how-4i1o

@vsnthdev's workaround uses the programmable ESM loader hooks to redirect every request to a new, uniquely named request, which seemingly works OK.

aral commented 2 years ago

Does the experimental ESM Loader Hooks API, released this week in Node.js 18.6.0, provide any new avenues for solving this problem? https://github.com/nodejs/node/releases/tag/v18.6.0 https://dev.to/jakobjingleheimer/custom-esm-loaders-who-what-when-where-why-how-4i1o

Nope, sadly not.

Also, if the officially-linked article (see end of https://dev.to/jakobjingleheimer/custom-esm-loaders-who-what-when-where-why-how-4i1o) for how to do ESM cache invalidation with Node.js (https://dev.to/giltayar/mock-all-you-want-supporting-es-modules-in-the-testdouble-js-mocking-library-3gh1) includes (count them) four “sneaky!” exclamations while describing the method, that’s a smell.

Not mentioned in that article: that the method leaks memory.

Sneaky!