Open joyeecheung opened 1 week ago
cc @merceyz @jakebailey @H4ad from https://github.com/nodejs/node/issues/47472
> It's also common for library/framework authors to want to enable this in a more flexible manner.

Why? What sort of flexibility would the library/framework need that the environment variable doesn't provide?
This sounds great; supporting a default cache location in a reasonable place is super helpful.
Is `node:module` the right place for this? Or is `node:v8` actually where it might be?
> after this method is called.
All in all, I'm not sure how I feel about being unable to use this without using CJS or TLA; if an executable wants to enable caching of itself, it needs an extra entry point that only serves to enable the caching and then load the rest of the code. Or fork, which is slow. Not sure that one can do better, though. The call has to happen somewhere...
I guess this is exactly how v8-compile-cache works? (Not familiar with its implementation, but I guess it must have the same restriction...)
> Why? What sort of flexibility would the library/framework need that the environment variable doesn't provide?

If you want to enable caching today, you have to set the environment variable. This means that an application that wants to enable it for itself has to fork a new process with the variable set, defeating the speedup.
> Is `node:module` the right place for this? Or is `node:v8` actually where it might be?
I used `node:module` off the top of my head, but `node:v8` sounds reasonable too. I am slightly leaning towards `node:module` because this only applies to the user modules loaded by the usual module loading process (so if the user compiles some module differently via the vm APIs, this won't apply, at least not automatically).
> All-in-all, I'm not sure how I feel about being unable to use this without using CJS or TLA
You could also `--import` an ESM that does this call synchronously.
> if an executable wants to enable caching of itself, it needs to have an extra entrypoint which only serves to enable the caching and then load the other code.
Yeah, I think there is a general lack of a way for libraries to "define something to be run before everything else" without the use of command-line flags or environment variables. It was also raised in the module loader hooks discussion (https://github.com/nodejs/node/issues/52219#issuecomment-2061740553). IMO we need to figure out a way to allow developers/users to specify code that needs to be preloaded for every/some process/worker. But some configuration needs to happen somewhere - perhaps some magic field in package.json is a good place for it, but that would probably be a separate topic.
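To make that idea concrete, a purely hypothetical package.json shape (the `node.preload` field below is invented here for illustration only; nothing like it exists or has been proposed concretely) might look like:

```json
{
  "name": "my-app",
  "node": {
    "preload": ["./enable-compile-cache.mjs"]
  }
}
```

Under this sketch, the runtime would load the listed files before the entry point in every process and worker, without flags or environment variables.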
Spinning off from https://github.com/nodejs/node/pull/52535#issuecomment-2059390083
Currently, the built-in on-disk compilation cache can only be enabled by `NODE_COMPILE_CACHE`. End users can control where the `NODE_COMPILE_CACHE` directory is stored, which also makes it possible for them to find the cache and clean it up when necessary. That's the simplest enabling mechanism for sure, but judging from the use cases of v8-compile-cache (a package that monkey-patches the CJS loader, which is a capability that we want to sunset, see https://github.com/nodejs/node/issues/47472), it's also common for library/framework authors to want to enable this in a more flexible manner. So this issue is opened to discuss what an API for this should look like, and what the directory structure of the cache should look like.

With the global `NODE_COMPILE_CACHE`, the current cache directory structure looks like this:

For reference, v8-compile-cache's cache directory looks like this:

And inside the .BLOB files it maintains a `module_filename + sha-1 checksum -> cache_data` storage. In the documentation it explains:

In my investigation when implementing `NODE_COMPILE_CACHE`, though, there's actually not much performance difference in reading on a file-by-file basis, at least when it's implemented using native FS calls and when a cache file only gets loaded when the corresponding module is about to get compiled (so not all of the cache is loaded into the process at once, even though some of the cached modules might not be needed by the application at all - which is what v8-compile-cache does).

For third-party tooling (e.g. transpilers, package managers), I think a layout that doesn't distinguish between entry points would still be beneficial - as long as the final resolved file path remains the same, its content matches the checksum, it's still being loaded by the same Node.js version, etc., the cache is going to hit. Then if multiple dependencies in the same project try to enable it, we wouldn't be saving multiple caches on disk even though they are effectively caching the code for the same files (e.g. the end user's code needs package `foo` that resolves to `/path/to/foo.js`, whose cache would otherwise be stored once in the cache enabled by a transpiler and then again in the cache enabled by a package manager that executes a run command).

I wonder if we should just provide the following APIs:
`process.getCompileCacheDir()` would still allow end users to find and clean stale caches to release disk space. We could probably also add a file with an easy-to-find name (e.g. `$CACHE_DIR/node_compile_cache_mark`) to the designated directory to facilitate this too.

In most use cases, tooling and libraries should simply call
`module.enableCompileCache()` without passing in an argument, so that the cache is stored in tmpdir and can be shared with other dependencies by default, and end users can override the default cache directory location with `NODE_COMPILE_CACHE`. Some more advanced tooling/frameworks might want more advanced customizations and use their own cache directory; they can then specify it.

Some more powerful APIs are probably needed to allow advanced configuration of the cache storage, but at least the APIs mentioned above would address the use cases of existing v8-compile-cache users. For the more powerful API, it would be difficult to come up with one that works well without some collaboration with adopters, so ideas are welcome regarding what that should look like :)