Closed: reasonablytall closed this issue 1 year ago
is there a simpler term for "attenuated"? … discussion gets drowned a bit in "big words".
Specifically, attenuated here has a distinct meaning which I myself was only recently introduced to. Object-capability folks (ie the SES/Realms weekly meeting) describe this as the process of taking something with a greater degree of authority than its consumers need to function correctly and limiting it to exactly what is needed, for security (ie against rogue code).
So, as @bmeck describes, it is functionally the creation of a customized view of the module ("altered" may be a good alternative term, imho), but an important aspect of attenuation is that it presumes the realm/compartment cannot still somehow refer to the original module's namespace.
Note: I think this is important because it differs from how other altered modules might generally still coexist in their respective container(s), while attenuated ones should not.
I don't understand this, could you expand?
Right, let me try. We're aiming for the following module graph I assume:
+--------------------+  "fs"   +--------+  $x   +--------+  $y   +-----------+
| file:///client.mjs | ------> | $wrapB | ----> | $wrapA | ----> | nodejs:fs |
+--------------------+         +--------+       +--------+       +-----------+
The idea would be that $wrapB would be marked as "owned by loader B". Which means that only the loader stack above B would run (since it wouldn't run on itself). That way - in theory - we would have an exit condition where the loader stack gets smaller the closer we get to the "real" fs.
Specifically attenuated here has a distinct concept
Okay, so that sounds like it's a very specific kind of instrumentation code / narrow set of goals. I'm not sure if it's necessary to use it as our only example here since the issue doesn't seem specific to this kind of instrumentation code.
Let's be careful not to be too specific unless it actually adds important constraints and if we do, we should be specific about the constraints and how they apply to the problem at hand. :)
The idea would be that $wrapB would be marked as "owned by loader B". Which means that only the loader stack above B would run (since it wouldn't run on itself). That way - in theory - we would have an exit condition where the loader stack gets smaller the closer we get to the "real" fs.
In the design document we have a success criterion of having 2 different loaders instrumenting fs.readFileSync. If I understand the comment above, it would prevent 2 loaders from interacting on an "owned" location?
If I understand the comment above it would prevent 2 loaders from interacting on an "owned" location?
The "owned" location is the module generated by/injected by the loader, not fs itself. So in that diagram, the import chain should ensure that $wrapA gets to patch readFileSync, $wrapB sees that patched version and can patch it again, and client.mjs gets the double-patched version. Maybe I'm missing something about the requirements?
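The chain described here can be sketched with plain objects standing in for the module namespaces (a simulation of mine, not actual loader output; real loaders would generate module source instead):

```javascript
// Simulating the $wrapA/$wrapB chain from the diagram with plain objects.
const realFs = {
  readFileSync: (path) => `contents of ${path}`,
};

// $wrapA: imports the real builtin ($y) and re-exports a patched version
const wrapA = {
  ...realFs,
  readFileSync: (path) => `[A] ${realFs.readFileSync(path)}`,
};

// $wrapB: imports $wrapA ($x), sees A's patched version, patches it again
const wrapB = {
  ...wrapA,
  readFileSync: (path) => `[B] ${wrapA.readFileSync(path)}`,
};

// client.mjs imports "fs" and receives the double-patched namespace
console.log(wrapB.readFileSync('/etc/hosts'));
// → [B] [A] contents of /etc/hosts
```

Each wrapper only ever holds a reference to the namespace one step closer to the real builtin, which is what gives the stack its exit condition.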
@jkrems I was not understanding; I think I'm getting confused about what $wrapA does differently from $wrapB. $wrapA sees the true builtin (no source), but $wrapB modifies the source result of $wrapA? Or does $wrapB also see an opaque source?
This is fundamental to all the "safe modules" work happening… so imho, this is important here @jkrems — there is no way to secure the module system if it only approximately provides attenuation guarantees but not quite.
To elaborate on the concept, consider the more abstract notion of a container, which could be an SES realm or a "safe compartment": you can overload module identifiers by remapping them opaquely, so that every single module instantiated in that container which imports the specifier only ever receives the remapped module, not the original.
With that, attenuated:node:fs remapped to node:fs would effectively never also expose the original node:fs, and this means:
- technically, the module map of the container only ever has one node:fs record, but,
- attenuation happens in an elevated way where the mappings do not affect attenuated:node:fs itself, which still gets to access the original node:fs.
Note: the assumption is that import … 'attenuated:node:fs' would not be something that would take place in the actual container — but if it did, I am inclined to say that would be a completely different instance, a somewhat pointless one regardless of how you slice it, due to redundantly attenuating the attenuated node:fs namespace.
Does that help?
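A minimal sketch of the remapping idea described above (the Map stands in for the container's module map; all names here are mine, not an actual SES/compartment API):

```javascript
// The original builtin carries more authority than consumers need.
const realFs = {
  readFileSync: (path) => `data from ${path}`,
  writeFileSync: () => { /* powerful, and unneeded by consumers */ },
};

// The attenuated module gets to see the original node:fs, but re-exposes
// only the minimum authority required.
const attenuatedFs = { readFileSync: realFs.readFileSync };

// Inside the container, the module map holds exactly one "node:fs" record,
// and it points at the attenuated namespace; nothing importing "node:fs"
// in the container can reach the original.
const containerModuleMap = new Map([['node:fs', attenuatedFs]]);

const fs = containerModuleMap.get('node:fs');
console.log(typeof fs.readFileSync); // 'function'
console.log('writeFileSync' in fs);  // false
```

The key property is that the remapping is opaque: consumers cannot tell they received the attenuated namespace, and only the attenuator holds a reference to the original.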
@bmeck Ah, let me clarify: Nothing in my example is modifying any source code. $wrapA is a full module that has an import statement with the specifier $y and then re-exports a patched version.
Does that help?
I'm not sure. It feels like something that is already the case with all loader designs discussed here, since loaders control all specifier resolution(s). So I'm not sure what you're trying to say by bringing it up? I'm also not sure what realms bring to this? Maybe it would help if you expressed whether there's a specific gap connected to realms or containers, or if it's just an off-topic aside to elaborate on SES concepts?
Maybe it would help if you expressed whether there's a specific gap connected to realms or containers, or if it's just an off-topic aside to elaborate on SES concepts?
Fairly certain that while a "specific gap" is the more concrete thing to aim for, "something possibly being glossed over in unexplored details" can be an equally fatal mistake in the design process. That said, fresh eyes might help articulate things better.
(Below I use alterer to mean a Loader that attenuates, instruments, or in some other way modifies a parent-loaded module by returning new module content that imports from the parent.)
@jkrems if I'm understanding your "owned module" idea correctly, it stops both Loader B and Loader A from handling imports in $wrapA. This is fine when both A and B alter the same builtin, but it has the side effect of preventing B from wrapping a different builtin used in $wrapA. I'm not entirely sure if there are real use cases that would be impacted by this, but it does break the expectation that a builtin alterer should alter all uses of the builtin except its own.
An example:
- Loader A alters node:os to write all calls to a log file (using node:fs)
- Loader B alters node:fs, also to write all calls to a log file (using node:fs)
- Importing node:os returns $wrapA
- $wrapA skips Loaders B and A because it's owned by A
- $wrapA's node:fs is therefore the true builtin
- The log file for node:os calls is written and correct
- The log file for node:fs calls is written, but is missing the calls to write a logfile made in $wrapA
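That failure mode can be shown with a small runnable sketch (all names hypothetical; the wrapper objects stand in for the modules the loaders would inject):

```javascript
const fsCallLog = [];
const realFs = { readFileSync: (path) => `data:${path}` };

// Loader B's wrapper for node:fs: logs every call before delegating
const wrapFsB = {
  readFileSync: (path) => {
    fsCallLog.push(path);
    return realFs.readFileSync(path);
  },
};

// Loader A's wrapper for node:os ($wrapA). Because $wrapA is "owned" by A,
// the loader stack is skipped for its imports, so its node:fs reference is
// the TRUE builtin rather than wrapFsB:
const wrapOsA = {
  hostname: () => {
    realFs.readFileSync('/var/log/os.log'); // B's wrapper never sees this
    return 'myhost';
  },
};

wrapOsA.hostname();
console.log(fsCallLog.length); // 0 — B never observed $wrapA's fs usage
```

If $wrapA had instead received wrapFsB for its node:fs import, fsCallLog would record the call, which is the expectation the "owned module" rule breaks.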
@A-lxe @jkrems @bmeck do you guys think it is possible to discuss this?
I myself appreciate how tricky it is and how much effort every one of you is putting in, and while I am technically taking time off, this specific aspect of hooks for remapping altered modules keeps dragging out.
Before this, the only model for altered modules that I came across — aside from the good'ol require goodness — has been the idea of making a compartment…
That said, if the aim is to alter modules systematically by chaining and ownership, and to do so absolutely by not changing the module map for the particular module id (ie node:fs), then I am certainly feeling a little behind visualizing this myself, even with the very clear efforts you are putting into communicating here (my apologies).
I can set us up on zoom, but can someone fire up a doodle with more practical times?
I'm thinking 8am to 8pm CEST, which currently works for me but is likely not practical for everyone else — and I would not want to hold back this discussion from happening.
I've made a doodle poll with some times that hopefully should work for each of us:
https://doodle.com/poll/g2chvakibfky7drt
Tell me if you think I should add times or anything like that!
It looks like Monday 9-10am PDT (6-7pm CEST) is a good option. I've made a calendar event here.
@SMotaal could you send me a link to the zoom when you set it up? I can add it to the event (or add you as an editor with your email).
I just ended up creating a recurring ad-hoc meeting for discussions and posted the link to the team's discussions.
Please let me know if you have trouble viewing it.
I get a 404 for that link. Does #372 need to be merged?
@A-lxe I sent you the link on hangouts
Based on discussion in #386, to prevent blocking unflagging esm we should create a flag that enables the current (and future) experimental --loader feature.

I propose adding an --experimental-loaders flag that enables the use of the --loader option.

I also figure that --experimental-loaders shouldn't be required when --experimental-modules is set, thereby offsetting the need for a user to set --experimental-loaders until esm is unflagged. I'd appreciate thoughts on that.
I'll have a PR for this by Friday, given no objections.
@A-lxe SGTM. There is a way to infer one flag from another in the flag parser
What about renaming --loader to --experimental-loader so that it is still only one flag, but clearly experimental?
Alternatively, what if we simply retain the --experimental-modules flag for features that are still experimental, like loaders?
Alternatively, what if we simply retain the --experimental-modules flag for features that are still experimental like loaders?
i think the naming would confuse people. i'd suggest instead a new flag like --experimental-module-features with char-delimited positional values if you want to selectively enable features (e.g. --experimental-module-features='loaders,feature1')
Should we separate --loader-v0/--deprecated-loader from --experimental-loader, so that we can distinguish between people trying to use the current flag and people trying out the next iteration? Or is the idea to change the name as we land the beginnings of the new loader hooks?
For bash scripts wrapping Node.js initialization with a loader, it seems like an error fallback will be needed to properly handle loader support.
The more flag options there are, the slower the startup will be on older Node, and it's not an insignificant 300ms to do this.
Ideally I would prefer if we could make it a goal that no more than two basic inits are needed to get Node.js running with a loader. At the moment we're looking at three though:

1. node --loader (when loaders are stable)
2. node --experimental-loader (modules stable, loaders unstable)
3. node --experimental-modules --loader (current shipping implementation)

(by three, startup is now 1 second in total before JS bootstrapping)

(getting the above down to two is my justification for suggesting loaders under --experimental-modules, but I can appreciate if this goal is not prioritized as well)
we shouldn't have multiple versions of experimental things at the same time. we explicitly created experimental flags to allow us to develop a system without worrying about releases/versions/etc.
we explicitly created experimental flags to allow us to develop a system without worrying about releases/versions/etc
The aspect of an --experimental flag that allows not worrying about versions is that any break is permitted at any time under the flag. So completely changing the loader API, or any other modules API, while still continuing to use exactly the --experimental-modules flag could be ok.
i'm just referring to stuff like --loader-v0, and needing scripts that try a bunch of node options. a script shouldn't ever do that, because if --loader isn't directly available, anything else it might try would be an experimental feature with a potentially different api.
i'm just referring to stuff like --loader-v0
To clarify the intent: I wasn't suggesting that both flags co-exist. I was suggesting that the current implementation would move behind that flag asap to give a strong signal that things are about to break if people keep using the flag and aren't changing their code. In that scenario, the flag would be replaced with a new flag the moment the revised loader hook API lands. My hope was to reduce confusion around the time when we revise the loader API.
thanks for clarifying. in that case, we can just use a --experimental-loader flag, and replace it with --loader when it goes stable.
So for the past few weeks I've been trying to implement a CoffeeScript loader, and I've pretty much given up in frustration trying to make it work with the existing API's resolve hook and data URLs. This experience led me to implement https://github.com/nodejs/node/pull/30986, to add two new hooks I called getSource and transformSource. Unlike resolve, these are very narrowly scoped: getSource allows the user to provide a function to override Node's reading of a file from disk, to potentially instead get file contents from a memory cache or an HTTP call or somewhere else; and transformSource allows the user to do something with that source just after it's been loaded but before Node does anything else with it, for example to transpile it.
Neither of these hooks are described at all in the design doc linked above. To be honest I couldn’t quite figure out how I was supposed to convert that document into code; what would help me is a document that shows the theoretical API as it would be written as documentation in https://nodejs.org/api/esm.html#esm_experimental_loader_hooks.
My feedback on the current API is that resolve is too broad. In the current API, building any type of loader requires providing a function for a new resolve hook, but any resolve hook requires a lot of boilerplate, because resolve is doing so many things: converting a specifier into the URL to load, deciding its format, deciding what file on disk to load for that URL, etc. As a user I would prefer that these steps be broken up into several smaller hooks rather than this one big one, so that for example I can implement automatic extension resolution without also needing to determine format.
So in the near term, if I were to do another follow-up PR that would be my next improvement. I know others (@guybedford @bmeck) have grander plans for more dramatic redesigns of the API; I’d love to hear what those would be, in any level of detail. To the point of “why should we add new hooks on a bad API that we’re planning to redesign,” well, a) I don’t know when the redesign will come if it ever does, and b) I think there’s a lot of value in seeing how the use cases served by the current API can still be served in the refactored API, and what the DX looks like in one versus the other. We would get that diff by incrementally improving what we have until someone opens a PR with the redesign.
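For readers unfamiliar with the proposed hooks, this is roughly their shape as I read PR #30986 (signatures hedged — check the PR for the real API; in an actual loader file the functions would be exported and Node would supply the default implementations):

```javascript
// getSource(url, context, defaultGetSource): override how module source is
// obtained, e.g. serve a virtual in-memory module instead of reading disk.
async function getSource(url, context, defaultGetSource) {
  if (url === 'file:///virtual/hello.mjs') {
    return { source: 'export default "hello from memory";' };
  }
  return defaultGetSource(url, context);
}

// transformSource(source, context, defaultTransformSource): rewrite source
// after it is loaded but before Node parses it, e.g. transpile it.
async function transformSource(source, context, defaultTransformSource) {
  if (context.format === 'module') {
    // toy "transpile": strip a hypothetical leading marker comment
    return { source: String(source).replace(/^\/\/ @compiled\n/, '') };
  }
  return defaultTransformSource(source, context);
}

// Exercising the hooks directly, outside Node's loader machinery:
getSource('file:///virtual/hello.mjs', { format: 'module' }, () => {})
  .then(({ source }) => console.log(source));
```

The narrow scope is the point: each hook overrides exactly one step of loading, rather than re-implementing resolution end to end.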
Is there a library that exposes the functionality of node's built-in resolver as a class, or something akin to a class, allowing parts of the implementation to be tweaked?
I am implementing modules support for ts-node. We need to implement a resolver that is almost identical to node's built-in resolver, except we need to match a few additional file extensions (.ts and .tsx), and we need to detect when a .js import specifier is meant to load a .ts file on the local filesystem.
Ideally, the built-in resolver exposes an API that allows customizing its behavior. Exactly what can be customized and what cannot is up for debate.
We would also benefit from programmatic access to the --experimental-specifier-resolution flag without parsing it out of process.execArgv.
Are there any projects that attempt to extract node's built-in resolver, allowing it to be extended?
If it were exposed in some way, I could probably override the hypothetical extensions() and resolveExtensionsWithTryExactName() methods to achieve what we need.
Currently, I've copy-pasted node's resolver into our codebase, wrapped it in a factory function, and tweaked as necessary. https://github.com/TypeStrong/ts-node/blob/ab/esm-support/dist-raw/node-esm-resolve-implementation.js
@cspotcode https://npmjs.com/resolve, for CJS, but i haven't yet added ESM resolution to it (although i absolutely will be, soonish).
@cspotcode you're the first we've heard from to extract the JS implementation here; given that, it would be a huge service to the community to publish that work or collaborate with @ljharb on getting it into resolve.
The extended resolver APIs aren't clear for Node.js itself - in terms of what options to make available. And opening up every hook in the resolver is contrary to the needs of Node.js core itself. If there was a solid complete proposal for a hookable resolver API that we can use internally and also expose that might be something to consider, but would require a sound proposal, and the bar for that would have to be quite high.
@cspotcode
I am implementing modules support for ts-node. We need to implement a resolver that is almost identical to node's built-in resolver, except we need to match a few additional file extensions
Why would you need such a resolver? If you get a ts file to transpile, your imports would be either bare specifiers, in which case you need to leave them as is, or relative paths that are fully specified to other js or ts files, in which case, again, you need to leave them as is.
To make things clearer. If you have:
import foo from './relative/path/to.ts'
import _ from 'bare-spec'
Why would you need to change those imports? Just let Node.js do its thing, and the next time you will get to.ts to transpile, no? And in the case of the bare-spec, it will already be a JS node module, so there's no need to even transpile it.
In the source, it wouldn’t have the extension, based in my understanding of existing TS usage.
In the source, it wouldn’t have the extension, based in my understanding of existing TS usage.
I think this is just based on common patterns, though, right? Users could write './file.ts' instead of './file' if they wanted to, I think (@cspotcode?). And arguably it wouldn't be a bad habit to get into, as it removes a bit of magic and makes it clearer what's going on (and more closely resembles the JavaScript that will eventually run, in either Node or browsers).
If TypeScript wants to implement automatic extensions as part of their compilation (i.e. transpiling './file' into './file.mjs' or './file.js' as configured), they're welcome to; that would be better than relying on Node for that, as TypeScript's version could then apply everywhere including browsers.
I think this is just based on common patterns, though, right? Users could write './file.ts' instead of './file' if they wanted to
Results in the following error: TS2691: An import path cannot end with a '.ts' extension.
The "accepted" workaround is to use import './file.js', which looks weird when what you're actually importing is .ts.
https://github.com/microsoft/TypeScript/issues/18442 has a whole bunch of discussion
@giltayar just as node has 2 resolver modes, --experimental-specifier-resolution=<explicit|node>, we also plan to implement 2 resolver modes:

explicit

We must resolve foo.js to the following files on disk: foo.ts, foo.tsx, or foo.jsx. TypeScript supports this so that developers have explicit control over the emitted module specifiers.

node

In addition to the above, we should allow developers to omit file extensions. This is motivated partly by the fact that the default behavior for TypeScript's import quick-fix is to omit the file extension.

The language service and code editors do allow users to configure this, so that the quick-fix import statements include a file extension. However, if users want to stick with the default and opt in to a mode akin to --experimental-specifier-resolution=node, we can support it.
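The "explicit" mapping can be sketched as a pure candidate-listing function (the function name and priority order are my assumptions, not ts-node's actual implementation):

```javascript
// For a ".js" specifier, list the source files that could have emitted it,
// in a plausible priority order; other specifiers pass through unchanged.
function candidateSourceFiles(specifier) {
  if (!specifier.endsWith('.js')) return [specifier];
  const base = specifier.slice(0, -'.js'.length);
  return [`${base}.ts`, `${base}.tsx`, `${base}.jsx`, specifier];
}

console.log(candidateSourceFiles('./foo.js'));
// → [ './foo.ts', './foo.tsx', './foo.jsx', './foo.js' ]
console.log(candidateSourceFiles('./bar.mjs'));
// → [ './bar.mjs' ]
```

A resolver would then probe these candidates on disk in order, which is why the specifiers stay stable across .ts/.tsx/.jsx renames.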
Results in the following error: TS2691: An import path cannot end with a '.ts' extension. "Accepted" workaround is to use import './file.js', which looks weird when what you're actually importing is .ts.
I remember discussing this before, probably on that thread you cite. I think what I suggested then is what still seems to make the most sense to me now: TypeScript should start supporting './file.ts', and output this as './file.mjs' or './file.js' depending on configuration. Especially if it currently errors, that seems like an easy change to make.
@GeoffreyBooth afaik the last stated position by the typescript team was that "...we do not believe the compiler should rewrite your imports. module names are resource identifiers, and the compiler should not mess with them"
That comment was from 2016, and a lot has changed since then. TypeScript also has a lot of configurable options, so I don't see why this couldn't be one of them.
Regardless, you could achieve the same either through a later build step (like a Babel plugin) or via a custom Node loader.
@cspotcode I understand you need it for the "node" resolution mode.
Not sure I understood what you said in "explicit" mode:
We must resolve foo.js to the following files in disk: foo.ts, foo.tsx, or foo.jsx. TypeScript supports this so that developers have explicit control over the emitted module specifiers.
Shouldn't foo.js in explicit mode resolve to... foo.js? Isn't that what explicit mode means?
foo.js resolves to foo.js, which is the compiled output of foo.ts and does not exist on disk prior to compilation. ts-node runs prior to compilation.

TypeScript does not modify valid ES syntax unless explicitly asked to. JSX is transformed because it is not valid ES. Type annotations are removed because they are not valid ES. import from './foo.js'; is emitted verbatim, because it is valid ES syntax. (It will be transformed if we ask TS to convert to CommonJS or downlevel to ES3.) At design time, the language service understands the semantics, because it knows ./foo.js is the emitted code corresponding to ./foo.ts, ./foo.tsx, or ./foo.jsx.
This gives developers explicit control over their import module specifiers at runtime, and those specifiers can remain stable across certain changes to the codebase. For example, renaming a .js file to .jsx does not require changing the import statement. Refactoring .js into .ts, or vice versa, is the same.
ts-node's goal is to be a drop-in replacement for the compile step. If you would normally tsc && node ./index, we allow you to ts-node ./index. We also need to maintain compatibility with the larger ecosystem, in particular the language service, because that is the value of TS.
Will there eventually be the ability to install loader hooks at runtime? Currently, CLI tools that need to bootstrap an execution environment which includes hooks have a tough time doing so in a cross-platform manner that doesn't impose extra performance overhead.
It's possible for one node process to spawn another, but there are lots of caveats with that, and it's slower.
My use-case is ts-node. Our normal interface is ts-node script.ts. This is compatible with Linux shebangs.

Today we need to tell users to node --loader ts-node/esm script.ts. Ideally, users can ts-node script.ts, which launches one and only one node process, and we install hooks at runtime.
@cspotcode I'd suggest using ts-node to spawn a new Node.js process with the loader hooks set. There likely will be APIs to spawn subloaders in the same Node.js process in future, but that work has not yet begun. Contributions to loader work are also welcome.
I have another question / bit of feedback. I hope this is the right place to post. Technically it pertains to something require.extensions hooks must do to properly integrate with ESM.
node's built-in require.extensions['.js'] implementation checks if the file should be treated as ESM. If so, it throws an error.
ts-node attaches a custom require.extensions['.ts'] (and .tsx and .jsx, and overwrites the built-in .js hook as needed). We read the file from disk, compile .ts -> .js, then pass this string to module._compile.
Our hook needs to mimic the error-throwing behavior of node's built-in hook. For example, mocha relies on this behavior to correctly support both CommonJS hooks and ESM hooks simultaneously: https://github.com/mochajs/mocha/blob/master/lib/esm-utils.js#L10-L23
> require.extensions['.js'].toString()
'function(module, filename) {\n' +
" if (filename.endsWith('.js')) {\n" +
' const pkg = readPackageScope(filename);\n' +
" // Function require shouldn't be used in ES modules.\n" +
" if (pkg && pkg.data && pkg.data.type === 'module') {\n" +
' const parentPath = module.parent && module.parent.filename;\n' +
" const packageJsonPath = path.resolve(pkg.path, 'package.json');\n" +
' throw new ERR_REQUIRE_ESM(filename, parentPath, packageJsonPath);\n' +
' }\n' +
' }\n' +
" const content = fs.readFileSync(filename, 'utf8');\n" +
' module._compile(content, filename);\n' +
'}'
My plan right now is a hack where I invoke the built-in .js hook, passing it a filename I know does not exist but that is in the right directory. This costs a failed fs call:
try {
  require.extensions['.js'](
    // make `module` object
    { _compile() {} },
    filename + 'DOESNOTEXIST.js' // I can make this more robust by appending a random UUID
  );
} catch (e) {
  // Inspect the thrown error to see if it's an `ERR_REQUIRE_ESM` error
}
EDIT: actually, I won't be using this hack. In the case where another require hook has been installed before us, we can't be sure require.extensions['.js'] is going to be node's built-in hook. Instead I'm going to extract the relevant code from node's source into our codebase.
Ideally node exposes its synchronous CJS/ESM classifier to support our use-case. I realize that require.extensions has been deprecated for years, but that doesn't strictly prohibit node from exposing this API.
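The check in question, extracted as a pure function over an already-read package.json (a simplification of mine; the real built-in hook resolves and caches the package scope via readPackageScope, per the toString output above):

```javascript
// Mirror of the condition inside Node's built-in '.js' hook: a .js file in a
// "type": "module" package scope must not be require()d.
function wouldThrowErrRequireEsm(filename, pkg) {
  return filename.endsWith('.js') && !!(pkg && pkg.type === 'module');
}

console.log(wouldThrowErrRequireEsm('/app/a.js', { type: 'module' }));   // true
console.log(wouldThrowErrRequireEsm('/app/a.js', { type: 'commonjs' })); // false
console.log(wouldThrowErrRequireEsm('/app/a.js', undefined));            // false
```

An exposed classifier would essentially be this predicate plus the package.json lookup and caching, which is exactly the part that is awkward for userland hooks to replicate.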
This has been something I've been thinking about as well. I think for a loader to handle both ESM and CommonJS files, it also needs to hook into require.extensions (at least for now). I'd like to proxy/override or wrap require.extensions['.js'] for extensions that should be handled similarly, but without necessarily having the "throw if this is ESM" check. I wonder if there's a not-terrible way to prevent that, aside from copying the source of require.extensions['.js'] into my code.
Another thing we might want to consider is making the ESM loader hooks simply the loader hooks, applying to both CommonJS and ESM and finally replacing the deprecated require.extensions. I haven't given thought to how that would work in practice, but there are lots of use cases, such as instrumentation, where the loader should affect all files, not just ESM ones; and it's more work for loader authors to hook into both of Node's loaders in very different ways.
@GeoffreyBooth the problem with unifying the hooks is that require() must be synchronous.
Right now, when a user installs ts-node's ESM hooks, we know they can't have installed any other hooks. Depending how you think about it, we are responsible for implementing some basic features that would ideally be implemented by third-party ESM hooks.
If a third-party library installs require.extensions['.coffee'], for example, do we need special logic in our hooks to resolve() .coffee files and getFormat them the same as .js files? We cannot pass them to defaultGetFormat because it doesn't understand .coffee files.
the problem with unifying the hooks is that require() must be synchronous.
Hmmmm. That is a problem. @jkrems or @weswigham, is there any hope here, or would hooks that work with CommonJS run into the same issues that Wes' "require of ESM" PR did?
I suppose one could write synchronous hooks? CoffeeScript transpilation is synchronous, for example; I dunno if TypeScript’s is? Obviously there couldn’t be a sync HTTP loader but there are plenty of useful loader cases that don’t need async.
Anyway applying these hooks to CommonJS as well is a long-term maybe goal. AFAIK CommonJS wasn’t designed to be hookable/customizable, and the current ways people do it are monkey-patched hacks more or less. Perhaps it’s best left as is.
Hooking into the dependency loading steps in Nodejs should be easy, efficient, and reliable across CJS+ESM. Loader hooks would allow for developers to make systematic changes to dependency loading without breaking other systems.
It looks like discussion on this topic has died down, but I'm really interested in loader hooks and would be excited to work on an implementation! There's a lot of prior discussion to parse through, and with this issue I'm hoping to reignite discussion and create a place for feedback.
Some of that prior discussion:
require()
edit (mylesborins): here is a link to the design doc
https://docs.google.com/document/d/1J0zDFkwxojLXc36t2gcv1gZ-QnoTXSzK1O6mNAMlync/edit#heading=h.xzp5p5pt8hlq