Closed JakobJingleheimer closed 2 years ago
@GeoffreyBooth I feel rather vindicated RE Concerns with Next() #3: When I was setting up the resolve
hook test fixture for #37468, I indeed did forget the default-case return next(…)
at the end. Took me a few minutes to figure out why the heck suddenly all my cases were failing.
I think if we opt for the next()
approach (as I was authoring #37468, I'm increasingly leading toward "shouldn't", not least of which because the implementation looks to be much more difficult—read: buggy, real or perceived), I think some extra fault detection will be needed. Without, I think this could be a table-flip moment for users (I was getting annoyed, and I wrote the darn thing—so if anybody should know/remember, it would be me (and you)).
This is worth discussing, sure, but I didn’t think https://github.com/nodejs/node/pull/37468 was supposed to implement chaining; I thought it was just meant to combine the hooks and stop there, and a later PR would add chaining. Isn’t this something to figure out after https://github.com/nodejs/node/pull/37468 has merged in?
Yes #37468 isn't/doesn't, and yes that's all #37468 does. #37468 is ready to go, so I think this is next?
One point of concern about the Done method: when implementing a loader for mocking (as I did for testdouble's ESM support), you need to add cache busting to the final url, which means that you want to be the last in the chain. And yet, when loading the source, you want to be the first in the chain, because you know you're not really transforming the code, but rather replacing it.
This is why I prefer the Next() approach. While it means that loader developers need to be more aware of how the chaining works, it gives them more flexibility in how they implement the resolve and load functions of the loader, and does not "chain" them to the opinionated thinking of how things should happen, which is what the Done approach is trying to do.
Exhibit A 😊, the ESM mock loader for testdouble: https://github.com/testdouble/quibble/blob/main/lib/quibble.mjs
@giltayar sorry, I'm not following (but would like to understand your concern). done()
would ensure it's the last hook executed. If you mean out of all possible hooks, all other hooks need to run and a particular one dynamically needs be last in queue, yes, next()
would facilitate that and done()
would not. Needing to do that dynamically is a critical differentiator but it sounds like very much an edge-case (please do correct me if I'm misunderstanding you).
Regarding needing the particular resolve
hook's sibling load
hook to be called first, both done()
and next()
could do that because ESMLoader will be aware of what file they both came from. But why do you need that?
I looked through your code example (I've looked thru Quibble previously—slick stuff!), but the "why" didn't jump out at me.
At first read, it sounds like what you're describing meticulously strong-arms a process that would probably naturally result in what you want anyway.
If this is an edge-case, unless it's a very compelling edge-case, I would likely lean towards the option that does not ensure all developers have a much more complicated and gotcha-prone experience so that a small subset might use a small bit of extra functionality.
next
allows a single loader to:
a) pre-process the call to the next loader
b) post-processing the results from the next loader
c) skip/suppress the call to the next loader
done
supports 2 out of 3?
If I understand correctly, neither API allows awaiting an entirely different module resolution. For example if the format of ./transpiled-language.*
depends on the contents of package.json
and/or transpiled-language-config.json
, and if the module is coming from the web through an unpkg
, url-remapping
, or http-to-https
loader. However, node's own loader does not use the loaders chain to read package.json
, does it? It assumes they exist on the local filesystem. I'm not sure if that will change, and I'm not sure whether loaders want to be a virtual filesystem of sorts.
done
would not be able to support b
in your list because by the time the subsequent hook is invoked, the current will have finished (I'm not sure if it should do b
). It can do a
and c
.
Currently, node does read the package's package.json, checking its "type"
field.
If you need to orchestrate the sequence of imports such that foo
is available before bar
, a dynamic import might be what you're looking for:
const { default: foo } = await import('foo');
const { default: bar } = await import('bar');
If you're saying that foo
needs to happen during and be available to bar
's load hook, the above may suit with some finagling, but this may be something to use context.conditions
(which is a bit vaguely defined at the moment, but could potentially be extended to facilitate).
Here's roughly how done
would implement the CoffeeScript
and HTTPS
loaders from next
's original proposal:
--loader "./httpsLoader.mjs" \
--loader "./coffeeLoader.mjs" \
--loader "./otherLoader.mjs"
Resulting in httpsLoader
being invoked first, coffeeLoader
being invoked second, etc, and node's internal defaultLoader
last.
ESMLoader.load()
all other hooks need to run and a particular one dynamically needs be last in queue, yes, next() would facilitate that and done() would not. Needing to do that dynamically is a critical differentiator
Yes, you understood!
but it sounds like very much an edge-case
Is a mocking library an edge case for loaders? I don't think so. I'm guessing there will be a few of those.
Regarding needing the particular resolve hook's sibling load hook to be called first, both done() and next() could do that
How does that happen in the Done method? If the resolve
MUST be called last, that means that the loader should be the last in the chain, which means that the load
will also be last, no? Using the next
method, I can be the first in the chain, yet call next()
and transform the value I get from next()
, thus acting like I'm last in the chain, whereas my load()
can just not call next()
and thus be the first (and last!) in the chain.
At first read, it sounds like what you're describing meticulously strong-arms a process that would probably naturally result in what you want anyway.
Maybe I'm missing something, but how would it naturally occur?
If this is an edge-case, unless it's a very compelling edge-case...
As I said above, I believe mocking libraries are a compelling case.
For sure supporting mocking is compelling.
I'm thinking mocking would be used only during automated testing (if not, please let me know of the other scenario(s)). If so, I would expect the mocking loader to need only a load()
hook (allowing the normal resolves to happen as they would). If so, just put the mocking loader at the front of the queue:
"test": "NODE_OPTIONS='--loader mockLoader' …"
Then in mockLoader's load, read a static list of what to mock, check the url against that, and do on match (and opt-out on mismatch):
export async function load(input, url /* … */) {
const mock = mapOfMocks.get(url);
if (mock) return {
format: 'module',
source: mock,
};
// omitting a return = opt-out
}
If you wanted to bypass a mock, you could add some kind of query param flag to the specifier, like 'foo.mjs?noMock'
. You could similarly have an adhoc mocking opt-in with a query param (and the easiest would be some kind of location convention, like adjacent, but the query param could have a value with a path to the mock): 'foo.mjs?doMock'
and the mockLoader looks for a file of the same name like 'foo.mock.mjs'
:
export async function load(input, inputUrl, context, done, defaultLoader) {
const url = new URL(inputUrl);
const params = new URLSearchParams(url.search);
if (params.has('noMock')) return;
if (params.has('doMock')) {
const mockUrl = …;
const mock = defaultLoader(mockUrl);
if (mock) return {
format: 'module',
source: mock,
};
}
else { /* global mocks */ }
// omitting a return → opt-out
}
Perhaps don't call done()
in mockLoader because the mock file might be typescript or something that a subsequent loader would transform.
@JakobJingleheimer
The reason mocking loaders also need to hook the resolve
is for "cache busting": in any mocking library, you can mock an ESM one way, and then mock it another. To make this work, the resolve
adds a "generation" query parameter to the URL to enable it to load the "latest" change to the mock. But for this to work, it needs to be the last in the chain.
(in Quibble currently it's also used as a hacky way to figure out where the module file is (search for "dummyImportModuleToGetAtPath" in my blog post, but I expect this hack to go away once we have import.meta.resolve
)
Does jest's mocking loader need resolve
for an additional reason: to redirect imports into __mocks__
directories?
@JakobJingleheimer In your HTTPS loader example, what is input
and why would it being defined mean we would opt out?
export async function load(
input,
inputUrl,
context,
// done,
// defaultLoader,
) {
if (input) return; // opt-out
I’m thinking mocking would be used only during automated testing (if not, please let me know of the other scenario(s)).
I use mocking of network calls during development to control API responses without needing live backends in particular states. That could be implemented at a low level (mock/intercept the networking machinery itself) or by mocking the function where the network fetch occurs.
Application monitoring could also be considered another form of mocking. If you think of a use case like a live dashboard of an app’s traffic, getting the data for such a tool could be implemented by mocking/proxying lower level functions like core parts of Connect/Express/other framework or of Node.
@giltayar so actually I had read that blog post way back when! Having re-read it, now I remember the purpose of the "generation" query param.
To sum up the problem, it sounds like you're using the generation number in resolve
to ensure 2 things:
foo.mock
used by bar.test
and qux.test
has no cross-contamination (ex bar.test
does something to foo.mock
and that shouldn't bleed into qux.test
, like call counts)foo
is locally mocked in bar.test
and also in qux.test
and their local mocks are different (ex bar
's mock of foo
contains exports a
and b
, and qux
's mock of foo
has exports b
and c
)When running tests in parallel, bar.test
and qux.test
can be being processed at the same time (typically facilitated via multiple child processes); importantly, there is only 1 node process, and that contains the caches (so that is shared).
Have I got that right?
If so, why does your resolve need to be last? I think all that matters is that your query params exist in the final value of url
returned from the resolve chain.
Does jest's mocking loader need
resolve
for an additional reason: to redirect imports into__mocks__
directories?
@cspotcode yes for the same reason(s) Quibble does.
In your HTTPS loader example, what is
input
and why would it being defined mean we would opt out?
@GeoffreyBooth input
is the (interim) result of the previous loader, if any previous loader(s) provided anything yet (ex there could be 5 loaders that ran before HTTPS Loader, and they may have all opted out, in which case input
would be empty).
In the case of HTTPS Loader, if interim source already exists, that indicates there's nothing for HTTPS Loader to do. My example may have over-simplified slightly, since input
would be empty or an object (so it would actually be checking input?.source
). Perhaps input
should be initialised as an object with empty format
and source
properties (but that may make the node-land code a little more complicated).
I use mocking of network calls during development to control API responses without needing live backends in particular states. That could be implemented at a low level (mock/intercept the networking machinery itself) or by mocking the function where the network fetch occurs.
Ah, true. In that case, just ensure that mocker occurs ahead of HTTPS Loader (and any other remote loaders) in the queue 😉 I'm thinking that they would not need to conditionally change sequence.
Application monitoring could also be considered another form of mocking. If you think of a use case like a live dashboard of an app’s traffic, getting the data for such a tool could be implemented by mocking/proxying lower level functions like core parts of Connect/Express/other framework or of Node.
I think in that case, it would just need to be first in the queue (which they already need, citing something like "ensure Foo is the first require()
in your application").
The updates to ESMLoader.load() would also need to ensure individual properties are not inadvertently stomped (eg. loader4 returns only a source
property and omits format
, that probably shouldn't overwrite an existing format
from a previous loader, but returning source
and an empty format
would).
A module can have multiple, concurrent mocks: foo is locally mocked in bar.test and also in qux.test and their local mocks are different (ex bar's mock of foo contains exports a and b, and qux's mock of foo has exports b and c)
Nope. There can be only one specific mock per-process at a specific time. But a test could mock the module in one way at a certain time, and serially after that mock it in another way. And that is the purpose of the generations: to allow the test to change the mocks serially.
If so, why does your resolve need to be last?
It needs to be last because I don't really care how the module is resolved. I just want to add my query parameter to it. Hence, it needs to be the last.
I think all that matters is that your query params exist in the final value of url returned from the resolve chain.
Exactly. That's why it needs to be last! Or did I misunderstand?
Nope. There can be only one specific mock per-process at a specific time. But a test could mock the module in one way at a certain time, and serially after that mock it in another way. And that is the purpose of the generations: to allow the test to change the mocks serially.
I think not important for next
vs done
, buut: Possibly over-complicating things, might using the parentUrl be more robust and easier to troubleshoot (and also preclude collision)? Ex
file://…/foo.mjs?__quibble=…/bar.mjs
file://…/foo.mjs?__quibble=…/qux.mjs
I think all that matters is that your query params exist in the final value of url returned from the resolve chain.
Exactly. That's why it needs to be last! Or did I misunderstand?
Yes, I'm pretty sure you misunderstood 🙂 If all you need is your query param to be present in the final value, that can happen anywhere in the chain (as long as other loaders behave themselves). Think of it like an assembly-line, where each loader does its own part: I make and attach the headlights, you make and attach the upholstery; neither of us does that to the exclusion of the other (hence why calling done
would be exceptional).
might using the parentUrl be more robust
Won't work, because same file might be importing the module multiple times (usingawait import
), yet wanting different mocking for each import.
that can happen anywhere in the chain (as long as other loaders behave themselves)
- as long as other loaders behave themselves: I don't want to depend on them keeping existing query parameters (or hashes).
That sounds like something everybody would want, which is mutually exclusive / impossible (regardless of next
or done
). If another loader is misbehaving, fix it or don't use it. I would ensure there's sufficient documentation and examples to understand how to get loaders to play nice together 😉 None of the concepts are novel, so someone with experience integrating multiple pieces would likely already be familiar.
- I can't add query parameters to bare specifiers, so I first have to have somebody resolve it to a file, and then add the query parameters.
If quibble is before another loader that would supply that, I thiiink this is addressed by leveraging the defaultResolver
argument?
export async function resolve(specifier, …, defaultResolver) {
// …
const fileUrl = defaultResolver(specifier, …);
const url = new URL(fileUrl);
// …
}
That sounds like something everybody would want, which is mutually exclusive / impossible (regardless of next or done). If another loader is misbehaving, fix it or don't use it. I would ensure there's sufficient documentation and examples to understand how to get loaders to play nice together
The problem is that if the writer of the loader has not tested it with a loader that adds query parameters, they would not know that they need to preserve it. So we would have to document this. And this is precisely those things that people tend to ignore. A lot of the "Done" proposal's raison d'être is to ensure that loaders are NOT dependent on the behavior of one another, yet here we are enforcing something that if the writer of the loader broke, chaining would not work in some cases.
If quibble is before another loader that would supply that, I thiiink this is addressed by leveraging the defaultResolver argument?
This is what I am doing now. But I didn't know that in the "Done" proposal there is a defaultResolver
parameter. I would have thought that in any "chained loaders" proposal, the default would be somewhere in the chain, either first or last.
we are enforcing something that if the writer of the loader broke, chaining would not work in some cases.
I think that's just the nature of the beast for this. They can't be entirely independent. I think this is perhaps most easily noticeable in resolve
because it's simple and outputs ~1 small thing. The done
proposal's intention is indeed "principle of least knowledge", but that's mainly concerned with least knowledge of the tool (esm loader); ensuring one's loader doesn't clobber another loader's work is, I'm pretty sure, unavoidable (and would also exist with next
). We could add some diff detection and emit warnings ("hey, listen! Looks like you didn't preserve the previous loader's bits."), but I imagine those warnings could often be erroneous and thus very annoying.
But I didn't know that in the "Done" proposal there is a defaultResolver parameter. I would have thought that in any "chained loaders" proposal, the default would be somewhere in the chain, either first or last.
Ah, yes, sorry. This was discussed verbally in the meeting, and it was suggested to provide code examples for load
but not resolve
because resolve is simpler. I'll add both resolve
and load
code examples in the top/first post here (to keep it all together) and ping you when it's done (since the edit won't send you a notification). I should have time this afternoon.
@giltayar @GeoffreyBooth examples (at the top) updated. notably, input
→ interimResult
and inputUrl
→ resolvedUrl
, and I changed the sequence of arguments: default*
↔ done
since default* is more likely to be used (allowing done
to be omitted).
moved to the nodejs/loaders team repo
There are 2 leading approach proposals for ESM loaders and chaining them.
Similarities
Both approaches:
resolve()
: finding the source (equivalent to the current experimentalresolve()
); returns {Object}format?
: see esm → getFormaturl
: same as current (see esm → resolve)load()
: supplying the source (a combination of the current experimentalgetFormat()
,getSource()
, andtransformSource()
); returns {Object}format
: same as current (see esm → getFormat)source
: same as current (see esm → getSource)resolve
andload
) are chained:resolve()
s are executed (resolve1, resolve2, …resolveN)load()
s are executed (load1, load2, …loadN)Differences
Next()
This approach is originally detailed in #36396.
Hooks are called in reverse order (last first): a hook's 3rd argument would be a
next()
function, which is a reference to the previous loader hook. Ex there are 3 loaders:unpkg
,http-to-https
, andcache-buster
(cache-buster
is the final loader in the chain):cache-buster
invokeshttp-to-https
, which in turn invokesunpkg
(which itself invokes Node's default):cache-buster
←http-to-https
←unpkg
← Node's defaultThe user must actively connect the chain or it (likely) fails: If a hook does not call
next
, "the loader short-circuits the chain and no further loaders are called".Done()
This approach was also proposed in #36396 (in this comment).
The guiding principle of this approach is principal of least knowledge.
Hooks are called in the order they're declared/listed, and the return of the previous is fed as the input of the subsequent/next hook, and each hook is called automatically (unless short-circuited):
unpkg
→http-to-https
→cache-buster
(if none of the supplied loaders output valid values, node's default loader/hook is invoked, enabling a hook to potentially handle only part and avoid re-implementing the native functionality node already provides via the default hook).Hooks have a
done
argument, used in rare circumstances to short-circuit the chain.Additionally, this proposal includes a polymorphic return:
done(validValue)
validValue
as final value (skipping any remaining loaders)false
Examples
Resulting in
https-loader
being invoked first,mock-loader
second, etc, and node's internaldefaultLoader
last.For illustrative purposes, I've separated
resolve
andload
hooks into different code blocks, but they would actually appear in the same module IRL.Resolve hook chain
HTTPS Loader
```js const httpProtocols = new Set([ 'http:', 'https:', ]); /** * @param {Object} interimResult The result from the previous loader * (if any previous loader returned anything). * @param {string} [interimResult.format=''] * @param {string} [interimResult.url=''] * @param {string} context.originalSpecifier The original value of the import * specifier * @param {string?} context.parentUrl * @param {function} defaultResolver The built-in Node.js resolver (handles * built-in modules like `fs`, npm packages, etc) * @param {function(finalResult?)} done A short-circuit function to break the * resolve hook chain * @returns {false|{format?: string, url: string}?} If participating, the hook * resolves with a `url` and optionally a `format` */ export async function resolve( interimResult, // context, // defaultResolver, // done, ) { let url; try { url = new URL(interimResult.url); // there is a protocol and it's not one this loader supports: step aside. if (!httpProtocols.has(url.protocol)) return; } catch (err) { // specifier does not meet conditions for this loader; step aside. if (!determineWhetherShouldHandle(interimResult)) return; } return { url: '…', }; } ```Mock Loader
```js export async function resolve( interimResult, context, defaultResolver, // done, ) { let url; try { url = new URL(interimResult.url) } catch (err) { url = new URL(defaultResolver(interimResult /* , … */).url) } url.searchParams.set('__quibble', generation); return { url: url.toString(), }; } ```Load hook chain
HTTPS Loader
```js const contentTypeToFormat = new Map([ ['text/coffeescript', 'coffeescript'], ['application/node', 'commonjs'], ['application/vnd.node.node', 'commonjs'], ['application/javascript', 'javascript'], ['text/javascript', 'javascript'], ['application/json', 'json'], // … ]); /** * @param {Object} interimResult The result from the previous loader (if any * previous loader returned anything) * @param {string} [interimResult.format=''] Potentially a transient value. If * the resolve chain settled with a `format`, that is the initial value here. * @param {string|ArrayBufferView|TypedArray} [interimResult.source=''] * @param {Object} context * @param {Array} context.conditions * @param {string?} context.parentUrl * @param {string} context.resolvedUrl The module's resolved url (as * determined by the resolve hook chain). * @param {Function} defaultLoader The built-in Node.js loader (handles file * and data URLs). * @param {Function} done A terminating function to break the load hook chain; * done accepts a single argument, which is used for the final result of the * load hook chain. */ export async function load( interimResult, { resolvedUrl }, // defaultLoader, // done, ) { if (interimResult.source) return; // step aside (content already retrieved) const url = new URL(resolvedUrl); if (!httpProtocols.has(url.protocol)) return; // step aside const result = await new Promise((res, rej) => { get(resolvedUrl, (rsp) => { const format = contentTypeToFormat.get(rsp.headers['content-type']); let source = ''; rsp.on('data', (chunk) => source += chunk); rsp.on('end', () => res({ format, source, })); rsp.on('error', (err) => rej(err)); }); }); return result; } ```Mock Loader
```js export async function load( interimResult, { resolvedUrl }, defaultLoader, // done, ) { const isQuibbly = (new URL(resolvedUrl)).searchParams.get('__quibble'); if (!isQuibbly) return; const mock = defaultLoader(urlToMock); // or some runtime-supplied mock return { source: mock }; } ```CoffeeScript Loader
```js const exts = new Set([ '.coffee', '.coffee.md', '.litcoffee', ]); export async function load( interimResult, context, defaultLoader, // done, ) { if ( !!interimResult.format && interimResult.format !== 'coffeescript' ) return; // step aside const ext = extname(context.resolvedUrl); if (!exts.has(ext)) return; // step aside const rawSource = interimResult.source || defaultLoader( { format: 'coffeescript', // defaultLoader currently doesn't actually care }, context ).source; const transformedSource = coffee.compile(rawSource.toString(), { whateverOptionSpecifies: 'module' }); return { format: 'module', source: transformedSource, }; } ```Updates to
```js class ESMLoader { async load(resolvedUrl, moduleContext, resolvedFormat = '') { const context = { ...moduleContext, resolvedUrl, } let shortCircuited = false; // should we support calling done with no arg? let finalResult; let format = resolvedFormat; let source = ''; function done(result) { finalResult = result; shortCircuited = true; } for (let i = 0, count = this.loaders.length; i < count; i++) { const tmpResult = await loader( { format, source }, context, defaultLoader, done, ); if (shortCircuited) break; if (tmpResult == null) continue; // loader opted out if (tmpResult === false) { finalResult = { source: '' }; break; } if (tmpResult?.format != null) format = tmpResult.format; if (tmpResult?.source != null) source = tmpResult.source; } finalResult ??= interimResult; // various existing result checks and error throwing } } ```ESMLoader.load()
Concerns Raised
Next()
next
function does not behave as many current, well-known implementations behave (ex javascript's native generator'snext
is the inverse order to this's, and not calling ExpressJS's route-handler'snext
does not break the chain).next
is effectively required (not callingnext
will likely lead to adverse/undesirable behaviour, and in many cases, break in very confusing ways).Done()
next
approach also?)~ After chatting with @bengl, it seems like this is not an issue as V8 exposes what they need.