Import assertions with loaders

aduh95 commented 2 years ago

Here's a draft of what solution I can think of regarding relations between loaders and import assertions.

Relation of ESMLoader class with import assertions:

Before `resolve()`	`resolve()`	Between `resolve()` and `load()`	`load()`	After `load()`
Check if `importAssertions.type` value is supported	nothing	use `importAssertions.type` to check for race conditions	nothing	check if `finalFormat` is compatible with `importAssertions.type`
nothing	pass `importAssertions` to `resolve()`	check if `importAssertions.type` value is supported, and use it to check for race conditions	nothing	check if `finalFormat` is compatible with `importAssertions.type`
nothing	pass `importAssertions` to `resolve()`	disallow race conditions?[^1]	nothing	check if `finalFormat` is compatible with `importAssertions.type`
nothing	pass `importAssertions` to `resolve()`	one of the above	pass `importAssertions` to `load()`	check if `importAssertions.type` value is supported and compatible with `finalFormat`
nothing	pass `importAssertions` to `resolve()`	one of the above	pass `importAssertions` to `load()` and let `load()` do the checks	nothing
nothing	nothing	use `importAssertions.type` to check for race conditions	pass `importAssertions` to `load()` and let `load()` do the checks	nothing
nothing	nothing	nothing	pass `importAssertions` to `load()` and let `load()` do the checks	nothing

[^1]: if the resolver returns an url that's already in the cache and the returned importAssertion.type has a value different from the one in the cache, throw a TypeError.

The first one is the approach on my first Unflag import assertions PR, drawbacks of this approach have been discussed.
The second seems really appealing to me, it would allow folks to write hooks to support assertion-less JSON, and also support import assertions that are not supported by Node.js.
Same thing except the loader throws if a URL is loaded with different assertions. It puts the burden of handling race conditions to the resolve hook. Not sure this is actually doable and/or desirable.
pass the output importAssertions of resolve() to load() context. This adds complexity, not sure it's worth it.
Same but now the loader hook is in charge of doing the verification, so users can opt-out. That seems... dangerous? but maybe it's actually OK?
Same but import assertions are not passed to resolve().

My (new) favourite is 2.: it seems to me like the perfect balance between simplicity and letting users do powerful things (e.g.: it would be quite easy to implement a loader that supports assertionless imports). I have a PR almost ready implementing this.

Anyway, we can discuss that on next week meeting since Goeffrey invited me, but happy to hear thoughts before that if people want to share them.

GeoffreyBooth commented 2 years ago

Is this table for the current https://github.com/nodejs/node/pull/40250 PR, or a potential future where the assertions are optional?

I don’t think we should be doing much, if anything, between resolve and load. As a loader author, I would expect the return value of resolve to be passed into load without much happening in between.

I’m going to take a stab at a thinking-out-load proposal; feel free to point out its flaws or suggest a better alternative. Take as an example a YAML loader, to make possible code like this:

import data from './data.yaml' assert { type: 'yaml' }

For this case, the YAML loader’s load hook would be where we assert that the data is actually YAML before converting it to JSON and returning { source: convertedData, format: 'json' }, and then the json translator would take it from there. Node would not do any validation of the assertion; the validation would have to happen in defaultLoad, which in the case of a YAML loader would not have been called. In order for such a user loader to not be unnecessarily duplicative of Node’s internals, we would also need to make progress on https://github.com/nodejs/loaders/issues/26#issuecomment-918413522, making available some utility functions for some of the logic that Node does within defaultResolve and defaultLoad, so that this user loader could defer to existing Node logic for loading a file from disk but without needing to also do any assertion validation that defaultLoad would do.

So anyway, in this model nothing happens before resolve or between resolve and load, or after load. The validation is done in defaultLoad, if a user loader calls it or if no user loaders are present.

I think loader authors need the assertions info in both hooks, because resolve might want to resolve to a different URL based on the assertion, and in load a loader author might want to do a custom validation, which they would need the assertion for (such as this YAML example). If there was some way for authors to pass arbitrary data from resolve into load, that would alleviate the need for us to orchestrate this.

The module couldn’t be added to the cache until after load returns with the module’s source, and since the validation would happen during load we would never have cached modules that would have failed the validation check. This would mean, I think, that we should never have race conditions.

Also @aduh95 you’re always welcome to join the loaders meetings, you don’t need an invitation. Subscribe to this repo and you should see an issue created a few days before each meeting with the meeting details.

ljharb commented 2 years ago

The language spec actively prohibits a different module to be loaded based on the assertion - in other words, resolve shouldn't even get the assertion information. The assertion only determines if the module will load, or will fail.

aduh95 commented 2 years ago

Is this table for the current nodejs/node#40250 PR, or a potential future where the assertions are optional?

It could be either, considering that no one is supporting implementation nodejs/node#40250, I was thinking about re-using it.

The module couldn’t be added to the cache until after load returns with the module’s source, and since the validation would happen during load we would never have cached modules that would have failed the validation check. This would mean, I think, that we should never have race conditions.

Maybe we are talking about a different cache: if a user imports twice the same specifier, and resolve() returns the same url for both, I don't think we want to call load() twice, I'd say we want use the same module job – even though load() hasn't returned the final source yet.

The language spec actively prohibits a different module to be loaded based on the assertion - in other words, resolve shouldn't even get the assertion information. The assertion only determines if the module will load, or will fail.

To be precise, the spec proposal says moduleRequest.[[Assertions]] must not influence the interpretation of the module or the module specifier; instead, it may be used to determine whether the algorithm completes normally or with an abrupt completion. The question is if we can let user provided loaders violate the spec if they dare. I've added a line to the table for scenario on which resolve() doesn't get the import assertion.

targos commented 2 years ago

What about adding a new loader hook, which would only return whether the assertion passes or not?

targos commented 2 years ago

@GeoffreyBooth

About:

import data from './data.yaml' assert { type: 'yaml' }

From my point of view, this example is invalid. I think that assertions should be about what the specifier is translated to, not what it's being translated from.

If the code is run with a YAML loader that translates YAML to JSON, then the assertion in the code should be { type: 'json' }, and the loader doesn't need the assertion to behave correctly.

aduh95 commented 2 years ago

What about adding a new loader hook, which would only return whether the assertion passes or not?

That's definitely worth considering. Also maybe something that would correspond to the HostGetSupportedImportAssertions abstract operation in the spec (I could imagine load() hook returning this list).

From my point of view, this example is invalid. I think that assertions should be about what the specifier is translated to, not what it's being translated from.

I think the idea is that if a TC39 proposal for YAML modules comes up, someone could use a loader hook to emulate support for it in Node.js.

GeoffreyBooth commented 2 years ago

What about adding a new loader hook, which would only return whether the assertion passes or not?

We can’t really add more loader hooks because of chaining. We just went through a monthslong redesign to collapse four hooks to two to make them chainable. However, making utility functions that split out these smaller pieces that are within the larger hooks would be doable, and should provide many of the same benefits. So if you’re writing a loader that only wants to override the assertion validation, for example to make it always pass, you could create a load hook that pulls in the utility functions for ‘load this file from disk’ and ‘determine the format of this file’ and then write your own code (or lack thereof) for the validation.

GeoffreyBooth commented 2 years ago

I think the idea is that if a TC39 proposal for YAML modules comes up, someone could use a loader hook to emulate support for it in Node.js.

Yeah. For example, according to https://web.dev/css-module-scripts/, Chrome already supports this:

import sheet from './styles.css' assert { type: 'css' }

I don’t know what a loader would convert this to; ESM JavaScript that is just export default 'the css contents as a string'? So I thought YAML would be a better example. But the YAML example could also be converted by our loader into export default {"... where the export is the JSON dropped into JavaScript, and so then the loader returns { source: jsString, format: 'module' }. In that case, if the assertion needs to match the return value from the loader then the original import would have needed to lack an assertion, because the loader is outputting JavaScript rather than JSON. This feels wrong to me, as the user shouldn’t need to change the original source code based on which custom loader they’re using.

troywweber7 commented 1 year ago

Importing YAML directly into a node.js program seems like it could be pretty useful. How far away are we from such a reality?

GeoffreyBooth commented 1 year ago

Importing YAML directly into a node.js program seems like it could be pretty useful. How far away are we from such a reality?

It’s unlikely to be ever supported natively, unless spec committees add support for it like they have for JSON; but you can write a loader to add custom support for it today.

troywweber7 commented 1 year ago

you can write a loader to add custom support for it today

What is the best resource to get me started if I wanted to write something like that? I found this article online but I'm wondering if there is a more proper document for me to refer to when trying to write a loader?

I'm mostly curious about this for my own edification. I doubt I'd be able to write anything official, but it could be helpful internally or for a side project.

Thanks for your quick feedback and your time!

UPDATE: official documentation is probably where I should start...

GeoffreyBooth commented 1 year ago

UPDATE: official documentation is probably where I should start…

Yes, the official docs should be all you need. The “transpiler loader” example is the pattern you’d want to follow; you’re essentially transpiling YAML into JSON.

nodejs / loaders

Import assertions with loaders #39