Closed guybedford closed 1 year ago
To be more specific, modules in the spec are keyed as (referrer, specifier)
pairs, so the exact problem is a single file with two imports of the same specifier with different attributes.
I was imagining that the module attributes would not be part of the key. That is, the same module specifier and referrer would always point to the same module record, even when imported twice with different parameters. However, a subsequent load with different module attributes may lead to an exception being thrown rather than the module returned. Here's some examples of possible semantics (each of which could be debated further) which are all consistent with the idea that the module attributes are not part of the key, but the host may keep them around:
type
on the web, if a module is imported once, the type must either be provided or missing (JS), and the subsequent loads must pass the same type option (otherwise the check would've failed).This sort of interpretation rules out some use cases that @xtuc mentioned about module attributes determining the processing mode, when used in a way that a module could be imported multiple times in different processing modes.
It seems like races could occur if first to import wins:
try {
// try to load as JS, this could make all attempts to load JSON fail
await import('foo');
} catch (e) {
// try to load as JSON? maybe thats how other people are using this thing?
const {default: foo} = await import('foo', {type: 'json'});
// clobber stuff
Object.defineProperty(foo, 'bar', {
get() {
// fun stuff like arguments.caller / Error.prepareStackTrace / etc. go here
},
set(v) {
// more clobbering/calling here
}
});
}
I have serious reservations about the viability of keeping everything in sync across all call sites and if throwing on mismatch is desirable. It seems like throw on mismatch could lead to just iterating all the possible values in order and doing things as if it were some kind of odd overloading technique.
@bmeck This is an interesting example. Sounds like we probably shouldn't cache failure due to module type check mismatch.
So allow retry when type mismatch? But current import('foo')
will not allow retry if type is not js mimetype.
In terms of use cases, this points to two things being fundamentally at odds (based on some not-so-theoretical host that acts like a browser):
It would feel pretty unfortunate if those two semantics would mix. Not only would tools and users have to have perfect knowledge on the cache semantics of each individual attribute (as opposed to module attributes in general), it would also affect the ability to support unknown attributes. A host couldn't just ignore an unknown attribute because it would have ambiguous cache key semantics.
And unless new module attributes can only ever apply to module types that have never been supported before, (1) seems pretty important. Otherwise it would require global coordination across files to make sure that the same target is never requested with different attributes.
P.S.: Global coordination because this is speculating about likely behavior in a browser-like host where there's a global/realm-wide module map that has some caching semantics. From a JS spec perspective it would only be per-importing-file coordination. But I don't think browsers would pick global caching semantics that are incompatible with the per-file ones. So - effectively it's global cache-by-URL that most users would observe.
We should split module attributes that will and won't change the identity of the module, it's better on the syntax level.
For example:
These two modules are different
import mod from '/mod' with { type: 'css' }
import mod1 from '/mod' with { }
These two modules are the same
import mod from '/mod' with {} and { integrity: 'hash-of-this-file' }
// ~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
import mod1 from '/mod' with {} and { cache: '1year' }
In the first attribute object {}
, any difference in it cause them to be two different modules, but in the second attribute object, module with different attribute object are treated as the same module.
But this is not enough, we need a syntax for the developer to specify the following orthogonal behavior:
type
, no. For integrity
, yes)@Jack-Works I see type
as being in the same category as integrity
: they are both checks. type
always checks a type which is present in some other way, e.g., the MIME type or file extension. (That said, there are some significant arguments against having integrity
in particular.) These would not be part of the cache key (failure vs non-failure would differ based on what's passed, but they couldn't have multiple distinct success values.)
There's another category which would change the interpretation of a module. For example, parameters to JSON.parse, or interpreting the path as a string that's default-exported rather than what it would be otherwise. These sorts of module attributes would need to be part of the cache key.
@littledan I don't think type
is a simple check. The browser can send different sec-fetch-dest
header based on the value the type
therefore server can return different formats based on the sec-fetch-dest
header.
Therefore, files with different type
must be treated as different modules. But if I import a module with integrity
and import it again without integrity
, I expect it to be the same module
@Jack-Works Let's discuss this further in https://github.com/tc39/proposal-module-attributes/issues/24 . We'll need a specification for what browsers do exactly before Stage 3, and web specs like fetch tend to go into that level of detail.
@littledan oh, so type
is just my example, I really want to emphasize is that we should have a signal to tell the runtime if the attribute is ignorable or will affect the identity of the module even they don't know what the attribute is.
I like that separation! But the and
feels awkward because it favors cache-busting attributes over ones that maintain a single module per URL. Maybe something like this would express the same:
// Attributes that only assert properties on the resource but don't affect how it's fetched:
import mod from '/mod' assert { integrity: 'hash-of-this-file' }
import mod from '/mod' expect { integrity: 'hash-of-this-file' }
import mod from '/mod' assert { type: 'json' }
// Attributes that may affected how the module is being fetched:
import mod from '/mod' with { credentials: 'include' }
// Combined
import mod from '/mod' with { credentials: 'include' } assert { integrity: 'hash-of-this-file' }
Therefore, files with different
type
must be treated as different modules.
I don't think that's necessarily true. One big downside is that type
could never be optional which is a bit unfortunate. Having a non-optional type
attribute means there's a big pressure to only ever load modules from JavaScript modules (to make things easier to use / less noisy). So even endpoints that could've been simple data files would be returning executable scripts instead.
I continue to have the understanding that type
would be required in the web, and non-web environments could decide whether to require it or make it optional.
I continue to have the understanding that
type
would be required in the web, and non-web environments could decide whether to require it or make it optional.
So it would make the modules env-specific just because json/wasm modules?
@hax This proposal does not prohibit environments from making various kinds of environment-specific modules, including JSON modules which don't require with type: "json"
. Environments may choose to either stick to the cross-environment set or support more forms.
@littledan I think u mean each env could choose whether with type
is enforced, but I really hope we could keep them uniform to avoid fragmentation of the ecosystem. 😥
Let the developer decide if their attribute must be enforced. By this way, we can avoid fragmentation of the ecosystem
@hax I hope so too. That's why this proposal requires that with type: "json"
is supported. But, as the semantics of modules in general is unspecified, I don't see how we could prohibit them from making other choices for that general case.
Let the developer decide if their attribute must be enforced.
How ? The reason why we need with type
is browsers think json module will be insecure without that. (But I'm still not convinced there is no other way to solve that security problem :-)
I don't see how we could prohibit them from making other choices for that general case.
@littledan I think it's the problem, we face such problems and up to now all possible solution I saw are just add another layer of complexity for developers. The original requirement as I understand is just JSON module, why developers need to pay all other complexity for simple loading JSON problem, I think most will just choose continue using let data = await fetchJSON('./x.json')
.
(I understand there are also wasm/css/html modules may need it, but first wasm module do not suffer similar security issue, css/html module are still in a long way and it's too early to say whether they will eventually available)
@Jack-Works I don't see how we could let the developer decide, except in the context of a Compartment API which defines module semantics. Outside of Compartments, the environment provides module semantics.
@hax Developers who do that won't be able to use JSON modules on the Web. So I'm not convinced that most developers will choose this, given that many JS developers write code to run on the web.
Based on the feedback in https://github.com/openjs-foundation/standards/issues/91 , I think there's some more to discuss with respect to the details of caching and treatment of unrecognized and host-defined attributes. The current spec draft provides a concrete starting point which various people have raised concerns about being too permissive for hosts; before Stage 3, it seems like we may want to put more restrictions on hosts to ensure expected behavior.
In the end, we decided to land #66 for Stage 2, which restricts module attributes to not be part of the cache key. Does that resolves the concerns raised in this issue?
I think we can close this issue now. We can change the caching semantics based on the follow-up proposals.
@domenic has raised questions about the caching model in https://github.com/whatwg/html/pull/5658#issuecomment-646788942 and https://github.com/whatwg/html/pull/5658#issuecomment-646788942 . Although we have TC39 consensus recorded on going to Stage 2 with a restriction to "check"-style attributes, the plan was to investigate host interactions between Stage 2 and 3, and revise host hook invariants based on that investigation, so this conversation seems in scope.
As I wrote about the current status,
To clarify, the import attributes proposal isn't about what you're importing but whether the import meets certain conditions. For this reason, we switched the token introducing this form from
with
toif
and are considering renaming the proposal to import conditions. Modifying what is being imported (which would logically be part of the cache key) would be a separate proposal, if sufficient use cases are identified. We're working on clarifying this distinction in our documentation and hope to present on it in the upcoming TC39 meeting.
I'm not sure whether this meets technical requirements from HTML with respect to module cache keying. So, I'm reopening this thread to discuss further.
The proposal has been updated so that import assertions/attributes can affect how a module is loaded, in response to feedback from HTML.
As such, the cache is now keyed by (referrer, specifier, attributes)
and the recommendation that it should only be keyed by (referrer, specifier)
has been removed.
Currently the module cache is typically keyed uniquely by URL.
With this proposal the module identity now seems to include both the URL and the attributes.
Would the expectation be that the module cache keying would be extended to support the attribute identity?
Or alternatively will the URL remain the primary keying with some reconciliation approaches in place? How would those approaches avoid causing indeterminism based on load ordering?