Module cache keying semantics

guybedford commented 5 years ago

Currently the module cache is typically keyed uniquely by URL.

With this proposal the module identity now seems to include both the URL and the attributes.

Would the expectation be that the module cache keying would be extended to support the attribute identity?

Or alternatively will the URL remain the primary keying with some reconciliation approaches in place? How would those approaches avoid causing indeterminism based on load ordering?

devsnek commented 5 years ago

To be more specific, modules in the spec are keyed as (referrer, specifier) pairs, so the exact problem is a single file with two imports of the same specifier with different attributes.

littledan commented 5 years ago

I was imagining that the module attributes would not be part of the key. That is, the same module specifier and referrer would always point to the same module record, even when imported twice with different parameters. However, a subsequent load with different module attributes may lead to an exception being thrown rather than the module returned. Here's some examples of possible semantics (each of which could be debated further) which are all consistent with the idea that the module attributes are not part of the key, but the host may keep them around:

With type on the web, if a module is imported once, the type must either be provided or missing (JS), and the subsequent loads must pass the same type option (otherwise the check would've failed).
If we have fetch options passed through module attributes, subsequent loads could either repeat the fetch options of the initial load, or omit them, but not pass additional or different options.
If we have signature-based SRI passed through module attributes, any site for importing the module may pass in a signature, and that that would be checked against the module contents for that import site (that is, SRI errors would not be cached, in this idea).

This sort of interpretation rules out some use cases that @xtuc mentioned about module attributes determining the processing mode, when used in a way that a module could be imported multiple times in different processing modes.

bmeck commented 4 years ago

It seems like races could occur if first to import wins:

try {
  // try to load as JS, this could make all attempts to load JSON fail
  await import('foo');
} catch (e) {
  // try to load as JSON? maybe thats how other people are using this thing?
  const {default: foo} = await import('foo', {type: 'json'});
  // clobber stuff
  Object.defineProperty(foo, 'bar', {
    get() {
       // fun stuff like arguments.caller / Error.prepareStackTrace / etc. go here
    },
    set(v) {
       // more clobbering/calling here
    }
  });
}

I have serious reservations about the viability of keeping everything in sync across all call sites and if throwing on mismatch is desirable. It seems like throw on mismatch could lead to just iterating all the possible values in order and doing things as if it were some kind of odd overloading technique.

littledan commented 4 years ago

@bmeck This is an interesting example. Sounds like we probably shouldn't cache failure due to module type check mismatch.

hax commented 4 years ago

So allow retry when type mismatch? But current import('foo') will not allow retry if type is not js mimetype.

jkrems commented 4 years ago

In terms of use cases, this points to two things being fundamentally at odds (based on some not-so-theoretical host that acts like a browser):

Module attributes are optional, allowing for incremental adoption of an attribute (assertion-style attributes). This implies that module identity doesn't depend on attributes.
Module attributes change the interpretation of the URL, e.g. by adding or modifying headers or by changing how the fetched resource is evaluated. This implies that module identity does depend on attributes.

It would feel pretty unfortunate if those two semantics would mix. Not only would tools and users have to have perfect knowledge on the cache semantics of each individual attribute (as opposed to module attributes in general), it would also affect the ability to support unknown attributes. A host couldn't just ignore an unknown attribute because it would have ambiguous cache key semantics.

And unless new module attributes can only ever apply to module types that have never been supported before, (1) seems pretty important. Otherwise it would require global coordination across files to make sure that the same target is never requested with different attributes.

P.S.: Global coordination because this is speculating about likely behavior in a browser-like host where there's a global/realm-wide module map that has some caching semantics. From a JS spec perspective it would only be per-importing-file coordination. But I don't think browsers would pick global caching semantics that are incompatible with the per-file ones. So - effectively it's global cache-by-URL that most users would observe.

Jack-Works commented 4 years ago

We should split module attributes that will and won't change the identity of the module, it's better on the syntax level.

For example:

These two modules are different

import mod from '/mod' with { type: 'css' }
import mod1 from '/mod' with { }

These two modules are the same

import mod from '/mod' with {} and { integrity: 'hash-of-this-file' }
//                          ~~     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
import mod1 from '/mod' with {} and { cache: '1year' }

In the first attribute object {}, any difference in it cause them to be two different modules, but in the second attribute object, module with different attribute object are treated as the same module.

But this is not enough, we need a syntax for the developer to specify the following orthogonal behavior:

Identity: If this attribute changes, should it be the same module? (For type, no. For integrity, yes)
Ignorable: Is the attribute ignorable if the runtime doesn't know it's semantics?
...: any more?

littledan commented 4 years ago

@Jack-Works I see type as being in the same category as integrity: they are both checks. type always checks a type which is present in some other way, e.g., the MIME type or file extension. (That said, there are some significant arguments against having integrity in particular.) These would not be part of the cache key (failure vs non-failure would differ based on what's passed, but they couldn't have multiple distinct success values.)

There's another category which would change the interpretation of a module. For example, parameters to JSON.parse, or interpreting the path as a string that's default-exported rather than what it would be otherwise. These sorts of module attributes would need to be part of the cache key.

Jack-Works commented 4 years ago

@littledan I don't think type is a simple check. The browser can send different sec-fetch-dest header based on the value the type therefore server can return different formats based on the sec-fetch-dest header. Therefore, files with different type must be treated as different modules. But if I import a module with integrity and import it again without integrity, I expect it to be the same module

littledan commented 4 years ago

@Jack-Works Let's discuss this further in https://github.com/tc39/proposal-module-attributes/issues/24 . We'll need a specification for what browsers do exactly before Stage 3, and web specs like fetch tend to go into that level of detail.

Jack-Works commented 4 years ago

@littledan oh, so type is just my example, I really want to emphasize is that we should have a signal to tell the runtime if the attribute is ignorable or will affect the identity of the module even they don't know what the attribute is.

jkrems commented 4 years ago

I like that separation! But the and feels awkward because it favors cache-busting attributes over ones that maintain a single module per URL. Maybe something like this would express the same:

// Attributes that only assert properties on the resource but don't affect how it's fetched:
import mod from '/mod' assert { integrity: 'hash-of-this-file' }
import mod from '/mod' expect { integrity: 'hash-of-this-file' }
import mod from '/mod' assert { type: 'json' }

// Attributes that may affected how the module is being fetched:
import mod from '/mod' with { credentials: 'include' }

// Combined
import mod from '/mod' with { credentials: 'include' } assert { integrity: 'hash-of-this-file' }

Therefore, files with different type must be treated as different modules.

I don't think that's necessarily true. One big downside is that type could never be optional which is a bit unfortunate. Having a non-optional type attribute means there's a big pressure to only ever load modules from JavaScript modules (to make things easier to use / less noisy). So even endpoints that could've been simple data files would be returning executable scripts instead.

littledan commented 4 years ago

I continue to have the understanding that type would be required in the web, and non-web environments could decide whether to require it or make it optional.

hax commented 4 years ago

I continue to have the understanding that type would be required in the web, and non-web environments could decide whether to require it or make it optional.

So it would make the modules env-specific just because json/wasm modules?

littledan commented 4 years ago

@hax This proposal does not prohibit environments from making various kinds of environment-specific modules, including JSON modules which don't require with type: "json". Environments may choose to either stick to the cross-environment set or support more forms.

hax commented 4 years ago

@littledan I think u mean each env could choose whether with type is enforced, but I really hope we could keep them uniform to avoid fragmentation of the ecosystem. 😥

Jack-Works commented 4 years ago

Let the developer decide if their attribute must be enforced. By this way, we can avoid fragmentation of the ecosystem

littledan commented 4 years ago

@hax I hope so too. That's why this proposal requires that with type: "json" is supported. But, as the semantics of modules in general is unspecified, I don't see how we could prohibit them from making other choices for that general case.

hax commented 4 years ago

Let the developer decide if their attribute must be enforced.

How ? The reason why we need with type is browsers think json module will be insecure without that. (But I'm still not convinced there is no other way to solve that security problem :-)

hax commented 4 years ago

I don't see how we could prohibit them from making other choices for that general case.

@littledan I think it's the problem, we face such problems and up to now all possible solution I saw are just add another layer of complexity for developers. The original requirement as I understand is just JSON module, why developers need to pay all other complexity for simple loading JSON problem, I think most will just choose continue using let data = await fetchJSON('./x.json').

(I understand there are also wasm/css/html modules may need it, but first wasm module do not suffer similar security issue, css/html module are still in a long way and it's too early to say whether they will eventually available)

littledan commented 4 years ago

@Jack-Works I don't see how we could let the developer decide, except in the context of a Compartment API which defines module semantics. Outside of Compartments, the environment provides module semantics.

@hax Developers who do that won't be able to use JSON modules on the Web. So I'm not convinced that most developers will choose this, given that many JS developers write code to run on the web.

littledan commented 4 years ago

Based on the feedback in https://github.com/openjs-foundation/standards/issues/91 , I think there's some more to discuss with respect to the details of caching and treatment of unrecognized and host-defined attributes. The current spec draft provides a concrete starting point which various people have raised concerns about being too permissive for hosts; before Stage 3, it seems like we may want to put more restrictions on hosts to ensure expected behavior.

littledan commented 4 years ago

In the end, we decided to land #66 for Stage 2, which restricts module attributes to not be part of the cache key. Does that resolves the concerns raised in this issue?

xtuc commented 4 years ago

I think we can close this issue now. We can change the caching semantics based on the follow-up proposals.

littledan commented 4 years ago

@domenic has raised questions about the caching model in https://github.com/whatwg/html/pull/5658#issuecomment-646788942 and https://github.com/whatwg/html/pull/5658#issuecomment-646788942 . Although we have TC39 consensus recorded on going to Stage 2 with a restriction to "check"-style attributes, the plan was to investigate host interactions between Stage 2 and 3, and revise host hook invariants based on that investigation, so this conversation seems in scope.

As I wrote about the current status,

To clarify, the import attributes proposal isn't about what you're importing but whether the import meets certain conditions. For this reason, we switched the token introducing this form from with to if and are considering renaming the proposal to import conditions. Modifying what is being imported (which would logically be part of the cache key) would be a separate proposal, if sufficient use cases are identified. We're working on clarifying this distinction in our documentation and hope to present on it in the upcoming TC39 meeting.

I'm not sure whether this meets technical requirements from HTML with respect to module cache keying. So, I'm reopening this thread to discuss further.

nicolo-ribaudo commented 1 year ago

The proposal has been updated so that import assertions/attributes can affect how a module is loaded, in response to feedback from HTML.

As such, the cache is now keyed by (referrer, specifier, attributes) and the recommendation that it should only be keyed by (referrer, specifier) has been removed.

tc39 / proposal-import-attributes

Module cache keying semantics #30