tc39 / proposal-import-attributes

Proposal for syntax to import ES modules with assertions
https://tc39.es/proposal-import-attributes/
Apache License 2.0
569 stars 32 forks source link

Add `type: "any"` #116

Open weswigham opened 2 years ago

weswigham commented 2 years ago

Right now, the WHATWG HTML spec makes assertionless imports imply a JS type (that can't be otherwise uttered yet), while json imports use a "json" type, and css imports use a "css" type - there's no type to indicate that you don't care what the type is and fully trust the webserver (or runtime) to tell you what mime to load as (and are OK with the possibly-executable result). A type: "any" would be useful - it would let you import something that at some points in time is a javascript file, and others, a json document or wasm file. It'd also let you write an arbitrary importThing function that can actually import anything, rather than just javascript, eg

function importThing(url) {
  return import(url, {assert: { type: "any" }});
}

right now, if you want to write a function that lets you import arbitrary resources, you need to know the type in advance and pass it along as well, which may not always be the case (and while on the web you could prefetch the resource and then set the type based on the mime received, that'd be inefficient).

Some discussions over on the node repo made me realize that a standardized type indicating that you explicitly didn't actually care about the type might be useful for all runtimes, and not just a useful behavior for node.

GeoffreyBooth commented 2 years ago

This would work as the default type, i.e. if it replaced type: 'javascript' in the HTML spec, so a lack of a type implies type: 'any'. I think that would be a positive change, especially if the HTML spec gets updated accordingly.

I don’t think it can work in the way I assume you want, where code like import(urlOfJsonOrJavaScript, { assert: { type: 'any' } }) would become valid. Both the Node and Blink implementations of the module cache include the type as part of the cache key, and the type being part of the key is required by the HTML spec. I opened #114 to suggest making it part of this spec too. This “only one valid type per resolved URL” restriction prevents some undesirable footguns, like this:

const url = 'data:application/json,{}'

const { default: module1 } = await import(url, { assert: { type: 'json' } })
const { default: module2 } = await import(url, { assert: { type: 'any' } })

module1 === module2 // True or false? Are these separate modules or the same?

module1.foo = 3
module2.foo === 3 // True or false? Do they have separate internal states?

In the current implementations, where the type is part of the cache key, the two modules would be separate and wouldn’t share state (so false and false above), which is variation of the “dual package hazard” from the CommonJS/ESM implementation. If you want to restrict the type from being part of the cache key, you can propose that, but that would break the HTML spec; and when I initially tried to implement import assertions for Node I attempted just that but gave up because doing so introduced a lot of complexity. (I’m sure the HTML spec authors had good reasons for making the same choice.)

ljharb commented 2 years ago

The spec already ensures that module1 and module2 are conceptually the same, even if they have different identity. Module namespace objects are also frozen, so there is no state to track, so your "footgun" example simply isn't one.

If they maintain internal state and have different identity, then yes, they have separate internal states; this is a feature not a bug.

weswigham commented 2 years ago

In the current implementations, where the type is part of the cache key

If you want to restrict the type from being part of the cache key, you can propose that, but that would break the HTML spec;

I think that's mostly an artifact of fusing two layers of caching into one while the behavior is still simple enough to do so. There's two caching concerns for the web - the request layer and the specifier resolution layer. As long as they can know the (one) allowable format for a URL upfront, they can speculatively fuse those two layers of caching into a single one and simplify their implementation - however I see no reason why that behavior needs to be locked down. (Which is why for js-at-large, it isn't) Assertions being part of the resolution cache is part and parcel - assertions can change the error you get from module resolution for a given specifier, after all. Assertions being part of the request cache, though, isn't something that would seem to be strictly required since the format data is actually coming from the MIME (for browsers) or extensions (for node), and not from the assertion.

In any case, I imagine if the import assertions proposal changes in some way, the html spec on top of it would evolve to match - it's a bit of an experiment on top of an experiment. I do think a type: any would be very useful in practice, since as of right now, import assertions create some rather cumbersome patterns for code which ideally wouldn't care about them.

jridgewell commented 2 years ago

As a process, I don't believe this is the correct repo to raise this issue on. Import assertions machinery in ecma262 doesn't actually specify any of the types that my be imported, we only provide the ability for Hosts to specify what they handle. You'd need to convince w3c or whatwg (I'm not sure which actually owns this) that an "any" type should be valid.

My personal opinion is that you won't be able to convince the browser security folks that this is acceptable. If we wanted to allow easy exploits, we wouldn't have spent time on this proposal and just kept the old web imports behavior.

ljharb commented 2 years ago

It's only an exploit if you have a security property that rests on the thing you're importing not executing code - which most people importing JSON (or JS, or wasm) don't, because they're trying to execute code.

guybedford commented 2 years ago

While most assertion types fall outside of this specification, if there is one assertion type this spec must clearly define it is the top-level privilege name. I think "any" would make a lot more sense than "javascript" since JS isn't the only execution primitive in browsers. Alternative names for "any" could certainly be bikeshed further.

There exists no escalation above "javascript" in browsers so calling it "any" is clearly not a security change. It's really important to stress that here to avoid security FUD.

bmeck commented 2 years ago

We talked a long time about type not being the best choice for declaring arbitrary capabilities in TC39 meetings, I have no direct objection to "any" except it causes implementation complexity like racing and cache key consensus mechanisms for things like Compartments/Node's --loaders. I just don't want to conflate the 2 very different purposes.

GeoffreyBooth commented 2 years ago

Yes, I guess I should have written above that while it would be probably a good idea to rename HTML’s default type: 'javascript', changing its functionality would probably get strong pushback. As in, just as the HTML spec currently disallows import(jsonUrl, { assert: { type: 'javascript' } }), it would likely need to remain impossible if 'javascript' were renamed. So a name that doesn’t imply disabling type validation would be a better choice.

guybedford commented 2 years ago

Perhaps in line with @GeoffreyBooth's suggestion, "type": "executable" may make more sense than a "type": "any".

Would anyone have objections to specifying something like a "type": "executable" in this specification itself as a way to define the high level execution capability?

ljharb commented 2 years ago

The word "executable" isn't really that applicable; CSS and HTML "execute", but sure, a capabilities-based term sounds fine.

However, it still seems important to make it possible to import anything whether it requires an assertion or not.

Ms2ger commented 2 years ago

I'm not quite clear on the goals here. Are we trying to expose a way to get back the pre-import-assertions behaviour, where

function importThing(url) {
  return import(url, {assert: { type: "any" }});
}

will support loading any supported module, keyed on the MIME type? That is, if url has a "text/json" MIME type, it will be parsed as JSON, if it has "text/css", it'll be parsed as CSS, etc.? In that case, "executable" seems like a poor name for it.

If it's supposed to simply be an alias for not including an import assertion at all, it seems pretty low value to add.

weswigham commented 2 years ago

will support loading any supported module, keyed on the MIME type?

That's was my hope, yeah. It's annoying to need to prefetch a resource and inspect it's mime just to get the type right for a generic resource loading mechanism.