whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

[Feature Request] Generalize imports to permit statically loading arbitrary text and data. #7706

Open 7ombie opened 2 years ago

7ombie commented 2 years ago

Imagine a library author... She has a module that simply exports a class. Her users import the class and instantiate instances of it. The class is partly implemented in JavaScript, and partly implemented in GLSL (as a WebGL shader). This seems simple, but there is no way she can implement this (otherwise synchronous) library without making its API asynchronous, purely due to her class being unable to statically import the rest of its source code from the GLSL file.

She can copy and paste the GLSL source code into a multiline JavaScript string, and export that from a JS module, or she can serialize the string to a JSON file and import it from there. Both options suck, as she needs to be able to maintain the GLSL source code.
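For illustration, the first workaround might look roughly like the sketch below. The shader text, the `fragmentShaderSource` name, and the `Effect` class are all hypothetical; the only point is that the constructor stays synchronous because the GLSL is already a string when the module is evaluated.

```javascript
// Hypothetical GLSL source, pasted into a template literal.
// It is now an opaque string inside a JS file rather than a real .glsl file.
const fragmentShaderSource = `
precision mediump float;
void main() {
  gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
}
`;

// Hypothetical library class. The constructor can remain synchronous
// because the shader text is available as soon as this file is evaluated.
class Effect {
  constructor() {
    this.source = fragmentShaderSource.trim();
  }
}
```

The cost is the one described above: the shader can no longer be edited, linted, or highlighted as a standalone GLSL file.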

Adding some kind of import assertion for GLSL would fix the specific issue, but a GLSL import would just be importing a string of source code (GLSL doesn't require any parsing or execution during importation), and there are many other use cases for importing from various text-based file formats, and from binary file formats as well.

Currently, import assertions have a narrower scope, focussed on specific types that will likely evolve to include JS, CSS, JSON, HTML and Wasm. I would like to suggest extending this to permit generic imports, which can import arbitrary text or data.

Generic imports would not parse or execute anything. They basically just load a string or an array of bytes. However, having a text type that returns a String and a data type that returns an ArrayBuffer may be simplistic, as specifying a text encoding or importing other data types (Blob, Uint8Array et cetera) may be required/desired.

To be honest, I'm not entirely sure how this should work, and am somewhat ignorant regarding import assertions and MIME types in general, but hope I've at least provided a useful starting point for addressing this.

annevk commented 2 years ago

cc @whatwg/modules

bmeck commented 2 years ago

This sounds more like https://github.com/tc39/proposal-import-reflection 's domain. Assertions don't change the result of importing; they just perform a guard.

7ombie commented 2 years ago

@bmeck - I proposed this in the wrong place to begin with, and this may still be in the wrong place. Sorry. I don't understand the standards process as well as I should (though I am going to spend some time learning more about it this week).

@GeoffreyBooth also mentioned (in the issue I linked to) that assertions are not really appropriate for what I'm suggesting, for the same reason you gave. I was under the incorrect impression that assertions are the only mechanism for importing anything other than JavaScript, so wanted to try and shoehorn generic imports into assertions. The Wasm module integration proposal does seem better suited to this kind of generalization.

GeoffreyBooth commented 2 years ago

I think a better way of putting it is that @7ombie is requesting a way to use import syntax for a primitive type. To take the most basic, imagine a text file that contains hello world, that you could import like so:

import contents from './hello-world.txt' assert { type: 'text' };

contents === 'hello world'; // true

Just like with JSON or CSS, the MIME type of the served hello-world.txt would need to match the assertion; so in this case, a header like Content-Type: text/plain would need to be present and validated by type: 'text'. And like JSON, there would be only a default export; in this case, the contents of the resource as a string. That’s it.

Likewise, we could have a similar module type for general binary data. assert { type: 'data' } could correspond with Content-Type: application/octet-stream, say, and the resulting default export could be an ArrayBuffer or Blob. (You tell me which makes more sense, or if we’d need to support both somehow.)

These generic module types could correspond with the methods of the Response object, used by fetch, which has .json(), .text(), .blob() and .arrayBuffer().
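As a rough sketch of that correspondence, each module type keyword could be paired with one `Response` reader. The `readers` table and `readAs` helper are hypothetical names, and mapping `data` to `.arrayBuffer()` rather than `.blob()` is only an assumption, since the thread leaves that choice open.

```javascript
// Hypothetical table pairing a module type keyword with a Response method.
const readers = new Map([
  ['json', (response) => response.json()],
  ['text', (response) => response.text()],
  ['blob', (response) => response.blob()],
  // Assumption: 'data' yields an ArrayBuffer; a Blob would be equally plausible.
  ['data', (response) => response.arrayBuffer()],
]);

// Read a Response body the way the given module type would expose it.
async function readAs(type, response) {
  const reader = readers.get(type);
  if (!reader) {
    throw new TypeError(`Unknown module type: ${type}`);
  }
  return reader(response);
}
```

For example, `await readAs('text', new Response('hello world'))` resolves to the string `'hello world'`.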

7ombie commented 2 years ago

Thank you, @GeoffreyBooth. Much appreciated.

annevk commented 2 years ago

Another alternative might be ReadableStream. And then you could use TextDecoder to get text.

It does seem like @guybedford's & @lucacasonato's proposal is relevant as there are multiple ways you might want to represent the module and that syntax could be a way to allow picking between them. As in, you might want to assert something is a text/plain module, but you'd like to get to it as a buffer, a string, a stream, etc.
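To make the stream route concrete, here is a minimal sketch of turning a `ReadableStream` of bytes into text with `TextDecoder`. The `streamToText` helper is a hypothetical name, and it assumes the stream yields `Uint8Array` chunks.

```javascript
// Hypothetical helper: drain a byte stream and decode it to a string.
async function streamToText(stream, encoding = 'utf-8') {
  const decoder = new TextDecoder(encoding);
  const reader = stream.getReader();
  let text = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps an incomplete multi-byte sequence buffered
    // until the next chunk arrives, instead of emitting U+FFFD.
    text += decoder.decode(value, { stream: true });
  }
  // A final call with no input flushes anything still buffered.
  return text + decoder.decode();
}
```

The same bytes could just as well be collected into a buffer, which is the flexibility being discussed: one resource, several possible representations.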

domenic commented 2 years ago

One use case to keep in mind is files that don't have text/plain MIME types on the server. E.g. CSV files or ICS files (text/csv and text/calendar). Maybe this can work fine though if type: "text" import assertions just check that the MIME type's type is text.

This seems potentially harder for binary types. Maybe they can be imported without assertions and just the presence of the reflector implies you're buying into them as arbitrary data streams?

annevk commented 2 years ago

That's a fair point. I do think as is/could be equivalent to assertions security-wise, if it can be a strong guarantee about the resulting interpretation. I guess one thing to sort out is whether you ought to be able to import text/javascript as text, but either answer is probably okay security-wise.

bmeck commented 2 years ago

If it checks text/ it needs to block the JS MIMEs using it. If it alters the result, that is different than just a check.
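A minimal sketch of such a check, assuming it works on the MIME type essence and ignores parameters. The function names are hypothetical, and the block list is meant to follow the JavaScript MIME type essences in the MIME Sniffing Standard; treat it as illustrative rather than authoritative.

```javascript
// JavaScript MIME type essences (intended to mirror the MIME Sniffing Standard).
const JAVASCRIPT_MIME_ESSENCES = new Set([
  'application/ecmascript', 'application/javascript',
  'application/x-ecmascript', 'application/x-javascript',
  'text/ecmascript', 'text/javascript',
  'text/javascript1.0', 'text/javascript1.1', 'text/javascript1.2',
  'text/javascript1.3', 'text/javascript1.4', 'text/javascript1.5',
  'text/jscript', 'text/livescript',
  'text/x-ecmascript', 'text/x-javascript',
]);

// Reduce a Content-Type header value to its lowercase "type/subtype" essence.
function mimeEssence(contentType) {
  return contentType.split(';')[0].trim().toLowerCase();
}

// Hypothetical guard for type: 'text': accept any text/* essence
// except those that are JavaScript MIME types.
function isAllowedTextModule(contentType) {
  const essence = mimeEssence(contentType);
  return essence.startsWith('text/') && !JAVASCRIPT_MIME_ESSENCES.has(essence);
}
```

Under this sketch text/csv and text/calendar pass while text/javascript is rejected. Note that text/html and text/css would also pass, which may or may not be desirable.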

GeoffreyBooth commented 2 years ago

As a user, I would expect that if I could import JSON without as, I should be able to import text the same way. At least for text, I would think that checking the first part of the MIME is enough as the end result (a string) is the same for any of the text/* MIME types.

One option is to have a default that works without as, and then as can adjust it. Like import data from './foo.zip' assert { type: 'data' } could import as blob by default, and if you add as ArrayBuffer it’s provided as an array buffer. (Or vice versa, I don’t know what’s a better default between blob and array buffer.)

Another thing to consider is other text-like formats like SVG (image/svg+xml) and HTML (text/html).

annevk commented 2 years ago

Without as it seems kinda bad to hijack a whole swath of MIME types, especially as there are existing MIME types that would not be treated as text, such as text/javascript. Treating text/plain as text seems reasonable, but then you get back to Domenic's point above as to whether that's useful enough.

GeoffreyBooth commented 2 years ago

This is making me wonder if perhaps the import assertions type itself should be a full MIME type, so we don’t have these questions of “what MIME type should correspond with this module type.” Like import styles from './styles.css' assert { type: 'text/css' }. It almost makes more sense than the current system, since if all the import assertion is doing is validating a MIME type, it would follow that the user should provide the MIME type that’s being validated.

The counterargument arises if we ever want to allow an explicit type for JavaScript, because servers send so many MIME types for JavaScript (text/javascript, application/javascript, application/x-javascript, etc.). Do we really want the assertion to fail if it says assert { type: 'text/javascript' } when the server sent application/javascript?

bmeck commented 2 years ago

The current case can cover whole MIME groups, which is the advantage of it (ref https://mimesniff.spec.whatwg.org/#mime-type-groups). So it is still checking specific MIMEs, just a group of them.
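As an example of a group check, the JSON MIME type group in the MIME Sniffing Standard covers application/json, text/json, and any subtype ending in +json. A sketch of that test, with `isJsonMimeType` as a hypothetical helper name:

```javascript
// Hypothetical helper: does this Content-Type value fall in the JSON MIME type group?
function isJsonMimeType(contentType) {
  // Drop parameters such as charset and normalise case.
  const essence = contentType.split(';')[0].trim().toLowerCase();
  // Default to an empty subtype if the value has no "/".
  const [, subtype = ''] = essence.split('/');
  return (
    essence === 'application/json' ||
    essence === 'text/json' ||
    subtype.endsWith('+json')
  );
}
```

A single keyword such as type: 'json' can therefore accept application/ld+json without the author having to spell out the exact MIME type.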

annevk commented 2 years ago

Indeed, also having to type out a whole MIME type is not great.

pshaughn commented 2 years ago

I might be misunderstanding something here, but it sounds like this is basically asking for the ability to do a ~synchronous~ load-blocking fetch (thank you for the correction @domenic), just with some constraints like having to do it at load time from a script's top level instead of in a function, and not being able to set the headers about it. Since importing Javascript is already handling the hard parts of that ~synchronousness~ load-blocking, it seems like it could be a nice convenience tool to have. It further seems to me that, for importing blobs and text, the MIME assertion isn't needed for any safety reasons, since any file you could reference in an import statement could already be fetched as a blob or text anyway.

domenic commented 2 years ago

No, import statements are asynchronous.

annevk commented 2 years ago

@pshaughn the reason you need as or assert is that you don't want to give the server the option to change the MIME type to JavaScript. (That's why we require assert for JSON module scripts.)

ljharb commented 2 years ago

What about the cases where I do want to give the server that option?

pshaughn commented 2 years ago

I think I might understand more now. Since assert exists in the language and as doesn't yet, the only way to get this behavior at all is via assert, which needs to wrangle MIME types. as would be a more powerful option that avoids the MIME type concern, but only if it ever actually materializes as a language feature. (I would probably wait until the as version before I use this feature myself, since I tend not to be in control of the MIME types on the servers my code ends up on.)

GeoffreyBooth commented 2 years ago

> I might be misunderstanding something here, but it sounds like this is basically asking for the ability to do a load-blocking fetch, just with some constraints like having to do it at load time from a script’s top level instead of in a function, and not being able to set the headers about it.

There are a few distinctions. Now with top-level await, one could do:

const text = await (await fetch('./foo.txt')).text()

Which isn’t all that different from import text from './foo.txt' assert { type: 'text' } at the same place; but the fetch version wouldn’t get cached (I think?) if there are calls like this in multiple modules, and each separate resulting text variable would be independent rather than sharing state, if that means anything for text. Also the network request for this would happen later, after the browser has started evaluating the JavaScript, not as soon as the import statement is parsed. (Please correct me if I’m wrong on this or any points.) And like with JSON and CSS, security would dictate that we need the assertion, as we would for any module type that’s less powerful than JavaScript (in other words, not meant to be executable).

So basically the only advantages fetch offers are being able to make a request that isn’t a simple GET, and being in full control of how to parse the response rather than the browser parsing it automatically according to the Content-Type header (assuming it aligns with the assertion).
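One way fetch-based code could approximate the load-once, shared-result behaviour of an import is to memoize the request promise by URL. This is only a sketch: `loadTextOnce`, `textCache`, and the injectable `fetchImpl` parameter are hypothetical names, and it makes no claim about what the browser's HTTP cache does.

```javascript
// Hypothetical cache of in-flight or completed text loads, keyed by URL.
const textCache = new Map();

// Return the same promise for repeated requests of the same URL,
// so the resource is fetched at most once per cache.
function loadTextOnce(url, fetchImpl = fetch) {
  if (!textCache.has(url)) {
    const pending = fetchImpl(url).then((response) => {
      if (!response.ok) {
        throw new Error(`Failed to load ${url}: ${response.status}`);
      }
      return response.text();
    });
    textCache.set(url, pending);
  }
  return textCache.get(url);
}
```

Unlike an import, the request still starts only when this code runs, not when the module graph is being loaded.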

7ombie commented 2 years ago

My issue with using fetch is turning otherwise synchronous APIs into asynchronous ones. This can be an issue, especially when authoring libraries, where you want to present the most elegant API to the user.

Constructors cannot be asynchronous, so you end up exporting a class with a partial constructor, and some async method that returns a promise that resolves once the instance is fully initialized.
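The pattern being described might look roughly like this. `Shader`, `init`, and the `loadSource` callback are hypothetical names standing in for whatever actually fetches the GLSL text.

```javascript
// Hypothetical class with a partial constructor and an async second phase.
class Shader {
  constructor() {
    // The instance exists, but is not usable until init() has resolved.
    this.source = null;
    this.ready = false;
  }

  // Finish construction asynchronously. loadSource is assumed to be
  // a function returning a promise for the shader text.
  async init(loadSource) {
    this.source = await loadSource();
    this.ready = true;
    return this;
  }
}
```

Callers then have to write something like `const shader = await new Shader().init(loadSource)`, which is the asynchronous API the library author wanted to avoid.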

domenic commented 2 years ago

> My issue with using fetch is turning otherwise synchronous APIs into asynchronous ones.

This is not really an issue anymore, with top-level await.

7ombie commented 2 years ago

@domenic - Sorry. You're correct. I misunderstood how top-level await worked. That's really nice to know. Thanks.