tc39 / proposal-binary-ast

Binary AST proposal for ECMAScript
Add high-level goal to allow ASTs to be inlined into <script> tags #6

Open aickin opened 7 years ago

aickin commented 7 years ago

I really think this proposal is great, and I wanted to suggest the addition of one relatively small high-level goal that I didn't see in the proposal.

It can be very useful in some cases to inline critical JavaScript into an HTML page, especially if the browser doesn't support HTTP/2 server push. I would love to see this project have as a high-level goal the ability for ASTs to be embedded into an HTML <script> tag. This could either be as the contents of the tag, or, more likely, as a data URI for the src attribute.

xtuc commented 7 years ago

Yes, I have this use-case.

Wouldn't it break existing HTML parsers? How would you tell a valid script (an opaque blob in the page) apart from corrupted data?

We could add a new type attribute in the script tag.

aickin commented 7 years ago

@xtuc I think there are two possibilities. First, make a new script type and add the AST as the content in between the <script> start and end tags:

<script type='ast-js'>
   jasdnfa;skduta2345u8aslkdnf;awejtiowpuetioajd;mns;kl
</script>

Browsers that don't understand a new script type just ignore it, so the fallback behavior is reasonable.

In order to work, though, this kind of embedding would put some requirements on the binary AST representation. For instance, ideally it would be impossible for the text </script> to ever appear in a binary AST representation, as that tells the HTML parser that the script is over.
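As a rough illustration of that requirement, the sketch below scans an encoded payload for a byte sequence that an HTML parser would treat as the end of an inline script. The function name `containsScriptEndTag` is hypothetical, and the sketch assumes the bytes are placed in the page as-is, one byte per character. It does not cover other tokenizer subtleties such as `<!--` handling inside scripts.

```javascript
// Hypothetical helper, not part of the proposal: reports whether a byte
// sequence contains "</script" (ASCII case-insensitive) followed by a byte
// that lets the HTML tokenizer close the element.
function containsScriptEndTag(bytes) {
  const needle = "</script";
  // tab, LF, FF, CR (normalized to LF by the parser), space, "/", ">"
  const terminators = new Set([0x09, 0x0a, 0x0c, 0x0d, 0x20, 0x2f, 0x3e]);
  for (let i = 0; i + needle.length < bytes.length; i++) {
    let matched = true;
    for (let j = 0; j < needle.length; j++) {
      let b = bytes[i + j];
      if (b >= 0x41 && b <= 0x5a) b += 0x20; // fold ASCII upper case to lower case
      if (b !== needle.charCodeAt(j)) {
        matched = false;
        break;
      }
    }
    if (matched && terminators.has(bytes[i + needle.length])) return true;
  }
  return false;
}
```

An encoder could run a check like this on its output and reject or re-encode any payload that trips it. A format designed for inlining would ideally make such a sequence impossible by construction.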

The other way to include a binary AST into a <script> tag would be to use a data URI, like this:

<script src="data:application/ast-js,jasdnfa;skduta2345u8aslkdnf;awejtiowpuetioajd;mns;kl"></script>

In this case we just add the text of the binary AST to a data URI. We'd have to encode any ampersands and quote characters in the binary AST (using &amp; and, for a double-quoted attribute like the one above, &quot;), and there are probably some other characters, like newlines, that would need encoding too.
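One conservative way to sidestep HTML escaping entirely is to percent-encode every byte that is not an RFC 3986 "unreserved" character, so the resulting URI contains nothing that is special inside an attribute value. The sketch below assumes that approach. The helper names are hypothetical, and `application/ast-js` is only the placeholder MIME type used in this thread, not a registered type.

```javascript
// RFC 3986 "unreserved" characters can appear in a URI without encoding.
const UNRESERVED = /[A-Za-z0-9\-._~]/;

// Hypothetical helper: percent-encodes every byte outside the unreserved set.
function percentEncodeBytes(bytes) {
  let out = "";
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    out += UNRESERVED.test(ch)
      ? ch
      : "%" + b.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

// Hypothetical helper: builds a data URI ("data:<type>,<payload>").
function toDataUri(bytes, mimeType) {
  return "data:" + mimeType + "," + percentEncodeBytes(bytes);
}

// Example: bytes for "A", "&", "'", newline, and 0xFF.
const uri = toDataUri(
  new Uint8Array([0x41, 0x26, 0x27, 0x0a, 0xff]),
  "application/ast-js"
);
console.log(uri); // data:application/ast-js,A%26%27%0A%FF
```

The trade-off is size: each escaped byte takes three characters, so the cost depends heavily on how many bytes of the format fall outside the unreserved set.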

I'm sure there are probably other considerations, and I slightly prefer the data URI option, but I think either of these methods should work as long as they are considered and tested during design.

xtuc commented 7 years ago

Yes, that makes sense.

I think the encoded binary format would be a lot bigger than the original binary format. What would be the advantage vs. the code representation?

I imagine users could take advantage of the data URI to execute malicious code.

aickin commented 7 years ago

> I think the encoded binary format would be a lot bigger than the original binary format.

I'm not sure about that. I think that there are only a handful of characters that would need to be encoded in a data URI, though I might be wrong. And if the binary format avoided those characters, you could avoid encoding altogether.

> What would be the advantage vs code representation?

It's somewhat easier to encode a few well-defined characters than it is to ensure that the substring </script> never occurs in the binary format. This is especially true when, as in this case, your binary format has to be able to include arbitrary strings.

> I imagine users could take advantage of the data URI to execute malicious code.

Interesting. How do you think that would happen?

xtuc commented 7 years ago

> I think the encoded binary format would be a lot bigger than the original binary format.

Yes, I'm not sure either. We'll need to try it out to know.

> I imagine users could take advantage of the data URI to execute malicious code.

Since the format won't be human-readable by design, it would offer an obfuscation mechanism out of the box. Static analysis (for malware detection for example) won't be possible unless you know how to unpack the format. I'm wondering if this has been considered for WebAssembly and if that's really an issue :thinking:.

Becavalier commented 7 years ago

Is that really necessary for binary AST, since we have WebAssembly, which can also be inlined into a module tag?

Yoric commented 7 years ago

@Becavalier I believe that this is pretty orthogonal to wasm.

On the other hand, I suspect that there would be several issues:

aickin commented 7 years ago

@Becavalier Thanks for the response! I'm pretty sure that the use cases for binary AST and WebAssembly are going to be pretty different, at least in the current landscape. It's unlikely that JavaScript is going to get compiled to WebAssembly, whereas this AST proposal is only for JavaScript.

Also, I'm fairly certain that <script type="module"> hasn't been started on yet in WebAssembly, and I think they aren't expecting to support inlining.

@Yoric Good list! I agree that an inlined encoding could add complexity and/or perf concerns, which is why I think it's important to think about it now, rather than bolt it on later. I think it's entirely likely that the right answer might end up being that the only way to effectively inline binary ASTs is to just use the base64 extension built in to data URIs and depend on gzip/brotli to keep size under control. However, there might be a better solution if embeddability is a high-level goal from the beginning.

littledan commented 7 years ago

Could we allow base64 data URLs here to avoid breaking HTML parsing?

Becavalier commented 7 years ago

@littledan Base64 is not a good choice: it increases the size of the original binary file by about a third. For binary AST, an inline format doesn't seem useful enough, but an embedding tag like <script type="ast" src="http://..."> would be good to have. Fetching the AST file asynchronously from a remote server will be much more efficient.
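To put rough numbers on the size question, the sketch below compares base64 with conservative percent-encoding on a synthetic payload. It assumes Node.js (for `Buffer`), and the sample bytes are made up for illustration, not real binary AST output. Base64 emits 4 characters for every 3 input bytes, while percent-encoding costs 3 characters for every byte outside the RFC 3986 unreserved set.

```javascript
// Length of the base64 text for a byte sequence (Node.js Buffer).
function base64Length(bytes) {
  return Buffer.from(bytes).toString("base64").length;
}

// Length of the percent-encoded text when only RFC 3986 unreserved
// characters (A-Z, a-z, 0-9, "-", ".", "_", "~") are left unescaped.
function percentEncodedLength(bytes) {
  let length = 0;
  for (const b of bytes) {
    const unreserved =
      (b >= 0x41 && b <= 0x5a) || // A-Z
      (b >= 0x61 && b <= 0x7a) || // a-z
      (b >= 0x30 && b <= 0x39) || // 0-9
      b === 0x2d || b === 0x2e || b === 0x5f || b === 0x7e; // - . _ ~
    length += unreserved ? 1 : 3;
  }
  return length;
}

// Synthetic sample: 3000 bytes cycling through all 256 byte values.
const sample = new Uint8Array(3000);
for (let i = 0; i < sample.length; i++) sample[i] = i % 256;

console.log("raw:", sample.length);
console.log("base64:", base64Length(sample));
console.log("percent-encoded:", percentEncodedLength(sample));
```

For bytes spread evenly across all values, base64 comes out well ahead of percent-encoding. Percent-encoding would only win for a format whose bytes are mostly unreserved ASCII, and compression such as gzip or brotli changes the picture for the bytes actually sent over the wire.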