Open dead-claudia opened 6 years ago
Thanks for the proposals! Let me try and sort it a bit.
- Adding a constant for
undefined
- Allowing synthetic, anonymous locals that aren't viewable by direct
eval
orwith
- Adding combined type-checking operators for
typeof x === "number"
, etc.- Adding combined coersion operators for
x | 0
/~~x
,!!x
,x + ""
, etc.- Requiring fallthrough to be explicit in
switch
statements.- Encoding the bytecode as a hybrid register/stack machine.
While each of these proposals makes sense individually, I really feel that they are mostly orthogonal to Binary AST. Recall that each addition to Binary AST will make its adoption more complicated by browsers and VMs. For this reason, we wish to keep it as minimal as possible, which means sticking exactly to the JS language.
Once Binary AST is a standard, we'll be happy to rediscuss each of these proposals, though :)
Separating the pragmas from the source/function body.
That definitely makes sense.
Separating strings from the code, instead storing debug string/name references as {int offset; int length} pairs and putting their values consecutively in a string table before any real code references.
Already the case :)
Reducing logical "and"/"or" to corresponding if/else variants:
test && other
↔(tmp = test) ? tmp : other
,test || other
↔(tmp = test) ? other : test
.
Encoders are, of course, free to make such transformations, especially if they are optimizations. I don't think we want to hardcode them into Binary AST, though.
Using breakable blocks with an expression-based AST instead of a statement-based one.
I don't understand that one, could you elaborate?
Declaring locals before script/module/function body.
Already the case :)
Declaring imports/exports before module data table.
I like the idea. We'll need to think more about it.
Adding built-in source map support.
Definitely an objective. Not on the top of our list yet, but we're working on it.
Adding a built-in description/metadata field.
Indeed, we're thinking of a built-in metadata field. It shouldn't be hard to add, as long as it doesn't have semantics. I suspect that most toolchains will strip it, though.
For switches, how about having a table at the beginning that has goto
s in it, then have an array of statements after the table?
@j-f1 Why not. What's the expected benefit?
It would allow the engine to skip over unused code and only parse the statements used in the current condition. Additionally, the engine would be able to determine if the default
case needs to be taken without having to search the entire switch body for cases first.
@Yoric Of course, some of those points and thoughts were more or less very forward-looking, and not exactly what I'd expect would have a chance in making it into an MVP. (Like, for example, the synthetic anonymous locals.)
Using breakable blocks with an expression-based AST instead of a statement-based one.
I don't understand that one, could you elaborate?
Think similar to WebAssembly and how its blocks work, just optionally a little heavier (if you need to introduce locals in them) and with unconditionally only one return value. That's pretty much what I had in mind.
WebAssembly itself is expression-based, it uses blocks with (br N value)
/(br_if N cond value)
and is a hybrid register/stack machine. Expression-based languages are typically denser and simpler, since they don't need as much bookkeeping on what is and isn't a valid statement (almost everything is), and because it's pretty common for control flow to only assign to something once (consider why the do
expression proposal exists).
To expand on my concrete example:
An engine wouldn't need to do much work to infer that the result of a particular try
/catch
is only used in a particular variable if you just assign the result directly into the value itself. Also, it's not like you couldn't emit the statement-like form and it still work, too. Engines can desugar that into bytecode similarly to how they already do with let x; try { x = foo() } catch (e) { ... }
now (in fact, I'd expect most implementations to do just that), and Babel could compile its try
/catch
blocks within do
expressions pretty similarly.* (For reasons that make no sense to me, [CoffeeScript](https://coffeescript.org/#try:x%20%3D%20try%0A%20%20foo()%0Acatch%20e%0A%20%20bar()) and Babel + do
expressions instead generate IIFEs.)
* In reality, an expression-based syntax means we wouldn't need to support do
expressions in the core proposal, since it'd be flat out useless at best. An outer breakable block would be what early return
s would compile into if they couldn't be compiled into an if
/else
.
@j-f1
It would allow the engine to skip over unused code and only parse the statements used in the current condition. Additionally, the engine would be able to determine if the default case needs to be taken without having to search the entire switch body for cases first.
We're thinking about something kind-of like this, but if we do it, this will complicate the specifications of switch()
wrt syntax errors.
@Yoric did a great job separating out the @isiahmeadows' language feature ideas from other suggestions. One of the highest, if not the highest constraints in this design space is to not bifurcate the language. To that end, I take the existence of an "obvious" bijection (modulo whitespace and whatnot) between source JS and binary AST to be a hard constraint. That is, binary AST will not add any language features. What's expressible in source JS must be expressible in binary AST.
Like @Yoric said, the biggest reason is adoption, both for developers and for web browsers. Please do not take binary AST to be our trying to design a better JS. It is very much just the JS language with a more performant transport format.
I also saw mentions to bytecode. There will never be any sort of bytecode involved with binary AST.
@syg I'm in high support of keeping it bijective. My hard constraint with all the "language feature" ideas I proposed were that every single one of them had to be trivially desugared. The only exception being synthetic locals used in a scope accessible to a direct eval
.
I also saw mentions to bytecode. There will never be any sort of bytecode involved with binary AST.
I did sometimes conflate the binary AST representation with the concept of "bytecode", but I know it's technically incorrect - bytecode is a sequence of instructions, not a graph of nodes. I did make one suggestion here that would've actually made it bytecode in the technical sense, but it was kind of a thing thrown at the wall:
- Encoding the bytecode as a hybrid register/stack machine.
I was already neutral on it to start, just throwing it out there for the sake of being thrown out there. I'm also not a big fan of the idea, since bytecode doesn't always end up smaller than a basic AST, and it's not always the most intuitive. And the more I think about this, the more I'm leaning against it. With this, all of a sudden you need a decompiler to debug the code, and we shouldn't be making browsers ship a glorified decompiler. This isn't WebAssembly, and even that one has chosen to make their bytecode a borderline AST format, just one that's mostly in reverse Polish notation for the node types. (Their text format lays this pretty bare.)
@isiahmeadows:
every single one of them had to be trivially desugared
As you correctly point out, engines already do these sorts of simple translations, often at parse time. What's the proposed benefit to having the compiler do them? Keeping in mind that there's a real cost, which is adding new node types and defining and implementing their semantics.
@Yoric I forgot to clarify that the proposed global "metadata" field was supposed to not have any real semantics attached. It's for tools, not engines.
@bakkot
What's the proposed benefit to having the compiler do them?
Slightly less data over the wire, and engines don't need to enlist their heuristics to figure them out. Not all of them are especially justifiable, but some of them could be.
x | 0
/~~x
, !!x
, strict undefined
→ void 0
, this isn't much (like one byte), and it might not justify their existence - it's like saving one byte.break
is just boilerplate, and the latter is trivially compressible if you change the switch
statement layout to an array of cases + array of block bodies, since it'd just be a bunch of two-byte sequences. The inversion itself wouldn't benefit engines directly, but the size gain could be helpful. (I don't have raw numbers to back this up, however.)typeof x === "function"
, usually minified to "function"==typeof x
, it'd save them from having to fetch the data table, making it easier and much faster to parse. It'd also encourage minifiers to convert the switch (typeof foo) { ... }
idiom to if
/else
branches instead, saving engines from having to desugar that also - V8 didn't optimize that until TurboFan hit.@isiahmeadows
[undefined] is already a pseudo-literal in strict mode.
In what way? Just tested:
(function(){'use strict'; const undefined = 3; return undefined; })();
returns 3.
undefined
is only a pseudo-literal in ES5+ in the global scope (in that it can't be reassigned there), but it can be shadowed anywhere else; i don't think strict mode comes into play at all.
@ljharb @erights I meant it's a pseudo-literal in that it has those special semantics - it's not a simple identifier in strict mode. It evaluates to:
undefined
if such a binding is defined for the current lexical scope, global lookups excluded.undefined
if no such binding is defined for the current lexical scope, global lookups excluded.You could say it's a context-sensitive, non-context-free construct that could be parsed as either a keyword or identifier, depending on what variables are declared in its surrounding lexical scope. Implementations almost necessarily implement this as a keyword at parse time, since strict mode requires special semantics, and punting the check to identifier lookup will inevitably create performance problems. Nope, just a combination of non-configurable, non-writable and in the global scope...
In sloppy mode, it's just a simple global binding lookup, so it's of course not a special syntactic construct.
Where is this special case in the spec?
Is this special case observable?
I edited it because I thought I knew how it worked, but apparently I was mistaken on particulars. 🙂 I thought it was a special case, but it's just a combination of descriptors + etc.
For what it's worth, UglifyJS only reduces undefined
to void 0
in strict mode.
I'm not convinced it's required to marry the binary format too tightly to the JS source, and there are things you could do to make it simpler. It's not like the binary format is meant to be human-readable, just machine-readable. There are also idioms in compiled-to-JS code (e.g. from Elm, CoffeeScript, and Babel) that could also stand to benefit from a binary format that that's at least aware of some of their needs.
Here's a few of the ideas I have to throw at the wall. Apologies if this comes across as a bit rambly.
Adding a constant for
undefined
void 0
(what UglifyJS replaces strict-modeundefined
with) and similar.undefined
variable overvoid 0
for code size reasons.Allowing synthetic, anonymous locals that aren't viewable by direct
eval
orwith
Adding built-in source map support.
Adding a built-in description/metadata field.
Adding combined type-checking operators for
typeof x === "number"
, etc.typeof
use cases, and engines already desugar these 100% of the time. (It's an obvious optimization.)Adding combined coersion operators for
x | 0
/~~x
,!!x
,x + ""
, etc.x | 0
or at least act as a synonym.Separating the pragmas from the source/function body.
Separating strings from the code, instead storing debug string/name references as
{int offset; int length}
pairs and putting their values consecutively in a string table before any real code references.Reducing logical "and"/"or" to corresponding if/else variants:
test && other
↔(tmp = test) ? tmp : other
,test || other
↔(tmp = test) ? other : test
.Using breakable blocks with an expression-based AST instead of a statement-based one.
let x = foo(); try { x = foo(); } catch (e) { if (!(e instanceof SafeError)) throw e }
, where it could be simplified to something roughly likelet x = try { foo() } catch (e) { e instanceof SafeError ? undefined : throw e }
. Edit: Missed a backtick 😄Requiring fallthrough to be explicit in
switch
statements.switch
would become much smaller when encoded.Declaring locals before script/module/function body.
Declaring imports/exports before module data table.
Encoding the bytecode as a hybrid register/stack machine.