tc39 / proposal-binary-ast

Binary AST proposal for ECMAScript
965 stars 23 forks source link

The field order doesn't always match the streaming compilation order #35

Closed arai-a closed 6 years ago

arai-a commented 6 years ago

When experimenting streaming compilation from multipart .binjs file to SpiderMonkey bytecode, the order of the fields often become trouble [1][2][3].

The issue is that, if we don't seek/lookahead, and don't keep on-memory AST converted from .binjs file, we should compile in the same order as the .binjs file. but SpiderMonkey often emits the bytecode in the different order (or interleaving sub-trees) than the original JS syntax itself, and sometimes applies optimization depends on the nodes which appears later.

the issue comes from that, we cannot lookahead without extra overhead with current format, (thus current experimental implementation doesn't support seek), because non-Skippable nodes doesn't have length property at the beginning of the serialized data.

of course we could emit the different bytecode than original JS, in the same order as .binjs file, but it would be the source of extra .binjs-specific bugs, and I'd like to avoid it as much as possible.

So, it would be nice if we can support (maybe partial-) tree-traversal without reading same field twice, in the file-format level.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1456006#c1 [2] https://bugzilla.mozilla.org/show_bug.cgi?id=1456404#c2 [3] https://github.com/binast/ecmascript-binary-ast/issues/33

Yoric commented 6 years ago

@arai-a I admit that I assumed we should be able to emit bytecode for a subtree, then compose it later. From what you write, I infer that this is not the case. Am I correct?

arai-a commented 6 years ago

currently BytecodeEmitter doesn't support composition of multiple bytecode-let, and it assumes the parse+compile of single function is done sequentially, in the fixed order.

(bytecode itself could be merged later with some trick, but function and scope tree might be troublesome. which could be solved given that it's already known in binjs, tho...)

Yoric commented 6 years ago

Mmmh...

Implementing a generic lookahead in the encoding would be pretty space-expensive. Byte lengths/offsets are very likely hard to predict, hence to compress.

If we have a list of nodes that cause problems, we could probably manually mark them as skippable (or equivalent).

Alternatively, we could probably rewrite these specific nodes to make them easier to stream compile.

@arai-a could you provide a few source examples, so that we can brainstorm on actual source?

Yoric commented 6 years ago

Note for posterity: we don't want to solve the problem just for SpiderMonkey, and other VMs may have a different compilation order. On the other hand, if we could make compilation in SpiderMonkey compositional (i.e. order-independent), that would be sufficient as a reference implementation.

arai-a commented 6 years ago

currently known examples are in https://bugzilla.mozilla.org/show_bug.cgi?id=1456006#c3 (switch-case) and https://bugzilla.mozilla.org/show_bug.cgi?id=1456404#c2 (C-style for)

but surely other loop structure and destructuring etc are also affected.

arai-a commented 6 years ago

So, if multiple subtrees can be compiled completely separately (with no interaction), we can compile them as bytecode-lets, and then compose them later.

If compilation needs to look into multiple subtrees first and then the code is generated based on the structure, or the code generation is done interleavingly between those subtrees, composition doesn't work well. (or even it's possible it requires overhead)

switch's case is something like that, because the optimization for jump operation is done based on the value of all case expressions. (if all case expressions are integer in small dense range, we use index-based jump)

we could compile subtrees for case expressions and case bodies separately, and when it turns out that all case expressions matches the requirement, discard the bytecode-lets for case expressions and compose index-based jump. (or maybe defer compilation of case expressions while they matches to requirement?)

If we go that direction, I'll try to implement composition and then experiment with all complicated syntaxes to see if it works.

arai-a commented 6 years ago

Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1464311 to think about SpiderMonkey-specific things about composition.

Yoric commented 6 years ago

Unless there is a blocker somewhere in SpiderMonkey, I can't think of any way that is better than composition. For the specific case of switch (), could the bytecode generator itself handle the complexity? Otherwise, it should be possible to add another annotation to the syntax, weaker than [Skippable] but with a similar power in terms of lookahead. It would complicate specifications a bit, and increase file size, but I hope that this would remain reasonable.

arai-a commented 6 years ago

given that composition is working in experimental implementation, and also we should think about optimization which requires kind-of-lookahead and composition, I'll close this for now.