svaarala / duktape

Duktape - embeddable Javascript engine with a focus on portability and compact footprint
MIT License
5.87k stars 514 forks source link

ES2015 Arrow Functions #275

Open kphillisjr opened 8 years ago

kphillisjr commented 8 years ago

Arrow Function definitions. These examples are pulled from the ES6 Features website ( http://es6-features.org/ )

fatcerberus commented 8 years ago

I'm indifferent to the class support, but fat arrows would be one thing I'd love to have in Duktape. :) I don't imagine this to be too difficult to implement either, most of the required changes would be to the parser I'd think.

svaarala commented 8 years ago

They'd actually be a bit tricky to implement because current compiler has no intermediate representation. What that means concretely is that when it encounters terms, it will need to know whether it's dealing with an expression or an argument list or something else. For ES5.1 this is actually possible (LHS handling works because the compiler "ivalue" type is one indirection away from the actual result so it can represent either an LHS value or an RHS value), but just barely.

The concrete problem would be for the parser to know what to emit/do when it encounters the first "v" e.g. in these cases (it doesn't have lookahead; and especially not arbitrarily long lookahead):

v + 1
v = 3
v => v + 1
(v, w)  // comma expression
(v, w) => v + w

Adding a compiler IR will make these easy to handle, and will be needed to support ES6, but it's going to be a challenge to keep the compiler still small and its RAM footprint low.

seanmiddleditch commented 8 years ago

I don't think you need an AST or IR, nor even look ahead, to parse this language feature.

I believe this can be solved with operator precedence/collapsing rules. Don't emit code for identifiers or parenthesis until you hit an operator. If it's anything other than =>, collapse them as expressions and comma operators; otherwise, collapse them as argument lists.

As you encounter identifiers, push them into a fifo. If you encounter any operator or token other than => (or any token other than , or ) while in parenthesis) then you know you're not in an argument list and that you can immediately interpret the list of identifiers as expressions. If you have just the one identifier, emit the code for an id lookup. If you are in parenthesis and have more than one id in the fifo, emit code for id lookups and the usual comma operator between each.

If you hit the => you have an arrow function. If you hit the ) in an empty parenthesis expression you know you are an arrow function. If you hit anything other than the list of identifiers in parenthesis (like the plus in (v,w,x+) then you know you're not an arrow function.

So you get:

v+ // push v, see +, pop v as expression, continue parsing (expect expression)
v=> // push v, see =>, enter arrow state, pop v as argument, continue parsing (expect statement)
(v)+ // enter paren state, push v, exit paren state, see +, pop v as expression, continue parsing (expect expression)
(v,w)+ // enter paren state, push v, push w, exit paren state, see +, pop v as expression, pop w as expression, apply comma operator, continue parsing (expect expression)
(v,w+ // enter paren state, push v, push w, see +, pop v as expression, continue parsing (expect expression, still in paren state)
(v,w)=> // enter paren state, push v, push w, exit paren state, see =>, enter arrow state, pop v as argument, pop w as argument, continue parsing (expect statement)
() // enter paren state, exit paren state, note that contents were empty, continue parsing (expect =>), enter arrow state, push empty id fifo as empty argument list, continue parsing (expect statement)

Note that the + in those examples can be replaced with any token other than =>, including newlines and EOF (implicit semicolons).

All those cases are handled properly with just a handful of new grammar rules and the id fifo.

I haven't dug through the duk compiler, but I wouldn't be surprised if you already have something very similar to the id fifo for your operator precedence handling.

svaarala commented 8 years ago

I don't think that approach works with the current expression parser: it emits code as it goes along, and there's no "stack" or anything like that for operators. That stack is essentially an intermediate representation.

fatcerberus commented 8 years ago

There's no operator stack? How does that work? I would have thought there would have to be since for infix operators (as opposed to prefix or postfix, which are easy) you usually need a shunting yard algorithm to work out precedence.

svaarala commented 8 years ago

The compiler document explains the approach at a high level (it's not a very clean document and a bit out-of-date though). Here's the bit on expression parsing:

https://github.com/svaarala/duktape/blob/master/doc/compiler.rst#expression-parsing-algorithm

svaarala commented 8 years ago

Anyway, there are several alternative algorithms for parsing expressions to support e.g. destructuring. They just require considerable changes to the current algorithm, so it makes more sense to work in the necessary infrastructure to support all of ES6 if the expression parsing/compilation is rewritten. Doing a rework which doesn't lead into the ES6 direction (internally) would be somewhat wasted effort.

reqshark commented 8 years ago

i like top down

fatcerberus commented 7 years ago

It occurs to me that a lot of ES6 constructs like this could be implemented even in the current compiler, if the parser were reworked to be able to look ahead in the token stream (without consuming tokens). For productions with common prefixes, one could then perform pattern matching to determine which production we're dealing with. For arrow functions in particular:

if currToken is identifier and nextToken is arrow:
    parse arrow function

if [nextTokens] are [lparen, IdentifierList, rparen, arrow]
    parse arrow function
else
    parse expression
svaarala commented 7 years ago

Arrow functions require an arbitrary amount of lookup however because the expression may be a comma expression (or arbitrary length) or an arrow function.

svaarala commented 7 years ago

So in essence what that means is not a traditional lookahead but rather rewinding the token stream -- in effect you would still be "consuming" tokens and rewinding to reparse them. That's already done for the multiple passes each function gets, but it's not currently done inside a function. That's doable but I'm not sure it would be worth the effort compared to coming up with a better compiler state model.

zigazeljko commented 2 years ago

@svaarala I'm pretty sure the approach by @seanmiddleditch can be adapted to the current expression parser.

The parser emits code as usual while also keeping track of:

When hitting the => token, the parser then checks that the previous expression is a valid argument list. At this point, we know that everything between saved_end and the current end of code buffer corresponds to that argument list. In the current implementation, that would be a sequence of DUK_OP_GETVAR and DUK_OP_STREG.

The compiler then reconstructs the argument list (from that sequence of DUK_OP_GETVAR), moves the end of the code buffer back to saved_end, and recursively parses the rest of the arrow function.

A similar approach would also work for ES6 destructuring.

svaarala commented 2 years ago

In practice it would more complicated than that:

So while all of this might be solvable in some way, it's probably more complicated than adding actual parser support to deal with the expressions.