jquery / esprima

ECMAScript parsing infrastructure for multipurpose analysis
http://esprima.org
BSD 2-Clause "Simplified" License
7.06k stars 787 forks source link

Extend ExpressionStatement to indicate a directive prologue #1006

Closed ariya closed 8 years ago

ariya commented 9 years ago

In the current syntax tree, a directive such as "use strict" will appear as ExpressionStatement. A tool that consumes the tree and needs to be aware of the strict mode will have to perform an extra step to figure out whether such an ExpressionStatement is representing a directive or not.

In this proposal, a new flag directive is added to ExpressionStatement. The value is set to true for a directive prologue.

The current state:

> esprima.parse('"use strict"').body[0]
{ type: 'ExpressionStatement',
  expression: 
   { type: 'Literal',
     value: 'use strict',
     raw: '"use strict"' } }

If this proposal is implemented, it becomes:

> esprima.parse('"use strict"').body[0]
{ type: 'ExpressionStatement',
  expression: 
   { type: 'Literal',
     value: 'use strict',
     raw: '"use strict"' },
  directive: true }

Additional resources:

ariya commented 9 years ago

Based on the previous discussion (#318), seems that a Boolean flag is not enough since there is still additional work to identify the type of the directive.

The example given by @gibson042:

strictMode = currentNode.directive === true &&
    // "A Use Strict Directive may not contain an EscapeSequence or LineContinuation"
    currentNode.raw.slice( 1, -1 ) === "use strict";
ariya commented 9 years ago

Here is a possible alternative.

Use strict directive:

> esprima.parse('"use strict"').body[0]
{ type: 'ExpressionStatement',
  expression: 
   { type: 'Literal',
     value: 'use strict',
     raw: '"use strict"' },
  directive: 'use strict' }

Not a directive (the literal is in the first statement):

> esprima..parse('x; "use strict"').body[1]
{ type: 'ExpressionStatement',
  expression: 
   { type: 'Literal',
     value: 'use strict',
     raw: '"use strict"' },
  directive: null }

For the latter, I am ambivalent whether the value should null or an empty string.

michaelficarra commented 9 years ago

Has to be null, otherwise we couldn't distinguish it from the empty directive.

ikarienator commented 9 years ago

:+1:

jeffmo commented 9 years ago

Consider me ambivalent. In all of our tools that need to be aware of strict, we do this with little problem while tracking closure-scope -- so it's not really a problem that we've had troubles with.

This does get into a more general topic where we have to decide how and when to draw the line between what the parser's responsibility is and what we expect users of the parser to have to do. There's always going to be some static analysis that won't be in the parser that somebody will need; And there's also going to be several things that the parser could do relatively easily -- but may or may not be beneficial to users.

My vote: Do it behind a flag (and do all clearly "non-standard" things behind a flag)

michaelficarra commented 9 years ago

@jeffmo: I don't understand what you're saying. When a program is parsed into a SpiderMonkey AST, information that is crucial to understanding the program's semantics is lost. The programs "use strict", "use\x20strict", and ("use strict") each have different semantics but parse into the same tree. That's not something that a consumer of esprima can handle.

michaelficarra commented 9 years ago

While I'm here, I wanted to ask: why did we choose to place the value on ExpressionStatement instead of Literal? Unfortunately, they both allow conflicting state (ExpressionStatement with directive flag and non-string-Literal expression, or Literal with directive flag and either non-string value or not a child of ExpressionStatement), but is one better than the other?

jeffmo commented 9 years ago

@michaelficarra : Indeed it's possible to analyze an AST with a "use strict" expression and understand that the enclosing function is strict-mode without the parser having done that work for you

michaelficarra commented 9 years ago

@jeffmo: I've devised a game. I will parse the above three programs, and put them in an array in a random order. If you can consistently give them back to me in the order listed above, you win. You cannot win this game. :ghost:

jeffmo commented 9 years ago

@michaelficarra : The parser outputs the value and the raw form of the literal -- which allows you to distinguish the escapes; But you're right that we can't distinguish the grouped expression (without the token list or the original source). A bit of an esoteric case that I guess we don't (but perhaps others do) need to be concerned about.

mikesherov commented 9 years ago

@michaelficarra would be good to paste the output of those three programs you mentioned, because the problem is not immediately obvious.

@jeffmo the problem is that the raw value also loses escaping information.... but that seems like a bug in raw, no? @ariya ?

gibson042 commented 9 years ago

the raw value also loses escaping information

Am I missing something?

> console.log( JSON.stringify( esprima.parse('"us\\u0065\\x20stric\\164"'), null, 2 ) )
{
  "type": "Program",
  "body": [
    {
      "type": "ExpressionStatement",
      "expression": {
        "type": "Literal",
        "value": "use strict",
        "raw": "\"us\\u0065\\x20stric\\164\""
      }
    }
  ]
}

This is the behavior I expect, with raw showing the escape sequences, and I see it on both "master" and "harmony". If anything, there's a bug in not exposing parenthesized PrimaryExpressions:

> console.log( JSON.stringify( esprima.parse('("use strict")'), null, 2 ) )
{
  "type": "Program",
  "body": [
    {
      "type": "ExpressionStatement",
      "expression": {
        "type": "Literal",
        "value": "use strict",
        "raw": "\"use strict\""
      }
    }
  ]
}

...but I'm sure that discussion has been had before.

michaelficarra commented 9 years ago

Note that we're discussing the standard AST, and the raw value is an esprima extension, not part of the standard. The three programs in my challenge above will be identical using a conforming Relect.parse without position information. This is why we need an equivalent to the raw value to indicate a directive, not just a simple Boolean.

mikesherov commented 9 years ago

Good point. We should move this discussion to estree then, no?

mikesherov commented 9 years ago

https://github.com/estree/estree/issues/6

ariya commented 9 years ago

@michaelficarra I thought about adding the property to Literal node, but seems (from the libraries in the test corpus) that a typical library contains more Literal than ExpressionStatement. Thus, by choosing the latter, we will be adding less overhead to the constructed syntax tree.

ariya commented 9 years ago

Now that https://github.com/estree/estree/issues/6 reached a consensus, should I edit the issue description to reflect that change? Do we care about preserving the original proposal, e.g. the Boolean directive and such?

mikesherov commented 9 years ago

Editing description sounds fine to me.

ariya commented 9 years ago

Since https://github.com/estree/estree/pull/99 does not seem to lead to a consensus (surprising, consider that https://github.com/estree/estree/issues/6 was not too controversial), I would say let's give up on this to focus on other more pressing matters.

michaelficarra commented 9 years ago

@ariya That's not really an option. It's part of the language, whether we like it or not.

ariya commented 9 years ago

Well, currently every tool can work without the existence of this directive property.

michaelficarra commented 9 years ago

For some definitions of "work", sure. It doesn't concern you that none of those tools actually work properly?

mikesherov commented 9 years ago

Seems we need to work harder to close that estree issue. I'll revive that thread this week.