jquery / esprima

ECMAScript parsing infrastructure for multipurpose analysis
http://esprima.org
BSD 2-Clause "Simplified" License

Not ESTree compatible #1639

Closed by ghost 7 years ago

ghost commented 7 years ago

@ariya I was reading this in the readme - "Sensible syntax tree format as standardized by ESTree project"

So I compared it with ESTree and found that Esprima isn't ESTree compatible. First of all, everything should inherit from a Node class. That doesn't happen in Esprima. Here are a few of my findings:

https://github.com/estree/estree/blob/master/es5.md#identifier

In Esprima there is an extra raw field. And the ForStatement is missing the update field. https://github.com/estree/estree/blob/master/es5.md#forstatement

And this node totally breaks with ESTree: FunctionDeclaration https://github.com/estree/estree/blob/master/es5.md#functiondeclaration

The FunctionDeclaration has a lot of "fields" that aren't in the spec at all. The same is the case for FunctionExpression.

And the following "nodes" are missing

There are a lot more violations of the specs. I just scratched the surface.

mikesherov commented 7 years ago

ESTree as a specification only determines the output structure, and only invokes types in the conceptual sense, not the literal sense.

ghost commented 7 years ago

@mikesherov I agree with that, but the deviation from the spec is greater than 53%, while Acorn now follows the spec more or less 80%. End users now choose Acorn over Esprima because of its modularity and ESTree compatibility, even though Esprima has far better performance. In my world, this is a huge gap from the ESTree draft / spec.

The abstract syntax tree is not compatible with Acorn's either. E.g. the location of a node is different.

And Espree, which started as a fork of Esprima, now prefers and uses Acorn as its basis because of these things.

React also used an Esprima fork until December 2015, but for the same reasons they are now using an Acorn/Babel combo.

These are reasons enough to try to be ESTree compatible.

I myself chose Esprima because of the high quality of the code and its professional developers with years of experience. At least @ariya seems to have it. And, of course, Esprima's performance.

mikesherov commented 7 years ago

@nowindowsowrking, thanks for your contributions so far.

The ESTree spec was born out of a collaboration between Mozilla, Esprima, and Acorn, and an evolution and standardization of the SpiderMonkey AST. Esprima and Acorn are both 100% compatible with ESTree, because types have never been part of the spec, but rather an implementation detail. Up until the switch to TypeScript, Esprima itself didn't have any specific constructors for the different nodes anyway.

In regards to location information, this is also not part of the ESTree spec.

Lastly, there are many reasons projects choose either Acorn or Esprima or Babylon. They all have their own tradeoffs, and Esprima has chosen to focus on correctness and speed as primary concerns over modularity or implementing < stage 4 features.

As a member of the ESLint team myself, there are a few reasons Espree ultimately switched to Acorn, but none of them were because of adherence to types. ESLint itself only relies on the structure of the AST, and does no type checking.

Babel built babylon off of Acorn primarily because of its modularity. Again, nothing to do with Type checking nor adherence to ESTree, especially considering Babylon is non-ESTree compatible.

So it's ultimately up to @ariya whether he considers the type mismatches a violation of the spec and worth addressing, but being a founding member of the ESTree spec myself, and intimately involved in the situations you referenced, I can safely say that types are not a part of those decisions.

Thanks for following up and pushing us on this!

ghost commented 7 years ago

@mikesherov I'm impressed by your knowledge :) However, after reading all versions of this code since 1.0, I noticed that some of the Esprima code is very old and has not been changed since the beginning. Meanwhile, the ESTree specification has changed. But I agree it's up to @ariya to make a decision on this matter.

ariya commented 7 years ago

Thanks @nowindowsowrking and @mikesherov!

Before I post my detailed comment, here is a slightly relevant side note.

To understand the landscape of JavaScript parsers, it is illustrative to digest other related materials:

Also, for a meta topic like this, I can recommend an alternative (and sometimes better) forum: our mailing list at https://groups.google.com/forum/#!forum/esprima.

ariya commented 7 years ago

The ESTree specification does not include a list of compatibility requirements. Therefore, when there is a claim that "X is compatible with ESTree", we (unfortunately) have to analyze it from X's perspective.

In the particular context of Esprima, the criterion used to define ESTree compatibility is as follows:

If there is a tool Y that is constructed only by reading the ESTree specification, then such a tool must function when it is consuming the syntax tree produced by Esprima parser.

The above compatibility principle allows Esprima to offer an enriched syntax tree (a "superset") that benefits those who want to take advantage of it, while keeping the output useful for tools that understand only the ESTree format. This also means that such a format extension needs to be additive, i.e. removing any extension should make the format fall back to plain ESTree without any further modification.

A classic example of this additive extension is the raw property for a literal. In the ESTree specification, it says that:

interface Literal <: Expression {
    type: "Literal";
    value: string | boolean | null | number | RegExp;
}

However, if we examine Esprima's output, there is another property there. It was added a long time ago (early 2012, see commit 0fe81b279) in response to a feature request. Apparently, being able to obtain the raw literal is useful for code rewriters and formatters.

> esprima.parse('0x2a').body[0].expression
Literal { type: 'Literal', value: 42, raw: '0x2a' }

Meanwhile, if there exists a tool that can handle every literal only according to ESTree specification, then that tool will be completely oblivious to the existence of the raw property. Thus, this raw property is an additive enrichment and not to be considered as a specification violation.
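The additive nature of raw can be sketched with plain objects, without running the parser. In the snippet below, the node is a hand-written copy of the Esprima output shown above, and the helpers toPlainESTreeLiteral() and literalValue() are hypothetical names introduced only for this illustration:

```javascript
// Hypothetical helper: keep only the fields listed in the ESTree Literal interface.
function toPlainESTreeLiteral(node) {
  const { type, value } = node;
  return { type, value };
}

// Hypothetical consumer written only against the ESTree interface.
// It reads `type` and `value` and never looks at `raw`.
function literalValue(node) {
  if (node.type !== 'Literal') {
    throw new TypeError('expected a Literal node');
  }
  return node.value;
}

// Hand-written copy of the Esprima output for '0x2a'.
const esprimaLiteral = { type: 'Literal', value: 42, raw: '0x2a' };
const strippedLiteral = toPlainESTreeLiteral(esprimaLiteral);

// The consumer gives the same answer with or without the extension.
console.log(literalValue(esprimaLiteral) === literalValue(strippedLiteral));
```

Removing raw requires no other change to the node, which is what "additive" means here.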

In ESTree, every interface is described following the concept of a structural type system, i.e. a node is considered to be of a certain type if it matches the structure of the interface, not because of its name. As an example, look at the following interfaces:

interface Node {
    type: string;
    loc: SourceLocation | null;
}

interface Program <: Node {
    type: "Program";
    body: [ Statement ];
}

With that in mind, the following JavaScript code produces a perfectly valid Program node:

function createEmptyProgram() {
  return {
    type: 'Program',
    body: [],
    loc: null
  };
}

Note how the function createEmptyProgram() constructs a plain JavaScript object, without any inheritance whatsoever (from the Node object). This does not mean that the object created by createEmptyProgram() is not ESTree compatible.

In fact, at the beginning of the ESTree specification, it is stated that "ESTree AST nodes are represented as Node objects, which may have any prototype inheritance…" (note the use of may, not must).
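To make the structural view concrete, here is a minimal sketch of a check that looks only at the shape of a node. The function isProgram() and the class ProgramNode are hypothetical names for this illustration, and the check is deliberately simplified (it does not validate the elements of body or the loc field):

```javascript
// Hypothetical structural check: a node counts as a Program if it has
// the right `type` tag and an array `body`, whatever its prototype is.
function isProgram(node) {
  return node !== null &&
    typeof node === 'object' &&
    node.type === 'Program' &&
    Array.isArray(node.body);
}

// A plain object literal, as in createEmptyProgram().
const plainProgram = { type: 'Program', body: [], loc: null };

// An instance of a class, similar in spirit to a parser with node constructors.
class ProgramNode {
  constructor() {
    this.type = 'Program';
    this.body = [];
    this.loc = null;
  }
}
const classProgram = new ProgramNode();

// An object with no prototype at all.
const bareProgram = Object.assign(Object.create(null), { type: 'Program', body: [] });

console.log(isProgram(plainProgram), isProgram(classProgram), isProgram(bareProgram));
```

All three objects pass the same check, because nothing in it depends on inheritance.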

Let's address each compatibility concern.

For a loop statement using for, ESTree specifies the following interface:

interface ForStatement <: Statement {
    type: "ForStatement";
    init: VariableDeclaration | Expression | null;
    test: Expression | null;
    update: Expression | null;
    body: Statement;
}

Esprima output matches that:

> esprima.parse('for (i = 0; i < 3; ++i);').body[0]
ForStatement {
  type: 'ForStatement',
  init: 
   AssignmentExpression {
     type: 'AssignmentExpression',
     operator: '=',
     left: Identifier { type: 'Identifier', name: 'i' },
     right: Literal { type: 'Literal', value: 0, raw: '0' } },
  test: 
   BinaryExpression {
     type: 'BinaryExpression',
     operator: '<',
     left: Identifier { type: 'Identifier', name: 'i' },
     right: Literal { type: 'Literal', value: 3, raw: '3' } },
  update: 
   UpdateExpression {
     type: 'UpdateExpression',
     operator: '++',
     argument: Identifier { type: 'Identifier', name: 'i' },
     prefix: true },
  body: EmptyStatement { type: 'EmptyStatement' } }
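As a rough way to confirm this mechanically, the sketch below lists the interface fields that a node lacks. The node is a hand-written, abbreviated copy of the output above, the field list comes from the ForStatement interface quoted earlier, and missingFields() is a hypothetical helper:

```javascript
// Abbreviated plain-object copy of the Esprima output shown above.
const forNode = {
  type: 'ForStatement',
  init: {
    type: 'AssignmentExpression',
    operator: '=',
    left: { type: 'Identifier', name: 'i' },
    right: { type: 'Literal', value: 0, raw: '0' }
  },
  test: {
    type: 'BinaryExpression',
    operator: '<',
    left: { type: 'Identifier', name: 'i' },
    right: { type: 'Literal', value: 3, raw: '3' }
  },
  update: {
    type: 'UpdateExpression',
    operator: '++',
    argument: { type: 'Identifier', name: 'i' },
    prefix: true
  },
  body: { type: 'EmptyStatement' }
};

// Field names taken from the ForStatement interface quoted above.
const FOR_STATEMENT_FIELDS = ['type', 'init', 'test', 'update', 'body'];

// Hypothetical helper: list the interface fields that a node does not have.
function missingFields(node, fields) {
  return fields.filter((name) => !(name in node));
}

const missingInFor = missingFields(forNode, FOR_STATEMENT_FIELDS);
console.log(missingInFor); // an empty array means every field, including update, is present
```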

For a function declaration, let's look at the following ESTree interfaces for ES5:

interface Function <: Node {
    id: Identifier | null;
    params: [ Pattern ];
    body: BlockStatement;
}

interface Declaration <: Statement { }

interface FunctionDeclaration <: Function, Declaration {
    type: "FunctionDeclaration";
    id: Identifier;
}

and for ES2015:

extend interface Function {
    generator: boolean;
}

Meanwhile, the output of Esprima:

> esprima.parse('function f(){}').body[0]
FunctionDeclaration {
  type: 'FunctionDeclaration',
  id: Identifier { type: 'Identifier', name: 'f' },
  params: [],
  body: BlockStatement { type: 'BlockStatement', body: [] },
  generator: false,
  expression: false }
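The same comparison can be sketched for this node. The field list below is an assumption assembled from the ES5 and ES2015 interfaces quoted above (later revisions of the specification may list more), and compareWithSpec() is a hypothetical helper:

```javascript
// Plain-object copy of the Esprima output shown above.
const fnNode = {
  type: 'FunctionDeclaration',
  id: { type: 'Identifier', name: 'f' },
  params: [],
  body: { type: 'BlockStatement', body: [] },
  generator: false,
  expression: false
};

// Assumed field list, built from the interfaces quoted above.
const FUNCTION_DECLARATION_FIELDS = ['type', 'id', 'params', 'body', 'generator'];

// Hypothetical helper: report fields the node lacks and fields it adds.
function compareWithSpec(node, specFields) {
  const keys = Object.keys(node);
  return {
    missing: specFields.filter((name) => !keys.includes(name)),
    extra: keys.filter((name) => !specFields.includes(name))
  };
}

const fnReport = compareWithSpec(fnNode, FUNCTION_DECLARATION_FIELDS);
console.log(fnReport.missing); // fields required by the quoted interfaces but absent
console.log(fnReport.extra);   // fields present beyond the quoted interfaces
```

For this node nothing is missing and the only extra field is expression, so it falls under the same additive-extension argument as raw.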

For a regular expression literal, ESTree specifies:

interface Literal <: Expression {
    type: "Literal";
    value: string | boolean | null | number | RegExp;
}
interface RegExpLiteral <: Literal {
  regex: {
    pattern: string;
    flags: string;
  };
}

And if we let Esprima process a regular expression:

> esprima.parse('/abc/i').body[0].expression
RegexLiteral {
  type: 'Literal',
  value: /abc/i,
  raw: '/abc/i',
  regex: { pattern: 'abc', flags: 'i' } }

There is hardly any difference from what ESTree mandates. This is not a surprise: treating a regular expression as a special form of literal in fact originated in Esprima itself (see commit 2641aff502, Jun 2014), was adopted by other parsers, and was finally proposed for and accepted into ESTree (Feb 2015).
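As an illustration of why the regex field is convenient, a consumer can rebuild the regular expression from pattern and flags alone, without relying on value. The node below is a hand-written copy of the output above, and rebuildRegExp() is a hypothetical helper:

```javascript
// Plain-object copy of the Esprima output for '/abc/i'.
const regexNode = {
  type: 'Literal',
  value: /abc/i,
  raw: '/abc/i',
  regex: { pattern: 'abc', flags: 'i' }
};

// Hypothetical consumer: reconstruct the RegExp from the `regex` field only.
function rebuildRegExp(node) {
  if (node.type !== 'Literal' || !node.regex) {
    throw new TypeError('expected a regular expression literal node');
  }
  return new RegExp(node.regex.pattern, node.regex.flags);
}

const rebuilt = rebuildRegExp(regexNode);
console.log(rebuilt.source, rebuilt.flags);
```

The consumer reads only fields that the quoted RegExpLiteral interface defines, so the presence of raw does not affect it.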


With this explanation, hopefully it is demonstrated that Esprima is indeed compatible with ESTree.