tree-sitter / tree-sitter-javascript

Javascript grammar for tree-sitter
MIT License
318 stars 107 forks source link

Don't require a semicolon or line-break after the last statement in a block/program #9

Closed maxbrunsfeld closed 8 years ago

maxbrunsfeld commented 8 years ago

Refs #8

This solves the optional semicolon problem in a different way than @robrix used in the ruby grammar. Rob treated the ; as a separator between statements, rather than treating it as part of the statement.

What I did instead was to make it explicit that the last statement in a block is not required to end with a ; or a \n. So the expression statement with no ; in @joshvera's unit test parses as a trailing_expression_statement.

Rob's solution is more elegant, and in the case of ruby, better in every way. But optional semicolons are a little hairier in javascript than in ruby, because javascript allows code like this:

foo      // <--- not the end of a statement
  .bar
  .baz;

Basically, in order to tell whether or not the first line-break functions as a terminator, you have to look forward to the next token. Ruby and Go (and I think Rust) disallow this, but in JS, it's super common 😬.

Unfortunately, tree-sitter isn't yet smart enough to handle code like the example above if we use Rob's technique. The notion of having tokens like \n that can either be syntactically meaningful or function as whitespace is something that I kind of made up for tree-sitter, rather than basing it on some established method. I tried to make it Just Work™, but it's not fully baked 😢.

In the meantime, do you think this solution will suffice?

maxbrunsfeld commented 8 years ago

This might be another argument in favor of @robrix's suggestion (https://github.com/tree-sitter/tree-sitter-ruby/issues/6) of allowing nodes to be somehow 'renamed' via the grammar, for semantic reasons. In this case, we don't really want separate names for trailing_return_statement than the normal return_statement.

Maybe something like:

grammar({
  name: 'javascript',

  // This is one possible way API for expressing that what is parsed as a `trailing_return_statement`
  // should be represented as a `return_statement` in the AST.
  variants: {
    trailing_return_statement: 'return_statement'
  },

  rules: {
    // ...
  }
});
robrix commented 8 years ago

In the meantime, do you think this solution will suffice?

Sounds reasonable to me :+1:

joshvera commented 8 years ago

Thanks for this @maxbrunsfeld! I see you modified grammar.json directly. I was under the impression that grammar.js was compiled to grammar.json. What’s the relation between the two?

maxbrunsfeld commented 8 years ago

Yeah, grammar.json is generated based on grammar.js. I just check it in for convenience. It's not really needed at all actually, except that the JSON format is the the one that the core C++ library uses, and I use these grammars (via the JSON) in tree-sitter's own test suite.

maxbrunsfeld commented 8 years ago

Actually, I've got a better API:

psuedonyms: {
  trailing_expression_statement: 'expression_statement',
  trailing_return_statement: 'return_statement'
}