tree-sitter / tree-sitter-javascript

Javascript grammar for tree-sitter
MIT License

Incorrectly parsed labeled statement #259

Open helixbass opened 10 months ago

helixbass commented 10 months ago

The following piece of code is valid but it is parsed incorrectly:

if (foo) { bar: 'baz' }

Here's a link to the TypeScript Playground showing that the snippet above is valid JavaScript or TypeScript:

https://www.typescriptlang.org/play?#code/JYMwBAFCD20JRgN5gEYEMBOAuMBydAXrmAL5A

The output of tree-sitter parse is the following:

(program [0, 0] - [1, 0]
  (if_statement [0, 0] - [0, 23]
    condition: (parenthesized_expression [0, 3] - [0, 8]
      (identifier [0, 4] - [0, 7]))
    consequence: (expression_statement [0, 9] - [0, 23]
      (object [0, 9] - [0, 23]
        (pair [0, 11] - [0, 21]
          key: (property_identifier [0, 11] - [0, 14])
          value: (string [0, 16] - [0, 21]
            (string_fragment [0, 17] - [0, 20])))))))

It looks like this should parse as a block containing a labeled statement, but it is parsed as an object literal.
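As a rough cross-check outside of tree-sitter, a JavaScript engine can show the difference between the two readings. This is only a sketch: it relies on `eval` returning the completion value of the evaluated script, and the variable names are made up for illustration.

```javascript
// Sketch: compare how a JavaScript engine reads "{ bar: 'baz' }" in
// statement position versus expression position.
// Indirect eval runs the source as a global script and returns its
// completion value.
const indirectEval = eval;

// Statement position: a block containing the labeled statement  bar: 'baz'
const asStatement = indirectEval("if (true) { bar: 'baz' }");

// Expression position (parenthesized): an object literal
const asExpression = indirectEval("({ bar: 'baz' })");

console.log(typeof asStatement);  // "string"
console.log(typeof asExpression); // "object"
```

The string result in the first case is the value of the expression statement `'baz'` inside the labeled statement, which is what the expected parse tree would describe.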

amaanq commented 9 months ago

this one would be really really hard to fix w/o a mechanism to disallow rules in certain contexts (objects in expression statements in if statements)

helixbass commented 9 months ago

Poking at this a little bit, it seems a bit more general than just this if case

I think the fundamental thing that's currently not being enforced by the grammar is (from the ECMAScript standard):

An ExpressionStatement cannot start with a U+007B (LEFT CURLY BRACKET) because that might make it ambiguous with a Block

So eg the Objects tests (from test/corpus/expressions.txt):

// currently parsing as object literal, should parse as block statement
{ a: "b" };

// currently parsing as object literal, should be parse error
{ c: "d", "e": f, 1: 2 };

// currently parsing as object literal, should parse as block statement
{
  g: h
}

are currently parsed incorrectly
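A quick way to sanity-check those three expectations against a real JavaScript engine (not against tree-sitter) is to see which snippets compile at all. The helper name `parsesAsStatements` is hypothetical, and this only tells us accept/reject, not the shape of the tree:

```javascript
// Hypothetical helper: returns true if `source` compiles as a function body.
// new Function() only parses/compiles the source; it does not run it.
function parsesAsStatements(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

const results = [
  parsesAsStatements('{ a: "b" };'),               // block + labeled statement
  parsesAsStatements('{ c: "d", "e": f, 1: 2 };'), // rejected at the second ":"
  parsesAsStatements('{\n  g: h\n}'),              // block + labeled statement
];
console.log(results); // [ true, false, true ]
```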

I don't think I have a good enough grasp of tree-sitter grammars to take a crack at this myself. But it seems like maybe you'd need eg $._expressions_without_leading_curly_brace instead of $._expressions in the definition of expression_statement?

amaanq commented 9 months ago

So most statements cannot have an object literal? Interesting

helixbass commented 9 months ago

Ya, expression statements can't start with an object literal (or object-destructuring pattern)

I vaguely recall from working in JS/TypeScript that if, say, you wanted to object-destructure into an existing variable, you'd have to wrap the assignment in parentheses to avoid the "leading {" issue, eg:

let x = 1;
({x} = {x: 2});
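For what it's worth, here is a small sketch of both forms side by side. The unparenthesized version is compiled via `new Function` so the syntax error can be caught at runtime; the variable name `unparenthesizedError` is just for illustration.

```javascript
// Parenthesized: parsed as an assignment expression with an
// object-destructuring pattern on the left-hand side.
let x = 1;
({ x } = { x: 2 });
console.log(x); // 2

// Unparenthesized: the leading "{" starts a block, so "{ x }" is a block
// statement and the "=" that follows is a syntax error.
let unparenthesizedError = null;
try {
  new Function("let x = 1; { x } = { x: 2 };");
} catch (err) {
  unparenthesizedError = err;
}
console.log(unparenthesizedError instanceof SyntaxError); // true
```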