tree-sitter / tree-sitter-javascript

Javascript grammar for tree-sitter
MIT License
314 stars 108 forks source link

Bug: 'class_static_block' cant contain 'extras' like newlines #313

Closed jackschu closed 4 weeks ago

jackschu commented 4 weeks ago

Repro

The following piece of code is valid but it is parsed incorrectly:

class Foo {
    static
    //hi

  {
     1 + 1;
  }
};

Here's a link to the TypeScript Playground showing that the snippet above is valid JavaScript or TypeScript: https://www.typescriptlang.org/play?#code/MYGwhgzhAEBiD29oG8BQ0PQgFzNglsOpgPQkAW+qxamGAjNANTT0DcxAvqpx0A

The output of tree-sitter parse is the following:

(program [0, 0] - [8, 0]
  (class_declaration [0, 0] - [2, 8]
    name: (identifier [0, 6] - [0, 9])
    body: (class_body [0, 10] - [2, 8]
      member: (field_definition [1, 4] - [1, 10]
        property: (property_identifier [1, 4] - [1, 10]))
      (comment [2, 4] - [2, 8])))
  (statement_block [4, 2] - [6, 3]
    (expression_statement [5, 5] - [5, 11]
      (binary_expression [5, 5] - [5, 10]
        left: (number [5, 5] - [5, 6])
        right: (number [5, 9] - [5, 10]))))
  (ERROR [7, 0] - [7, 1])
  (empty_statement [7, 1] - [7, 2]))
../../open_source/prettier/tests/format/js/class-static-block/with-line-breaks.js   0 ms    (MISSING "}" [2, 8] - [2, 8])

this is on commit ac10a11e0c8db512f70e6b798260d2516d22454c

Analysis

I did a bunch of digging, it seems like this happens because the language believe 'static' is the start of a property name I think tree sitter here should have decided that the class_static_block and field_definition here represent a conflict and gone to GLR, but somehow the presence of $._semicolon confuses it

    class_body: $ => seq(
      '{',
      repeat(choice(
        seq(field('member', $.method_definition), optional(';')),
        seq(field('member', $.field_definition), $._semicolon),
        field('member', $.class_static_block),
        field('template', $.glimmer_template),
        ';',
      )),
      '}',
    ),

I tried adding to conflicts after removing optional(';') and $._semicolon, the former is clearly redundant, but the latter is load bearing because the spec requires that semicolon or else statements like this would be considered valid

class Bar { bar = 4 baz = 4 }
amaanq commented 4 weeks ago

I think the best way to fix this was just to allow an optional "automatic semicolon" after 'static', creating a conflict and letting GLR resolve it, thanks for the report

jackschu commented 3 weeks ago

hey @amaanq it seems like #314 will cause treesitter to parse the following block without emitting an error

class Foo {
    static ; {}
}

But a JS runtime would give Unexpected token '{' is this OK? I've vaguely heard that treesitter is allowed to parse more than what the spec allows, but I'm curious how you think about it

amaanq commented 3 weeks ago

you're right, sorry about that! it should be just _automatic_semicolon, should be fixed in a sec