tree-sitter / tree-sitter-rust

Rust grammar for tree-sitter
MIT License

Function definitions do not parse if their body contains an unclosed block #138

Open Diomendius opened 2 years ago

Diomendius commented 2 years ago

The following Rust code

fn foo() {
    if true {}
}

produces the following parse tree (as formatted by nvim-treesitter/playground):

function_item [0, 0] - [2, 1]
  name: identifier [0, 3] - [0, 6]
  parameters: parameters [0, 6] - [0, 8]
  body: block [0, 9] - [2, 1]
    expression_statement [1, 1] - [1, 11]
      if_expression [1, 1] - [1, 11]
        condition: boolean_literal [1, 4] - [1, 8]
        consequence: block [1, 9] - [1, 11]

If the closing brace of the if block is removed, the parser no longer recognizes the enclosing function_item at all:

fn foo() {
    if true {
}

identifier [0, 3] - [0, 6]
parameters [0, 6] - [0, 8]
expression_statement [1, 1] - [2, 1]
  if_expression [1, 1] - [2, 1]
    condition: boolean_literal [1, 4] - [1, 8]
    consequence: block [1, 9] - [2, 1]

Parsing invalid syntax usefully is an exercise in futility in the general case, but it should still be feasible to recognize the function definition here, even though there is no objective way to say whether the closing brace is missing from the if block or from the function body.

In practical terms this impacts detecting indent level based on the parse tree, as the function body no longer exists to provide the outer indent level. I'm sure there are other consequences.
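To make the indent problem concrete, here is a minimal sketch of block-based indent detection. It assumes a simplified, hypothetical `Node` type rather than the real tree-sitter API, and it wraps each tree in a `source_file` root, which the playground output above does not show. The row numbers are copied from the two Rust parse trees in this issue; `indent_level`, `valid_tree` and `broken_tree` are names invented for the illustration.

```rust
/// Simplified stand-in for a syntax-tree node (hypothetical, not the tree-sitter API).
#[derive(Debug)]
struct Node {
    kind: &'static str,
    start_row: usize,
    end_row: usize,
    children: Vec<Node>,
}

impl Node {
    fn new(kind: &'static str, start_row: usize, end_row: usize, children: Vec<Node>) -> Self {
        Node { kind, start_row, end_row, children }
    }
}

/// Indent level of `row`: the number of `block` nodes that open on an
/// earlier row and close on a later row.
fn indent_level(node: &Node, row: usize) -> usize {
    let own = usize::from(node.kind == "block" && node.start_row < row && row < node.end_row);
    own + node.children.iter().map(|c| indent_level(c, row)).sum::<usize>()
}

/// Tree for the valid function (rows taken from the first parse tree).
fn valid_tree() -> Node {
    Node::new("source_file", 0, 2, vec![
        Node::new("function_item", 0, 2, vec![
            Node::new("identifier", 0, 0, vec![]),
            Node::new("parameters", 0, 0, vec![]),
            Node::new("block", 0, 2, vec![
                Node::new("expression_statement", 1, 1, vec![
                    Node::new("if_expression", 1, 1, vec![
                        Node::new("boolean_literal", 1, 1, vec![]),
                        Node::new("block", 1, 1, vec![]),
                    ]),
                ]),
            ]),
        ]),
    ])
}

/// Tree for the function with the unclosed `if` block (rows taken from the second parse tree).
fn broken_tree() -> Node {
    Node::new("source_file", 0, 2, vec![
        Node::new("identifier", 0, 0, vec![]),
        Node::new("parameters", 0, 0, vec![]),
        Node::new("expression_statement", 1, 2, vec![
            Node::new("if_expression", 1, 2, vec![
                Node::new("boolean_literal", 1, 1, vec![]),
                Node::new("block", 1, 2, vec![]),
            ]),
        ]),
    ])
}
```

Under these assumptions, `indent_level(&valid_tree(), 1)` gives 1 for the `if` line, while `indent_level(&broken_tree(), 1)` gives 0, because the function's body block is no longer in the tree to contribute the outer level.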

It is also strange that the parse tree does not include any ERRORs, even though the parse tree itself could not possibly represent valid Rust code; how can an identifier or parameters exist at the root of the parse tree?

For comparison, the C parser produces this parse tree for a similar function definition and if statement:

void foo() {
    if(true) {}
}

function_definition [0, 0] - [2, 1]
  type: primitive_type [0, 0] - [0, 4]
  declarator: function_declarator [0, 5] - [0, 10]
    declarator: identifier [0, 5] - [0, 8]
    parameters: parameter_list [0, 8] - [0, 10]
  body: compound_statement [0, 11] - [2, 1]
    if_statement [1, 4] - [1, 15]
      condition: parenthesized_expression [1, 6] - [1, 12]
        true [1, 7] - [1, 11]
      consequence: compound_statement [1, 13] - [1, 15]

Removing the closing brace of the if statement's block causes the parser to reinterpret the function block's closing brace as the closing brace of the if statement's block, but otherwise leaves the parse tree unchanged:

void foo() {
    if(true) {
}

function_definition [0, 0] - [2, 1]
  type: primitive_type [0, 0] - [0, 4]
  declarator: function_declarator [0, 5] - [0, 10]
    declarator: identifier [0, 5] - [0, 8]
    parameters: parameter_list [0, 8] - [0, 10]
  body: compound_statement [0, 11] - [2, 1]
    if_statement [1, 4] - [2, 1]
      condition: parenthesized_expression [1, 6] - [1, 12]
        true [1, 7] - [1, 11]
      consequence: compound_statement [1, 13] - [2, 1]

There are no ERRORs, which I suppose makes some sense (what region represents the closing brace that doesn't exist?), but the function is still parsed usefully.
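The way the single closing brace ends up paired with the if block can be illustrated with a plain stack-based brace matcher. This is only a sketch of innermost-first pairing, not a description of how tree-sitter's error recovery actually works; `match_braces` and `Pos` are hypothetical names, and the matcher ignores strings and comments.

```rust
/// A (row, column) position, zero-based like the ranges in the parse trees.
type Pos = (usize, usize);

/// Pairs each `}` with the most recently opened `{`.
/// Returns the matched (open, close) pairs and the positions of any
/// `{` that were never closed. Stray `}` with nothing open are skipped.
fn match_braces(src: &str) -> (Vec<(Pos, Pos)>, Vec<Pos>) {
    let mut open: Vec<Pos> = Vec::new();
    let mut pairs: Vec<(Pos, Pos)> = Vec::new();
    for (row, line) in src.lines().enumerate() {
        for (col, ch) in line.chars().enumerate() {
            match ch {
                '{' => open.push((row, col)),
                '}' => {
                    // The innermost open brace wins, as in the C parse tree.
                    if let Some(start) = open.pop() {
                        pairs.push((start, (row, col)));
                    }
                }
                _ => {}
            }
        }
    }
    (pairs, open)
}
```

For the broken C example, `match_braces("void foo() {\n    if(true) {\n}")` pairs the `{` at (1, 13) with the `}` at (2, 0) and reports the function's `{` at (0, 11) as unclosed. The columns line up with the start positions of the two compound_statement nodes in the tree above.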