tree-sitter / tree-sitter-haskell

Haskell grammar for tree-sitter.
MIT License
150 stars 35 forks source link

Incorrect parse due to top-level splices #111

Open wenkokke opened 5 months ago

wenkokke commented 5 months ago

The following piece of Haskell code is SOMETIMES incorrectly parsed as containing a series of top-level splices.

id :: a -> a
id x = x

const x y = x

fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

Specifically, the common parse is as follows:

(haskell
  (signature
    name: (variable)
    type: "::"
    type: (fun
      (type_name
        (type_variable)
      )
      "->"
      (type_name
        (type_variable)
      )
    )
  )
  (function
    name: (variable)
    patterns: (patterns
      (pat_name
        (variable)
      )
    )
    "="
    rhs: (exp_name
      (variable)
    )
  )
  (function
    name: (variable)
    patterns: (patterns
      (pat_name
        (variable)
      )
      (pat_name
        (variable)
      )
    )
    "="
    rhs: (exp_name
      (variable)
    )
  )
  (signature
    name: (variable)
    type: "::"
    type: (fun
      (type_name
        (type)
      )
      "->"
      (type_name
        (type)
      )
    )
  )
  (function
    name: (variable)
    patterns: (patterns
      (pat_literal
        (integer)
      )
    )
    "="
    rhs: (exp_literal
      (integer)
    )
  )
  (function
    name: (variable)
    patterns: (patterns
      (pat_literal
        (integer)
      )
    )
    "="
    rhs: (exp_literal
      (integer)
    )
  )
  (function
    name: (variable)
    patterns: (patterns
      (pat_name
        (variable)
      )
    )
    "="
    rhs: (exp_infix
      (exp_apply
        (exp_name
          (variable)
        )
        (exp_parens
          "("
          (exp_infix
            (exp_name
              (variable)
            )
            (operator)
            (exp_literal
              (integer)
            )
          )
          ")"
        )
      )
      (operator)
      (exp_apply
        (exp_name
          (variable)
        )
        (exp_parens
          "("
          (exp_infix
            (exp_name
              (variable)
            )
            (operator)
            (exp_literal
              (integer)
            )
          )
          ")"
        )
      )
    )
  )
)

However, in some circumstances it is parsed to the following very incorrect syntax tree:

(haskell
  (top_splice
    (exp_name
      (variable)
    )
  )
  (signature
    name: (variable)
    type: "::"
    type: (fun
      (type_name
        (type_variable)
      )
      "->"
      (type_name
        (type_variable)
      )
    )
  )
  (top_splice
    (exp_name
      (variable)
    )
  )
  (function
    name: (variable)
    "="
    rhs: (exp_name
      (variable)
    )
  )
  (function
    name: (variable)
    patterns: (patterns
      (pat_name
        (variable)
      )
      (pat_name
        (variable)
      )
    )
    "="
    rhs: (exp_name
      (variable)
    )
  )
  (top_splice
    (exp_name
      (variable)
    )
  )
  (signature
    name: (variable)
    type: "::"
    type: (fun
      (type_name
        (type)
      )
      "->"
      (type_name
        (type)
      )
    )
  )
  (top_splice
    (exp_name
      (variable)
    )
  )
  (function
    pattern: (pat_literal
      (integer)
    )
    "="
    rhs: (exp_literal
      (integer)
    )
  )
  (top_splice
    (exp_name
      (variable)
    )
  )
  (function
    pattern: (pat_literal
      (integer)
    )
    "="
    rhs: (exp_literal
      (integer)
    )
  )
  (top_splice
    (exp_name
      (variable)
    )
  )
  (function
    name: (variable)
    "="
    rhs: (exp_infix
      (exp_apply
        (exp_name
          (variable)
        )
        (exp_parens
          "("
          (exp_infix
            (exp_name
              (variable)
            )
            (operator)
            (exp_literal
              (integer)
            )
          )
          ")"
        )
      )
      (operator)
      (exp_apply
        (exp_name
          (variable)
        )
        (exp_parens
          "("
          (exp_infix
            (exp_name
              (variable)
            )
            (operator)
            (exp_literal
              (integer)
            )
          )
          ")"
        )
      )
    )
  )
)

This bug is showing up in the test suite for the Haskell support for cursorless-dev/cursorless, and @pokey and I are working to determine exactly when the bug shows up. However, even if we cannot nail down exactly when the bug shows up, it might be worthwhile ruling the second parse out entirely. Perhaps by anchoring top-level function definitions to the first column, or forcing a line-break after a top-level splice?

tek commented 5 months ago

I'm working on something for this, should be ready in a week or two

wenkokke commented 5 months ago

I'm working on something for this, should be ready in a week or two

Wow, that was a quick response. Amazing!

wenkokke commented 5 months ago

@tek Do you know why it happens?

tek commented 5 months ago

Wow, that was a quick response. Amazing!

😅 I happened to refresh the notifications page right when you posted

@tek Do you know why it happens?

Because there are many declared conflicts in the grammar, which causes tree-sitter to run two (or more) parallel attempts at parsing a sequence, one of which doesn't contain the _layout_semicolon symbol that that scanner emits when indent should force a new decl. The choice is then made based on dynamic precedence scores, which aren't well balanced in the grammar (and a bit mysterious).

wenkokke commented 5 months ago

@tek Any updates on this?

tek commented 5 months ago

still a lot of work to do to get this to release, but I've got a week off so the chances are good 😁

wenkokke commented 5 months ago

any suggestions for workarounds I could use in the meantime?

tek commented 5 months ago

other than using explicit semicolons, don't think it's that simple 😕

wenkokke commented 5 months ago

my current idea for a workaround is to fork and remove splices entirely, and use that fork in the meantime

tek commented 5 months ago

oh, you could try giving the top_splice rule a penalty:

top_splice: $ => prec.dynamic(-100, $._exp_infix),

not sure if that isn't too late in the reduction chain though.

tek commented 4 months ago

@wenkokke prerelease for you to testdrive: https://github.com/tek/tree-sitter-haskell

pokey commented 1 month ago

As of vscode-parse-tree version 0.31.0, we are now shipping with tree-sitter-haskell version https://github.com/tree-sitter/tree-sitter-haskell/commit/a50070d5bb5bd5c1281740a6102ecf1f4b0c4f19, which I believe should include the fix, but @tek lmk if I'm wrong.

Fwiw here is the parse I get for the code snippet at the top of this issue:

(haskell
  declarations: (declarations
    (signature
      name: (variable)
      "::"
      type: (function
        parameter: (variable)
        arrow: "->"
        result: (variable)
      )
    )
    (function
      name: (variable)
      patterns: (patterns
        (variable)
      )
      match: (match
        "="
        expression: (variable)
      )
    )
    (function
      name: (variable)
      patterns: (patterns
        (variable)
        (variable)
      )
      match: (match
        "="
        expression: (variable)
      )
    )
    (signature
      name: (variable)
      "::"
      type: (function
        parameter: (name)
        arrow: "->"
        result: (name)
      )
    )
    (function
      name: (variable)
      patterns: (patterns
        (literal
          (integer)
        )
      )
      match: (match
        "="
        expression: (literal
          (integer)
        )
      )
    )
    (function
      name: (variable)
      patterns: (patterns
        (literal
          (integer)
        )
      )
      match: (match
        "="
        expression: (literal
          (integer)
        )
      )
    )
    (function
      name: (variable)
      patterns: (patterns
        (variable)
      )
      match: (match
        "="
        expression: (infix
          left_operand: (apply
            function: (variable)
            argument: (parens
              "("
              expression: (infix
                left_operand: (variable)
                operator: (operator)
                right_operand: (literal
                  (integer)
                )
              )
              ")"
            )
          )
          operator: (operator)
          right_operand: (apply
            function: (variable)
            argument: (parens
              "("
              expression: (infix
                left_operand: (variable)
                operator: (operator)
                right_operand: (literal
                  (integer)
                )
              )
              ")"
            )
          )
        )
      )
    )
  )
)

@wenkokke I believe you were seeing failures only when running tests on your cursorless haskell branch? I haven't tried this new version there so can't verify whether the fix has worked

tek commented 1 month ago

yep this should be resolved