potassco / tree-sitter-clingo

🌳 Clingo grammar for tree-sitter
MIT License

Refactor grammar #24

Open rkaminsk opened 3 days ago

rkaminsk commented 3 days ago

I have been looking at the grammar, and it seems it was copied from clingo's parser and adjusted directly, which leads to some unusual and hard-to-handle syntax trees.

As an example, I had a look at the const term definition. It could be written better like this (which also avoids the problematic _widentifier hack):

constterm: $ => choice(
    // In tree-sitter, a higher precedence number binds tighter:
    // XOR is the loosest operator, the unary operators are the tightest.
    prec.left(1, seq($.constterm, $.XOR, $.constterm)),
    prec.left(2, seq($.constterm, $.QUESTION, $.constterm)),
    prec.left(3, seq($.constterm, $.AND, $.constterm)),
    prec.left(4, seq($.constterm, $.ADD, $.constterm)),
    prec.left(4, seq($.constterm, $.SUB, $.constterm)),
    prec.left(5, seq($.constterm, $.MUL, $.constterm)),
    prec.left(5, seq($.constterm, $.SLASH, $.constterm)),
    prec.left(5, seq($.constterm, $.MOD, $.constterm)),
    prec.right(6, seq($.constterm, $.POW, $.constterm)),
    prec.left(7, seq($.SUB, $.constterm)),
    prec.left(7, seq($.BNOT, $.constterm)),
    seq($.identifier, optional(seq($.LPAREN, optional($._consttermvec), $.RPAREN))),
    seq($.LPAREN, optional($._consttermvec_comma), $.RPAREN),
    seq($.AT, $.identifier, optional(seq($.LPAREN, optional($._consttermvec), $.RPAREN))),
    seq($.VBAR, $.constterm, $.VBAR),
    $.NUMBER,
    $.STRING,
    $.INFIMUM,
    $.SUPREMUM,
),

_consttermvec: $ => seq($.constterm, repeat(seq($.COMMA, $.constterm))),
_consttermvec_comma: $ => seq($.constterm, repeat(seq($.COMMA, $.constterm)), optional($.COMMA)),
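The precedence numbers decide how operator chains are grouped: when two rules conflict, tree-sitter picks the one with the higher precedence, so a larger number means the operator binds more tightly. As a sanity check independent of tree-sitter, here is a minimal precedence-climbing parser in plain JavaScript that uses that convention, with the levels in the same order as the rules listed above (XOR loosest, unary minus and bitwise negation tightest). The surface symbols (^ for XOR, ? for QUESTION, & for AND, \ for MOD, ** for POW, ~ for BNOT) are assumed clingo syntax, identifiers are simplified, and tuples, strings, @-terms, |x|, #inf and #sup are left out; tokenize, parse and show are hypothetical helpers for illustration only.

```javascript
// Binary operators of const terms. Following tree-sitter's convention, a
// higher number binds tighter. The symbols are assumed clingo syntax.
const BINARY = new Map([
  ["^", { prec: 1, assoc: "left" }],   // XOR
  ["?", { prec: 2, assoc: "left" }],   // QUESTION (bitwise or)
  ["&", { prec: 3, assoc: "left" }],   // AND
  ["+", { prec: 4, assoc: "left" }],   // ADD
  ["-", { prec: 4, assoc: "left" }],   // SUB
  ["*", { prec: 5, assoc: "left" }],   // MUL
  ["/", { prec: 5, assoc: "left" }],   // SLASH
  ["\\", { prec: 5, assoc: "left" }],  // MOD
  ["**", { prec: 6, assoc: "right" }], // POW
]);
// Unary minus and ~ (BNOT) bind tighter than every binary operator.

// Split the input into numbers, simplified identifiers and operators.
function tokenize(src) {
  const re = /\s*(\*\*|\d+|[a-z_][A-Za-z0-9_']*|[-+*\/\\&?^~(),])\s*/y;
  const tokens = [];
  while (re.lastIndex < src.length) {
    const at = re.lastIndex;
    const m = re.exec(src);
    if (m === null) throw new SyntaxError(`unexpected input at offset ${at}`);
    tokens.push(m[1]);
  }
  return tokens;
}

// Precedence climbing; builds { num }, { id, args? } and { op, args } nodes.
function parse(src) {
  const toks = tokenize(src);
  let i = 0;
  const peek = () => toks[i];
  const next = () => toks[i++];
  const expect = (t) => {
    if (next() !== t) throw new SyntaxError(`expected "${t}"`);
  };

  // Parse an expression whose binary operators have precedence >= minPrec.
  function expr(minPrec) {
    let lhs = unary();
    for (;;) {
      const info = BINARY.get(peek());
      if (info === undefined || info.prec < minPrec) return lhs;
      const op = next();
      // Left-associative: only strictly tighter operators go into the right
      // operand. Right-associative: the same level is admitted as well.
      const rhs = expr(info.assoc === "left" ? info.prec + 1 : info.prec);
      lhs = { op, args: [lhs, rhs] };
    }
  }

  function unary() {
    if (peek() === "-" || peek() === "~") {
      const op = next();
      return { op, args: [unary()] };
    }
    return primary();
  }

  function primary() {
    const t = next();
    if (t === undefined) throw new SyntaxError("unexpected end of input");
    if (/^\d+$/.test(t)) return { num: Number(t) };
    if (t === "(") {
      const inner = expr(1); // grouping only; tuples are not handled here
      expect(")");
      return inner;
    }
    if (/^[a-z_]/.test(t)) {
      if (peek() !== "(") return { id: t };
      next();
      const args = [];
      if (peek() !== ")") {
        args.push(expr(1));
        while (peek() === ",") {
          next();
          args.push(expr(1));
        }
      }
      expect(")");
      return { id: t, args };
    }
    throw new SyntaxError(`unexpected token "${t}"`);
  }

  const tree = expr(1);
  if (i !== toks.length) throw new SyntaxError(`unexpected token "${toks[i]}"`);
  return tree;
}

// Render a tree as an S-expression so the grouping is visible.
function show(t) {
  if ("num" in t) return String(t.num);
  if ("op" in t) return `(${t.op} ${t.args.map(show).join(" ")})`;
  return t.args ? `${t.id}(${t.args.map(show).join(",")})` : t.id;
}
```

With this table, show(parse("-1+2")) yields (+ (- 1) 2) and show(parse("2**3**2")) yields (** 2 (** 3 2)): unary minus applies only to its immediate operand, and POW groups to the right.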

When parsing

#const x=f(1,2,3).

we get the much cleaner tree:

(source_file ; [0, 0] - [5, 0]
  (statement ; [0, 0] - [0, 18]
    (CONST) ; [0, 0] - [0, 6]
    (identifier) ; [0, 7] - [0, 8]
    (EQ) ; [0, 8] - [0, 9]
    (constterm ; [0, 9] - [0, 17]
      (identifier) ; [0, 9] - [0, 10]
      (LPAREN) ; [0, 10] - [0, 11]
      (constterm ; [0, 11] - [0, 12]
        (NUMBER ; [0, 11] - [0, 12]
          (dec))) ; [0, 11] - [0, 12]
      (COMMA) ; [0, 12] - [0, 13]
      (constterm ; [0, 13] - [0, 14]
        (NUMBER ; [0, 13] - [0, 14]
          (dec))) ; [0, 13] - [0, 14]
      (COMMA) ; [0, 14] - [0, 15]
      (constterm ; [0, 15] - [0, 16]
        (NUMBER ; [0, 15] - [0, 16]
          (dec))) ; [0, 15] - [0, 16]
      (RPAREN)) ; [0, 16] - [0, 17]
    (DOT))) ; [0, 17] - [0, 18]
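One practical benefit of this shape: because rules starting with an underscore (such as _consttermvec) are hidden in tree-sitter and add no node of their own, the arguments of a function term are direct children of its constterm node, so downstream code can pick them out with a simple filter. The sketch below works on the tree above, rebuilt by hand as plain objects with type, text and children fields; that node shape is an assumption for illustration (loosely modeled on what the tree-sitter bindings expose), and leaf, numberTerm, functionCall and descendants are hypothetical helpers. @-terms and tuples are not handled.

```javascript
// Hypothetical plain-object stand-in for a syntax node: { type, text, children }.
const leaf = (type, text) => ({ type, text, children: [] });
const numberTerm = (text) => ({
  type: "constterm",
  text,
  children: [{ type: "NUMBER", text, children: [leaf("dec", text)] }],
});

// The constterm printed above for f(1,2,3), rebuilt by hand.
const tree = {
  type: "constterm",
  text: "f(1,2,3)",
  children: [
    leaf("identifier", "f"),
    leaf("LPAREN", "("),
    numberTerm("1"),
    leaf("COMMA", ","),
    numberTerm("2"),
    leaf("COMMA", ","),
    numberTerm("3"),
    leaf("RPAREN", ")"),
  ],
};

// Return { name, args } for a function term, or null for any other constterm.
// The hidden vector rule leaves the arguments as direct children.
function functionCall(node) {
  if (node.type !== "constterm") return null;
  const [head, ...rest] = node.children;
  if (head === undefined || head.type !== "identifier") return null;
  return {
    name: head.text,
    args: rest.filter((child) => child.type === "constterm").map((child) => child.text),
  };
}

// Depth-first generator over all nodes of the given type.
function* descendants(node, type) {
  if (node.type === type) yield node;
  for (const child of node.children) yield* descendants(child, type);
}
```

Here functionCall(tree) returns { name: "f", args: ["1", "2", "3"] }, with no intermediate list node to unwrap.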

Naming conventions for non-terminals should probably follow the ones in clingo. Of course, this refactoring would break our downstream projects, so we might want to do it in a separate branch or project. It does seem very necessary, though. 😄

rkaminsk commented 3 days ago

I'll continue refactoring the parser in https://github.com/rkaminsk/tree-sitter-clingo. It is surprisingly easy to write tree-sitter parsers. 🚀