lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

Problem in using parentheses in an addition #1273

Closed lalshikh closed 1 year ago

lalshikh commented 1 year ago

I'm writing a parser that needs to recognise additions with parentheses and constant functions.

Here is my grammar:

grammar_lark = u"""
start: expressions
expressions : (expression)+

expression : constant_function | argument

argument : arithmetic_operation | "..."

arithmetic_operation : term [ ("+" | "-") term ]
term : factor [ ( "*" | "/" ) factor ] 
factor : exponent [ "**" exponential ]
exponential : exponent

constant_function : identifier "(" literal ("," literal)* ")" | identifier "()"

exponent : literal | "(" argument ")"

literal : number | text

identifier : cmd_identifier | "`" /[0-9a-zA-Z\/#%\._:-]+/ "`"

cmd_identifier : CMD_IDENTIFIER
CMD_IDENTIFIER : /\\b(?!\\bexists\\b)[a-zA-Z_][a-zA-Z0-9_]*\\b/

text : TEXT
TEXT : DOUBLE_QUOTE ALPHANUM_STR DOUBLE_QUOTE
     | SINGLE_QUOTE ALPHANUM_STR SINGLE_QUOTE
ALPHANUM_STR : /[a-zA-Z0-9 ]*/
DOUBLE_QUOTE : "\\""
SINGLE_QUOTE : "'"

number : integer | float
integer : SIGNED_INT
float : SIGNED_FLOAT

WHITESPACE : /[\\t ]+/
%ignore WHITESPACE

%import common.NEWLINE
%ignore NEWLINE

%import common.SIGNED_INT
%import common.SIGNED_FLOAT
"""

And here is the code that performs the parsing:

from lark import Lark

# input_str is defined below
json_parser = Lark(grammar_lark, parser='lalr', debug=True)
jp = json_parser.parse(input_str)
print(jp.pretty())

With this input string:

input_str = '''
4 + 5 * 2**3

A(3)'''

I get the following tree (which is what I expect):


start
  expressions
    expression
      argument
        arithmetic_operation
          term
            factor
              exponent
                literal
                  number
                    integer 4
              None
            None
          term
            factor
              exponent
                literal
                  number
                    integer 5
              None
            factor
              exponent
                literal
                  number
                    integer 2
              exponential
                exponent
                  literal
                    number
                      integer 3
    expression
      constant_function
        identifier
          cmd_identifier A
        literal
          number
            integer 3

But if I insert parentheses and use, for instance, the following input string:

input_str = '''
4 + 5 * (2-6)**3

A(3)'''

I get the following error message:

UnexpectedToken: Unexpected token Token('SIGNED_INT', '-6') at line 2, column 11.
Expected one of: 
    * RPAR

I guess the parser expects a constant function as soon as it reaches the left parenthesis, but I have no clue where the ambiguity in my grammar is.

I have to keep the argument rule, as the parser will later need to recognise non-constant functions.

How can I make the parser correctly distinguish parentheses in a complex addition from parentheses in a constant function?

erezsh commented 1 year ago

It looks to me like the error isn't because of the parentheses, but because of a collision between SIGNED_INT and the subtraction operator ("-"). As the error message shows, "-6" was lexed as a single SIGNED_INT token.
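To make the collision concrete, here is a minimal sketch using only Python's `re` module. It is not Lark's lexer: the token names and patterns are simplified stand-ins assumed for illustration, and the signed-integer pattern is deliberately tried before the bare minus. It shows how "-6" can be consumed as one signed integer, which leaves no subtraction operator for the parser.

```python
import re

# Simplified stand-ins for the grammar's terminals (assumed for illustration,
# not Lark's internals). SIGNED_INT is listed before MINUS on purpose, so it
# wins whenever a sign is directly followed by digits.
TOKEN_SPEC = [
    ("SIGNED_INT", r"[+-]?\d+"),
    ("MINUS", r"-"),
    ("LPAR", r"\("),
    ("RPAR", r"\)"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (token_name, matched_text) pairs, skipping whitespace."""
    tokens = []
    for match in MASTER.finditer(text):
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
    return tokens


print(tokenize("(2-6)"))
# [('LPAR', '('), ('SIGNED_INT', '2'), ('SIGNED_INT', '-6'), ('RPAR', ')')]
print(tokenize("(2 - 6)"))
# [('LPAR', '('), ('SIGNED_INT', '2'), ('MINUS', '-'), ('SIGNED_INT', '6'), ('RPAR', ')')]
```

In this sketch the first input never yields a MINUS token, which mirrors the "Unexpected token Token('SIGNED_INT', '-6')" message in the report.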

Anyway, there are other issues with your grammar too. I suggest that you use the calc example as the initial grammar and try to build on it. Or at least study the example and understand the pattern behind it (left recursion, etc.).

lalshikh commented 1 year ago

I rewrote my grammar following the example and it works great. Thank you!