lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

Problem with rule priority #1313

Closed: jjalowie closed this issue 1 year ago

jjalowie commented 1 year ago

What is your question?

I have the following grammar:

start: (struct_def | other)*
other: OTHER
OTHER: /[^\\n]+/
STRUCT: "struct"
struct_def.1: STRUCT CNAME

%import common.CNAME
%import common.NEWLINE

%ignore NEWLINE

Why does everything get matched as other despite setting a higher priority for struct_def?

If you're having trouble with your code or grammar

Python code reproduction:

import lark

grammar = """
start: (struct_def | other)*
other: OTHER
OTHER: /[^\\n]+/
STRUCT.1: "struct"
struct_def.1: STRUCT CNAME

%import common.CNAME
%import common.NEWLINE

%ignore NEWLINE
"""

text = """
asdf
struct qwerty
//
"""

parser = lark.Lark(grammar)
ir = parser.parse(text)
print(ir.pretty())

The above code produces the following output:

start
  other asdf
  other struct qwerty
  other //

I would expect the struct qwerty input text to be parsed as struct_def STRUCT qwerty.

Explain what you're trying to do, and what is obstructing your progress.

I'm trying to write a parser for structs in a C-like language. I want to ignore everything except struct definitions. I don't understand why the other rule fires before the struct_def rule.

erezsh commented 1 year ago

The problem isn't with Lark.

You forgot to account for the whitespace: the grammar ignores only NEWLINE, so the space between struct and qwerty is never skipped and the parser fails to match struct_def.
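To illustrate the whitespace point, here is a minimal standard-library sketch, not Lark itself. The regular expressions are simplified stand-ins for STRUCT, common.CNAME, common.NEWLINE and common.WS, and the helper match_struct_def is hypothetical, introduced only for this illustration. It tries to match STRUCT followed by CNAME while skipping only the "ignored" pattern between tokens:

```python
import re

# Simplified stand-ins for the grammar's terminals. These are assumptions
# made for illustration, not Lark's actual definitions.
STRUCT = re.compile(r"struct")
CNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
NEWLINE = re.compile(r"(\r?\n)+")
WS = re.compile(r"[ \t\f\r\n]+")


def match_struct_def(text, ignore):
    """Try to match `STRUCT CNAME` at the start of `text`.

    `ignore` plays the role of the %ignore'd terminal: it is skipped
    before and between tokens. Returns the CNAME text on success,
    otherwise None. (Hypothetical helper, not part of Lark.)
    """

    def skip_ignored(pos):
        m = ignore.match(text, pos)
        return m.end() if m else pos

    pos = skip_ignored(0)
    m = STRUCT.match(text, pos)
    if m is None:
        return None
    pos = skip_ignored(m.end())
    m = CNAME.match(text, pos)
    return m.group() if m else None


line = "struct qwerty"
print(match_struct_def(line, NEWLINE))  # None: the space is not skipped
print(match_struct_def(line, WS))       # qwerty
```

When only newlines are skipped, the space after struct blocks CNAME, so the sequence STRUCT CNAME can never match; the catch-all OTHER pattern is then the only terminal that can consume the line.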

The following grammar works as expected (there is no need to specify priority):

start: (struct_def | other)*
other: OTHER
OTHER: /[^\\n]+/
STRUCT: "struct"
struct_def: STRUCT CNAME

%import common.CNAME
%import common.WS

%ignore WS

jjalowie commented 1 year ago

Thanks!