lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

Pipe in terminal regex not working as expected #1414

Open sidhiadkoli opened 1 month ago

sidhiadkoli commented 1 month ago

What is your question?

I'm facing an issue with a pipe (|) in a terminal regex.

Here is a subset of the grammar in question:

from lark import Lark

# Raw string, so the regex escape \s reaches Lark unchanged (no invalid-escape warning)
grammar = r"""
start: START

START: QUARTER [WS+ YEAR]
QUARTER: /q[1-4]/
WS: /\s/
YEAR: /(19[0-9]{2})|(20[0-3][0-9])/
"""

parser = Lark(grammar)               # build the parser once, reuse it
print(parser.parse("q1 1923"))    # works
print(parser.parse("q1 2023"))    # doesn't work

However, when we add parentheses around the full YEAR regex, both example strings are parsed correctly.

This works:

from lark import Lark

# Raw string, so the regex escape \s reaches Lark unchanged (no invalid-escape warning)
grammar = r"""
start: START

START: QUARTER [WS+ YEAR]
QUARTER: /q[1-4]/
WS: /\s/
YEAR: /((19[0-9]{2})|(20[0-3][0-9]))/
"""

parser = Lark(grammar)               # build the parser once, reuse it
print(parser.parse("q1 1923"))    # works
print(parser.parse("q1 2023"))    # works now

What am I missing here?

MegaIng commented 1 month ago

This is a bug in the way Lark combines terminals; #1415 fixes it.
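The following sketch uses only Python's re module to show why the extra parentheses matter. It rests on a simplified model, not on Lark's actual code. The model assumes that inlining a terminal into START copies its pattern in verbatim, and that WS+ and the optional [...] become non-capturing groups. Under that assumption, the top-level | in YEAR splits the whole optional group into two alternatives: "whitespace then 19xx", or "20xx with no leading whitespace". As a result, "q1 2023" fails to match while "q12023" is wrongly accepted. Wrapping the inlined pattern in a group keeps the alternation local. The helpers naive_start and grouped_start are hypothetical names used only for this illustration.

```python
import re

QUARTER = r"q[1-4]"
WS = r"\s"
YEAR = r"(19[0-9]{2})|(20[0-3][0-9])"  # unparenthesised, as in the report


def naive_start(year: str) -> str:
    # Hypothetical model of the bug: splice YEAR in verbatim, so its
    # top-level "|" splits the surrounding optional group in two.
    return QUARTER + "(?:(?:" + WS + ")+" + year + ")?"


def grouped_start(year: str) -> str:
    # Same construction, but the inlined pattern is first wrapped in a
    # non-capturing group, so the "|" only applies inside YEAR.
    return QUARTER + "(?:(?:" + WS + ")+(?:" + year + "))?"


for text in ["q1 1923", "q1 2023", "q12023"]:
    naive = re.fullmatch(naive_start(YEAR), text) is not None
    grouped = re.fullmatch(grouped_start(YEAR), text) is not None
    print(f"{text!r}: naive={naive} grouped={grouped}")

# 'q1 1923': naive=True grouped=True
# 'q1 2023': naive=False grouped=True
# 'q12023': naive=True grouped=False
```

This also explains why the reporter's workaround is effective. Adding the outer parentheses in the grammar does by hand what grouped_start does: it keeps the alternation from leaking into the enclosing terminal.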