tokay-lang / tokay

Tokay is a programming language designed for ad-hoc parsing, inspired by awk.
https://tokay.dev
MIT License
236 stars 7 forks source link

Implement indirect left-recursion #95

Open phorward opened 1 year ago

phorward commented 1 year ago

For indirect left-recursion in packrat parsing, one rule in the grammar's graph must be declared as "leading", so that subsequent, even left-recursive parselets are considered as not left-recursive.

An implementation of an solution for this issue can be found in the pegen parser generator from Python.

Tokay currently cannot handle this issue properly, and fails to parse abbcb with the grammar

X: Y 'c'
Y: Z 'b'
Z: X | Y | 'a'
Z

only the a is parsed.

phorward commented 9 months ago

During #105, this problem becomes more present, as the new Opt/Pos/Kle-concept introduces more parselets which might introduce indirect left-recursions.

As provided in tests/parselet_leftrec.tok

# direct
D1: @{
    D1 Char<b>
    Char<a>
}
'D1' print(D1)

# indirect 1: currently not working, see issue #95 for details
I1: @{
    I1? Char<a>
}
'I1' print(I1)

# indirect 2: currently not working, see issue #95 for details
X: Y 'c'
Y: Z 'b'
Z: X | Y | 'a'
'I2' print(Z)

#---
#D1abbb
#I1aaaa
#I2abbcb
#---
#((("a", "b"), "b"), "b")
#a
#a

I1 and I2 do not work as expected, as they introduce indirect left-recursions. I1 can be resolved by #120, but this does not solve the problem directly. Maybe, a refactor of the meaning behind Op::ForwardIfConsumed is also a way to resolve this problem, or parts of it, more precisely.