Open LPeter1997 opened 4 years ago
Perhaps an interesting/helpful finding: Introducing an "alias" rule for RuleB
seems to successfully parse x.x$
. The grammar:
RuleA ::= RuleB.
RuleB ::= IDENTIFIER
| RuleC '.' IDENTIFIER
.
RuleB' ::= IDENTIFIER
| RuleC '.' IDENTIFIER
.
RuleC ::= RuleB'
| RuleA
.
Implementation:
lazy val ruleA: PackratParser[String] = ruleB <~ "$"
lazy val ruleB: PackratParser[String] = (ident
||| ruleC ~> "." ~> ident)
lazy val ruleBvar: PackratParser[String] = (ident
||| ruleC ~> "." ~> ident)
lazy val ruleC: PackratParser[String] = (ruleBvar
||| ruleA)
However, it still rejects a valid input, x.x.x$
.
This makes me further believe that the problem is with indirect left-recursion.
I think the problem is the one identified in this paper: https://tratt.net/laurie/research/pubs/html/tratt__direct_left_recursive_parsing_expression_grammars/
That talks about an interaction with left- and right-hand recursion and identifies a problem with the Warth et al algorithm used in the PackratParsers.
I've isolated a 3-rule pattern, where I'd expect the parser to succeed for a given input, but it fails instead. The sample grammar:
The implementation:
It fails for input
x.x$
, telling me that it expects a$
instead of the.
.For me, this seems to be a problem with how indirect left-recursion is handled. I'm not sure if the original algorithm is incapable of handling this pattern, or this is an implementation bug.
Edit: I've accidentally used
|
(first matching alt.) instead of|||
(longest matching alt.), I've fixed that in the code, but doesn't change the outcome.