Open bugaevc opened 7 years ago
I currently don't understand either the problem or your solution well enough to be able to give feedback on this. The questions around the implementation, plus the use of mutability etc. make me think this is not a great idea, at least for a default. Also, it would be good to know what other similar projects do for this case.
Thinking a little bit more:
I suspect the underlying issue here is that you are doing parsing and evaluating in one step. I think for these situations (i.e. where you have some precedence rules that you need to apply), then you should be using parsy to lex the string into tokens, and then run a separate parser and evaluator that consumes the tokens. Something like this might help. http://effbot.org/zone/simple-top-down-parsing.htm
It feels like parsy is the wrong place to be building the logic to cope with these kind of issues.
The questions around the implementation, plus the use of mutability etc. make me think this is not a great idea, at least for a default.
Agreed. A proper implementation that I hope to write soon should be way more understandable, and I have an idea about how to mitigate the problem of mutability.
I currently don't understand either the problem or your solution well enough to be able to give feedback on this. <...> Also, it would be good to know what other similar projects do for this case.
The problem is that it isn't convenient to parse left-recursive grammars using parsy. The most common case of left recursion (and the one shown in the example above) is parsing left-associative operator chains. Parsec implements a different, perhaps less magical, solution in the form of:
chainl1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> ParsecT s u m a
Now, I have a hard time parsing that, but IIUC it works like this, in parsy-like syntax:
sign = string('+').result(operator.add) | string('-').result(operator.sub)
expr = num.chained_by(sign)
This may be a bit more difficult to comprehend (higher-order functions and stuff) and it only works for that simple case; still I think it's a brilliant solution and we should implement that.
My "magical" solution is more generic and easier to use, but harder to reason about. Perhaps we can have both, the magical one being an explicit wrapper for "advanced" use-cases.
I suspect the underlying issue here is that you are doing parsing and evaluating in one step. I think for these situations (i.e. where you have some precedence rules that you need to apply), then you should be using parsy to lex the string into tokens, and then run a separate parser and evaluator that consumes the tokens. Something like this might help. http://effbot.org/zone/simple-top-down-parsing.htm
I don't think that's the problem. Right now, parsy is a great framework for creating lexers; it was possible to create parsers with it before, too. In fact, I was thinking we'd build a more robust and high-level infrastructure for these things, similar to
infix("+", 10); infix("-", 10)
infix("*", 20); infix("/", 20)
infix_r("**", 30)
prefix("+", 100); prefix("-", 100)
symbol("(literal)").nud = lambda self: self
symbol("(end)")
code from the article you've linked, and left recursion elimination (in one form or another) is the first step to that. Ideally, I'd imagine something like:
import parsy.lexems as pl
from parsy.op import infix, priority_group
expr = parsy.Placeholder()
ops = parsy.operators(
priority_group(
infix(pl.plus),
infix(pl.minus)
),
priority_group(
infix(pl.asterisk),
infix(pl.slash)
),
)
expr2 = (
pl.num |
pl.lparen >> expr << pl.rparen |
ops
)
expr.set(expr2)
@bugaevc Thanks for your explanation. I've done a bit more Googling and understand the issue with left-recursive grammars better now. I found the same solution as you did.
I also found, as I think you were implying, that there are some situations it doesn't work for e.g. for postfix left recursion, also with some solutions - https://www.reddit.com/r/haskelltil/comments/3ocukk/a_function_postfixchain_to_parse_a_left_recursive/ and http://www.joelwilliamson.ca/programming/parsing/haskell/2015/02/04/Parsec-Left-Recursion.html
I think the best approach at the moment is to implement something like chainl1
(with a better name!), possibly something for the postfix case, and have some cookbook style docs on left recursive grammars. If you wanted to contribute a version of your automagical implementation, we could add it as an explicit wrapper as you suggest, and mark it as experimental (I've haven't yet had the chance to get into the implementation, and I don't want to commit to maintaining something I don't understand!)
A further step would be adding the higher level constructs you talk about, which I'm definitely open to, if they are generic enough.
Dropping it here in case someone encounters the same issue:
# Python translation of parsec's chainl1:
# https://hackage.haskell.org/package/parsec-3.1.15.1/docs/Text-Parsec-Combinator.html#v:chainl1
def parse_left_associative_expr(op, operand):
def rest(x):
@generate
def f():
f = yield op
y = yield operand
return rest(f(x, y))
return f | success(x)
@generate
def f():
x = yield operand
return rest(x)
return f
EDIT: removed custom const()
and replaced by parsy's success()
Sometimes, it would be very convenient to implement a parser using left recursion, like in this example:
Note that this cannot be easily rewritten to use
number +/- expr
, because we need to parse1 - 2 - 3
as(1 - 2) - 3
and not1 - (2 - 3)
.This currently doesn't work because calling
expr
inside ofexpr
causes infinite recursion. The proper way to implement this is to unwrap the recursion manually, i.e. to use an explicit loop that consumes and adds/subtracts a sequence of numbers. However, this is much less convenient than implementing 'the mental model' directly.This RFC proposes for parsy to be able to do such a transformation automatically.
A proof-of-concept (barely readable) implementation is here, and here's an explanation in pseudocode (that skips over many small details):
This works as if the recursion magically failed at just the right depth, giving the alternative sub-parsers (
number
in the example above) a chance to try their logic.Unresolved questions:
parsy.recursion_catcher()
,Parser.left_recursive()
or what?