Open Geal opened 7 years ago
Not sure if this answer is entirely relevant (as I use it, expr
does not use res
, and I'm not certain it would work if that were the case), but here is how I do left recursion:
pub fn binop_precedence_n<'a>(i: &'a str) -> IResult<&'a str, Expression<'a>> {
do_parse!(i,
first: binop_precedence_n_plus_1 >>
fold: fold_many0!(
do_parse!(
tag_s!("OP") >> // or whatever parser
expr: binop_precedence_n_plus_1 >>
(expr)
),
first,
|acc: Expression<'a>, item: Expression<'a>| {
Expression::BinOpPrecedenceN(BinOpPrecedenceN{
left: Box::new(acc),
right: Box::new(item),
})
}
) >>
(fold)
)
}
Knowing that I have an Expression
enum with all possible expression types in it.
I talked with @Geal briefly about reviving this discussion on irc.
Currently, I have implemented the algorithm described in [1] with pvm
.
There is also another reference implementation of [1] with IronMeta
. I have only briefly looked at there source code so I can not speak at length about there implementation.
The basic idea behind the algorithm is that a parser must halt or fail because it is handling a finite amount of input. Thus, if bounded recursive calls are strictly increasing (in terms of input consumed), you allow for the number of "recurses" to increase until the amount of consumed input is less than the prior trial. For PEGs it is in fact the case that recursive calls are strictly increasing. It is my belief that nom
will also share this property.
Ultimately, this means that every call that is part of a left recursive cycle must do the bureaucracy of bounded recursion. This means that we'll have to backtrack to appropriate positions in the subject (the data we're trying to parse). If nom
doesn't have random access iterators under the hood that may be problematic. This does not mean we'll have to re-parse the text when doing the bounded recursion, you memoize the prior result and "start from there" instead.
Geal mentioned possibly wrapping nom
in a new library to add this functionality. It is not clear to me that this is possible at the moment but I want to mention it so that it's not immediately ruled out. I don't have a strong understanding of the inner workings of nom
(just cursory glances at the source here and there) so I can't say for sure.
I will be busy until August 1st, so I can't yet fully investigate this, right now i'm just going on past knowledge working with pvm
. After that date I intend to make this my main project (along with sprucing up pvm
a bit). I would like to see nom
with this functionality if possible as, many moons ago, I wanted to use it to write quick and simple parsers found in programming language literature (for experimentation sake). My frustration with no crate in the Rust eco system for parsing supporting left recursion is what spawned pvm
. However, I want to spread the joy (and pvm
has it's own interesting considerations as a potentially runtime-changing parser).
What became of this?
FWIW, there is a crate called nom_recursive that adds a proc macro that aims to help support left recursion, but I think it has a pretty serious limitation.
After having a look at nom_recursive
and deciding against it because of that limitation, I ended up implementing my own solution on top of nom_locate to get custom parser state. It is an implementation of a variation of the packrat algorithm as found here:
https://medium.com/@gvanrossum_83706/left-recursive-peg-grammars-65dab3c580e1
The state tracking I have is fairly simple:
State
struct containing a member for each left recursive ruleState
members contains the Action
the rule will do when next tried. It can be either Seed
(initial parse), Fail
(to prevent infinite recursion) and Succeed(T)
(succeed and return a clone of the pre-parsed T
)Vec<State>
to track state for each input position.So far this seems to have functionally worked well, but:
Vec<State>
can be huge if the input is big, which is very wasteful and therefore can really tank performance. I found that it is capital to only have state tracking for recursive rules, and to use Succeed(Box<T>)
to reduce the size of the state. Just the Box
trick alone allowed for a 5x speedup in debug builds.State
struct. The internals are not pretty and it probably limits the amount of clever parser tricks you can implement.Fortunately, the input I have to deal with is quite small. Also being able to let the grammar be left recursive was important to me (rather than e.g. parse a flat expression syntax and turn it back into an AST using a precedence table), so I don't think I would have been better served by any other lib like pest
nom can recognize left recursive grammars, but at the price of losing some information. Something like
res <- expr OP expr
withexpr
possibly usingres
can be recognized, but left associativity must be infered from the AST.Here are two papers about supporting left recursive grammars: http://richard.myweb.cs.uwindsor.ca/PUBLICATIONS/PADL_08.pdf https://arxiv.org/pdf/1207.0443.pdf?ref=driverlayer.com/web
How could those be implemented, at what cost?