However I found some questions about the design choices here. Namely, the Affix enum. My understanding is that the bare minimal abstraction of a Pratt parser does not include Affix:
token.lbp(): binding power used by the main parse loop to figure out when to stop
token.nud(): for literals, unary operators, groups
token.led(left: Token): for infix operators
Inside nud and led they can call parse(rbp) (expression in the linked blog) to get an Output with binding power limitation.
I think the problem of Affix is that it requires the token (Input) to tell us whether it's prefix or infix. Sometimes it does not know. I think the reason for using the (heavy weight) lalrpop in the README example is because of this. lalrpop is used to figure out whether - is prefix or infix. I would expect a slightly different design that the main parse loop (parse_input) provides the information whether it needs prefix or suffix (call nud for prefix, call led for infix) and the token is not responsible for that. I'd like to see an example that uses a simple tokenizer without using lalrpop or pest.
How do you think about the following design:
/// Minimal interface for a Pratt parser.
trait BasePrattParser<Inputs> {
type Error, Input, Output;
fn parse_input(&mut self, input: &mut Inputs, rbp: Precedence) { /* main algorithm */ }
// no "info: Affix". If these function want it, call query(&head) to get it.
fn nud(&mut self, head: &Self::Input, tail: &mut Inputs) -> Result<Output> { error("unsupported") }
fn led(&mut self, head, tail, lhs) -> Result<Output> { error("unsupported") }
fn lbp(&self, head: &Input) -> Precedence { Precedence(0) }
// no query, primary, infix, prefix, postfix, parse_peekable
}
Then a higher level implementation that introduces the Affix concept (maybe slightly differently to address the problem above).
Hi, after reading http://web.archive.org/web/20180213221021/http://effbot.org/zone/simple-top-down-parsing.htm I was looking for a Rust library. I prefer this library to prattle since the latter depends on
failure
which is no longer recommended.However I found some questions about the design choices here. Namely, the
Affix
enum. My understanding is that the bare minimal abstraction of a Pratt parser does not includeAffix
:token.lbp()
: binding power used by the main parse loop to figure out when to stoptoken.nud()
: for literals, unary operators, groupstoken.led(left: Token)
: for infix operatorsnud
andled
they can callparse(rbp)
(expression
in the linked blog) to get anOutput
with binding power limitation.I think the problem of
Affix
is that it requires the token (Input
) to tell us whether it's prefix or infix. Sometimes it does not know. I think the reason for using the (heavy weight) lalrpop in the README example is because of this. lalrpop is used to figure out whether-
is prefix or infix. I would expect a slightly different design that the main parse loop (parse_input
) provides the information whether it needs prefix or suffix (callnud
for prefix, callled
for infix) and the token is not responsible for that. I'd like to see an example that uses a simple tokenizer without using lalrpop or pest.How do you think about the following design:
Then a higher level implementation that introduces the
Affix
concept (maybe slightly differently to address the problem above).