Closed JohnAD closed 4 years ago
╭──────»───────╮ ╭╶╶╶╶╶╶╶╶╶╶╶╶╶╮ ╭──────»───────╮
addExpr o──[number]─»┴─[whitespace]─┴»───[plusMinus]───»┴─[whitespace]─┴»─┬─[addExpr]─┬──o
╰╶╶╶╶╶╶╶╶╶╶╶╶╶╯ ╰─[number]──╯
This is the main rule of your grammar (generated with -d:npegGraph). Here's what is happening:
+
or -
The problem you see is with the part "if that fails". This will cause NPeg to backtrack, basically saying "Hey, that didn't work, lets try something else". If you would let NPeg collect the captures for you, it would have thrown the last one out at that point, so all would be ok. The issue is that for code block captures (copied from the manual) "the Nim code gets executed during parsing, even if the match is part of a pattern that fails and is later backtracked.".
So: at the last bit of the line, NPeg tries to find another addExpr
, it matches your number and calls your code block. Then it goes "Hey, that didn't work, backtracks to go search a number
instead, and again calls your code block.
Partly, this is a limitation of NPeg, and given the other issues that are currently open this is something for which I'd need to find a solution. The problem is not unique for NPeg, I know other Peg implementations also show the same behaviour, notably LPeg, which was the inspiration for NPegs implementation.
In idiomatic PEG, I'd rewrite your grammar like this: first match a number, and then try to match zero or more times ( plusMinus
followed by a number
)
let parser = peg("equation", p: seq[string]):
equation <- allExpr
plusMinus <- >('+' | '-'):
p.add "op:" & $1
number <- (integer)
integer <- >+Digit:
echo "got int ", $1
p.add "literal:" & $1
whitespace <- +(Blank | Space | Cntrl)
addExpr <- ?whitespace * >plusMinus * ?whitespace * number
allExpr <- number * *addExpr
This is what the graph now looks like;
╭──────»───────╮ ╭╶╶╶╶╶╶╶╶╶╶╶╶╶╮ ╭──────»───────╮
addExpr o─┴─[whitespace]─┴»───[plusMinus]───»┴─[whitespace]─┴»─[number]──o
╰╶╶╶╶╶╶╶╶╶╶╶╶╶╯
╭──────»──────╮
allExpr o──[number]─»┴┬─[addExpr]─┬┴─o
╰─────«─────╯
Non related, but just a comment: if your whitespace is always optional, move the ?
to the whitespace rule itself so you don't have to put that before every whitespace
. Explicit handling of whitespace is always a bit of a pain in the behind in PEGs, as there simpy is no lexer doing that. I usually use a very short name like s
for that so it's a bit less intrusive.
Hey @JohnAD, anything I can do for you with this, or shall I close the issue?
Feel free to close the issue. Your explanation was very useful. Thank you!
I've decided to use npeg
strictly for tokenizing the SQL statements; which is very useful. I'll then handle with "WHEN" clauses with hand-built code for interpretation.
I've decided to use
npeg
strictly for tokenizing the SQL statements; which is very useful. I'll then handle with "WHEN" clauses with hand-built code for interpretation.
Sounds like a good plan, just pick the appropriate tools for the job, and npeg often isnt :)
I'm working on a SQL parser that converts a "WHEN" expression into reverse-polish notation. I'm coming across a problem where the system is ... possibly .. failing to consume the end-point numbers.
I must admit I'm new to PEG in general. So, if I'm making a newbie mistake, I do apologize.
Rather than go into great detail, I wrote a tiny program that demonstrates it:
The result I'm getting in
answer
is:but what I'd expect is:
So, the section that is parsing a number is running twice on the same "5" text. Am I missing something or is there a bug?