Patitotective closed this issue 1 year ago.
Actually this is a much better example of my problem:
```nim
import npeg

# This parser parses words and no-words.
# Words are one or more alpha characters,
# and no-words are words with a dash after them.
const parser = peg("nodes", data: seq[string]):
  nodes <- *(node * (' ' | !1))
  node <- noWord | word
  noWord <- word * dash
  word <- +Alpha:
    data.add($0)
  dash <- '-':
    data.add($0)

var data: seq[string]
assert parser.match("a b-", data).ok
assert data == @["a", "a", "b", "-"]
```
I want `word`'s capture block to be executed from `noWord` only if `noWord` matches successfully.
Hm, I see what you're trying to do, but I'm not sure it is a good idea to solve it with yet another operator. As you have seen in the manual, code block captures are a bit of a PITA because they always run, even if they are part of a backtracked pattern.
One solution would be to be explicit about `word` and `noWord` having the trailing dash or not by using the `!` operator, like so:
```nim
body <- +Alpha
word <- body * !'-'
noWord <- body * '-'
```
`body` is a simple pattern matching a string of alpha characters. `word` will match if and only if `body` is not followed by a `-`, while `noWord` will only match if `body` is followed by a `-`.
You can incorporate this into your example like this:
```nim
import npeg

const parser = peg("nodes", data: seq[string]):
  nodes <- *(node * (' ' | !1))
  node <- noWord | word
  body <- +Alpha
  # Only matches when body is NOT followed by a dash
  word <- >body * !'-':
    data.add("word " & $1)
  # Only matches when body IS followed by a dash
  noWord <- >body * '-':
    data.add("noword " & $1)

var data: seq[string]
assert parser.match("a b- c", data).ok
assert data == @["word a", "noword b", "word c"]
```
Does this solve your problem?
My actual PEG is a little more complicated: it is meant to be a lexer, and it adds tokens to a stack whenever it finds them.
So my `word` pattern in there has more patterns inside it that would match (and execute their capture code blocks) before I can check that it is a `noWord` (`!'-'`).
I'll try to explain my actual use case: I'm trying to implement the KDL document language in Nim. Its syntax is pretty straightforward; it goes like this:
```kdl
# node val key=val val1 key1=val1 val2 # Properties and arguments
node "Hello" "name"="zevv" 1 true age=20
```
Therefore in my `lexerPeg` I have `(prop | value)`, because properties and values can be freely mixed, separated by spaces, anywhere in the node.
The issue is that `prop` contains the `strOrIdent` pattern, which matches an identifier (without quotes) or a string, and the capture code block for the string is called before `prop` checks whether there is a `'='` after it.
And adding the tokens to the stack when `prop` matches, instead of in its sub-patterns, makes things a little harder, because I would need to parse `value` again (remember, a property is a `key=val`) to know which kind it is. (Perhaps I could parse the values and store them in `lexer.currentValueToken`, which would then be used by other patterns such as `prop` or `node`.)
This could be an example closer to my more complex PEG:
```nim
import npeg

const parser = peg("nodes", data: seq[string]):
  nodes <- *(node * (' ' | !1))
  node <- extraWord | word
  extraWord <- word * extra * dash
  word <- +Alpha:
    data.add($0)
  dash <- '-':
    data.add($0)
  extra <- number | dot
  number <- +Digit:
    data.add($0 & "(int)")
  dot <- '.':
    data.add($0 & "(dot)")

var data: seq[string]
assert parser.match("a b1- c.", data).ok
assert data == @["a", "a", "b", "1(int)", "-", "c", ".(dot)", "c"]
```
Hmm, your real grammar is quite big already; it's going to cost me some time to properly get into that, so I'll just be ignorant and look at your smaller examples for now...
I think that the general idea with code block captures is that you should run them as late as possible, that is, when you are sure you have a proper match that will not backtrack. In the meantime you can collect everything you need with regular captures in the nested rules, and access these in your code block using `$1` .. `$9`. Alternatively, you could pass around some local state to store the things you need later yourself. So don't run a code block capture for your `word`; just make it a normal string capture so it will be available later, when you have decided whether it is a `prop` or a `value`. Then, in your `prop` and `value` rules, it will be available to you in one of the `$` variables.
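As a rough sketch of that advice applied to the KDL case, here is a hypothetical, heavily simplified grammar. It is not the real KDL grammar, and the rule names `name`, `item`, `arg`, `ident`, `str` and `number` are made up for illustration. The helper rules only match text; tokens are emitted only from `prop` and `arg`, whose code blocks run after the whole `key=value` or bare value has matched. It relies only on the `peg`, `match`, `>` capture and `$n` features already used in this thread.

```nim
import npeg

# Hypothetical, heavily simplified KDL-like grammar (NOT the real KDL spec):
# a node name followed by space-separated arguments and key=value properties.
const lexer = peg("node", tokens: seq[string]):
  node <- name * *(' ' * item) * !1
  # Runs as soon as the name matches.
  name <- >ident:
    tokens.add("node " & $1)
  item <- prop | arg
  # Helper rules carry no code blocks, so backtracking out of them is harmless.
  ident <- Alpha * *Alnum
  str <- '"' * *(1 - '"') * '"'
  number <- +Digit
  strOrIdent <- str | ident
  value <- str | number | "true" | "false"
  # Tokens are only emitted once the whole property or argument has matched.
  prop <- >strOrIdent * '=' * >value:
    tokens.add("prop " & $1 & "=" & $2)
  arg <- >value:
    tokens.add("arg " & $1)

var tokens: seq[string]
doAssert lexer.match("node \"Hello\" \"name\"=\"zevv\" 1 true age=20", tokens).ok
echo tokens
```

With this layout, `"Hello"` first enters `prop`, fails on the missing `=`, and emits nothing; it is then emitted exactly once as an argument. Code blocks on rules that can still be backtracked out of (such as `name` here, if the rest of the node later fails) keep the original problem, so those should be kept to a minimum.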
Can you give the `assert()` of what you would like as the result for your last example's match?
Thanks for the suggestions. I will try what you're saying; I also think adding tokens independently without validating them is not good.
I expect it to be `assert data == @["a", "b", "1(int)", "-", "c"]`, and then fail, because `.` shouldn't match any pattern.
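One possible way to get that result with the "capture now, run code later" approach is sketched below; it only reworks the small example, not the real lexer. `extraWord` no longer calls `word`, so the remaining code blocks sit only on the rules that make the final decision. The trailing `* !1` on `nodes` is an assumption added here so the match fails when the input is not fully consumed; without it, `match` stops quietly before `c.` and still reports success, as in the example above.

```nim
import npeg

const parser = peg("nodes", data: seq[string]):
  # Assumption: the trailing !1 makes the match fail unless all input is consumed.
  nodes <- *(node * (' ' | !1)) * !1
  node <- extraWord | word
  # Helper rules only match; they carry no code blocks.
  body <- +Alpha
  extra <- number | dot
  number <- +Digit
  dot <- '.'
  # Captures are collected and only turned into tokens once extraWord matched.
  extraWord <- >body * >extra * >'-':
    data.add($1)
    if $2 == ".":
      data.add($2 & "(dot)")
    else:
      data.add($2 & "(int)")
    data.add($3)
  word <- >body:
    data.add($1)

var data: seq[string]
let res = parser.match("a b1- c.", data)
doAssert not res.ok
echo data
```

Note that `c` is still added: `word` matches and runs its block before the separator check on `.` fails, which happens to agree with the expected result above.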
Closing this for inactivity, feel free to reopen if appropriate.
I want to make a pattern that uses another pattern but doesn't execute that pattern's code block capture.
There could be, perhaps, an operator that allowed it: