katomuso closed this 2 months ago
There are a couple of ways to work around this, but I assume this is just an example, so they may not be what you are looking for. There is the obvious option of extracting all of the words and using `(get words 1)` to get the second word.
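A minimal sketch of that first option, assuming the same length-prefixed input used in the grammar further down. The names `input`, `all-words`, and `words` are illustrative only. Every `:word` is captured whole, so each capture still includes its length prefix:

```janet
(def input "5:apple6:banana6:cherry")

(def all-words
  (peg/compile
    ~{# a length-prefixed word: "<digits>:" followed by that many :w characters
      :word (lenprefix (* (number :d+) ":") :w)
      # capture the full text of every word, prefix included
      :main (any ':word)}))

# peg/match returns an array of captures (or nil if the match fails)
(def words (peg/match all-words input))

# index 1 is the second captured word
(pp (get words 1))
```

This keeps the grammar simple at the cost of building the full capture array before picking one element.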
However, the main idea is to simply have two kinds of words - captured words and non capture words.
(def str "5:apple6:banana6:cherry")
(def grammar
  ~{:word (lenprefix (* '(number :d+) ":") :w)
    :main (* :word ':word (any :word))})
(def grammar-compiled (peg/compile grammar))
(pp (peg/match grammar-compiled str))
To get only the second word, we match a single throw-away word, capture the word we actually want, and then match any number of subsequent words (or we could ignore them).
Does this work for your use case? Getting the nth capture is, to me, not a natural operation in a parser, so if you want to do it you need to use the `cmt` special, which is a sort of escape hatch for arbitrary computation during parsing.
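As a rough illustration of the `cmt` route, here is one way it could look. This is a sketch under assumptions, not the example from the issue: the rule names `:prefix`, `:word`, and `:item`, the tag `:n`, and the variable names are all made up for this illustration. The prefix is captured as a tagged number, the word is captured separately, and the function passed to `cmt` returns only the second of the two captures:

```janet
(def input "5:apple6:banana6:cherry")

(def length-prefixed
  (peg/compile
    ~{# capture the length as a number and tag it :n
      :prefix (number :d+ nil :n)
      # read exactly :n word characters and capture them as a string
      :word (<- (lenprefix (backref :n) :w))
      # cmt receives both captures (length, word) and keeps only the word
      :item (cmt (* :prefix ":" :word) ,(fn [len word] word))
      :main (any :item)}))

(def result (peg/match length-prefixed input))
(pp result)
```

The function given to `cmt` must return a truthy value for the match to succeed, which holds here because it always returns the captured string.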
EDIT: I misread the problem. I assumed we were trying to get the second word "banana", not discard the prefix.
So reading a bit more into the example, I guess there is a little more here. Assuming we are constraining ourselves with the following:
I suppose an `nth` combinator could do this, but I think another mechanism here would be something like `backref` and `drop` combined into one rule. Basically, get a tagged capture and drop everything else. I think that might be easier to use than needing to keep track of capture indices.
Works like a charm, thanks!
Take a look at the following example, where in each `(* :prefix ":" :word)` I have two captures: a number designating the length of the following string (`:prefix`) and the string itself (`:word`):

The problem is that I want to get only the strings themselves, not their prefixes. Currently, to do that I need to use `(cmt ... ,|$1)`, which is not that convenient: to understand it, I need to jump to the very end to find out which capture I want to get and then back to the beginning:

It would be nicer if there were a special like `(nth n patt)`, so the previous example would look like this instead: