Closed mx781 closed 9 months ago
No, you can't use numbered reference groups, since all the regexes get combined into one regex, which messes with the numbering. Named capture groups should work, however (as long as the name doesn't collide with the names used internally for Terminals).
However, I would strongly recommend against using capture groups (or, more specifically, backreferences). They make the Terminals non-regular (and sometimes even context-sensitive), which can confuse the parser in the sense that it does things you don't expect. You are almost always better off using the CFG-level features, i.e. the actual parser and grammar syntax.
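To illustrate the numbering problem, here is a minimal sketch using only Python's `re` module. The `join_patterns` helper is hypothetical and only approximates how a lexer might merge terminal patterns into one alternation; Lark's actual internals may differ in detail.

```python
import re

# Two patterns that each work on their own: a letter followed by itself.
DOUBLE_A = r"(a)\1"
DOUBLE_B = r"(b)\1"


def join_patterns(patterns):
    """Join patterns into one alternation (hypothetical stand-in for a lexer)."""
    return "|".join(f"(?:{p})" for p in patterns)


combined = re.compile(join_patterns([DOUBLE_A, DOUBLE_B]))

# In the combined pattern, (b) has become group 2, but its backreference
# still says \1, which now points at (a). That group is unset while
# matching "bb", so the combined pattern rejects it.
standalone_ok = re.fullmatch(DOUBLE_B, "bb") is not None
combined_ok = combined.fullmatch("bb") is not None

# Named groups keep their meaning after joining, as long as names are unique.
NAMED_A = r"(?P<a>a)(?P=a)"
NAMED_B = r"(?P<b>b)(?P=b)"
named = re.compile(join_patterns([NAMED_A, NAMED_B]))
named_ok = named.fullmatch("bb") is not None
```

Here `standalone_ok` and `named_ok` are true while `combined_ok` is false, which is the kind of silent breakage that numbered groups can cause once patterns are merged.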
Got it, thanks for the clarification! I thought of accomplishing this via just grammar, but couldn't find a way to encapsulate repeated tokens that way either. The repetition feature, i.e.
consonant: "q" | "x"
vowel: "u" | "o"
word: consonant vowel ~ 2..4 consonant
is in the right direction, but doesn't require the vowel tokens to be identical, so quox matches as well as quux. Is there a grammar feature I've missed to accomplish something like this?
but doesn't require the vowel tokens to be identical
That would be context sensitive matching, which Lark doesn't support.
The only way to do it is using regexps.
Perhaps a context-free parser isn't the best tool for your task?
But if you do need a parser, you have the option to give Lark your own custom lexer.
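For reference, the regexp route could look roughly like the sketch below. It uses plain `re` outside of Lark, with the consonants and vowels from the grammar above; the names `WORD` and `is_word` are made up for illustration.

```python
import re

# A consonant, then one vowel captured as group 1, then 1 to 3 repeats of
# that same vowel (2 to 4 vowels in total), then a closing consonant.
WORD = re.compile(r"[qx]([uo])\1{1,3}[qx]")


def is_word(text):
    """Return True if text is consonant + 2..4 identical vowels + consonant."""
    return WORD.fullmatch(text) is not None
```

With this pattern, `is_word("quux")` is true and `is_word("quox")` is false, because the backreference forces every vowel to equal the first one.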
got it - that clarifies things. thanks for the quick responses!
On second thought, if you really are working with letters, which is a fixed number of tokens, you could do something like
vowel_2to4: "a"~2..4
    | "e"~2..4
    | "i"~2..4
    | "o"~2..4
    | "u"~2..4
That should work, but as you can see it's a hack rather than a general purpose solution.
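If the alphabet is larger, writing the alternatives by hand gets tedious. One option is to generate the rule text with a small helper, as in this sketch. The function `repeated_letter_rule` is hypothetical and not part of Lark; it only builds a string in the same shape as the rule above.

```python
def repeated_letter_rule(name, letters, low, high):
    """Build a Lark rule with one alternative per letter, each repeated low..high times."""
    alternatives = [f'"{letter}"~{low}..{high}' for letter in letters]
    return f"{name}: " + "\n    | ".join(alternatives)


# Reproduces the hand-written rule for the five vowels.
rule = repeated_letter_rule("vowel_2to4", "aeiou", 2, 4)
print(rule)
```

The resulting string can then be concatenated into the rest of the grammar before it is handed to Lark. It is still the same hack, just automated.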
Are capturing groups not supported when using LALR?
The following works:
while this does not:
and throws
If so, is there any documentation on what regex features are supported by which parser?
Thanks for the great library!