Closed ymeiron closed 2 years ago
Balancing parenthesis requires a parser, this is something a normal lexer can't do.
When using LALR, %ignore
respects priority:
>>> l = Lark('''start: "a" WS "b"
WS.10: " "
%ignore " "
''', parser='lalr')
>>> l.parse('a b')
Tree('start', [Token('WS', ' ')])
When using Earley, priority isn't even required:
>>> l = Lark('''start: "a" WS "b"
WS: " "
%ignore " "
''')
>>> l.parse('a b')
Tree('start', [Token('WS', ' ')])
So I think it's worth trying to just parse it using the Lark parser, i.e. by specifying the structure in the grammar.
But if that doesn't work, the alternative is to implement your own lexer, that will handle this special case, and yield to the regular Lark parser for the rest.
Thanks! Changing to LALR and setting the WS priority to negative one, did the trick. My grammar is now:
start: _statement*
_statement: func_statement | hello_statement
func_statement: "FUNC(" func_content ")"
func_content: parens_content ("(" func_content ")")? parens_content
parens_content: ANYTHING_IN_PARENS*
ANYTHING_IN_PARENS: /[^\)]/ | "()"
hello_statement: "HELLO(" [/\w+/] ")"
WS.-1: /[ \t\f\r\n]/+
%ignore WS
(I also realized since this morning that empty parentheses caused an error, but adjusting ANYTHING_IN_PARENS
as above fixed that issue as well)
What is your question?
What is the best way to verbatim reproduce content inside parentheses? That content includes more parentheses (guaranteed balanced) and whitespaces. Whitespaces should otherwise be ignored in the grammar.
Provide a small script that encapsulates your issue.
Explain what you're trying to do, and what is obstructing your progress.
For the example above, I want to obtain the string
prefix(in(ter)nal)suffix end
; my program almost works, it doesn't capture the space betweensuffix
andend
. It looks like%ignore
is quite strong and takes precedence. I can get it to work if I write the whitespaces into the rules, but that makes the rest of the grammar (outside the minimal example) quite difficult. Another not-so-elegant option is to preprocess the input before feeding it to Lark. Any takes on this?