lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

AssertionError when using templates #1382

Open subnut opened 6 months ago

subnut commented 6 months ago

Describe the bug

Lark raises an AssertionError when a grammar uses a template inside a terminal definition.

$ python parser.py
Traceback (most recent call last):
  File "/home/user/Projects/project/parser.py", line 5, in <module>
    parser = Lark.open("grammar.lark")
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.config/nvim/venv/lib/python3.12/site-packages/lark/lark.py", line 580, in open
    return cls(f, **options)
           ^^^^^^^^^^^^^^^^^
  File "/home/user/.config/nvim/venv/lib/python3.12/site-packages/lark/lark.py", line 410, in __init__
    self.terminals, self.rules, self.ignore_tokens = self.grammar.compile(self.options.start, terminals_to_keep)
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.config/nvim/venv/lib/python3.12/site-packages/lark/load_grammar.py", line 710, in compile
    terminals = [TerminalDef(name, transformer.transform(term_tree), priority)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.config/nvim/venv/lib/python3.12/site-packages/lark/lexer.py", line 126, in __init__
    assert isinstance(pattern, Pattern), pattern
AssertionError: Tree('template_usage', [NonTerminal('_string'), "'"])

To Reproduce

This grammar fails with an AssertionError:

start: _strconst+
_strconst: STRCONST_SQ
         | STRCONST_BQ
         | STRCONST_DQ
STRCONST_SQ: _string{"'"}
STRCONST_BQ: _string{"`"}
STRCONST_DQ: _string{"\""}
_string{quot}: quot /.*?/s /(?<!\\)(\\\\)*?/ quot

whereas the same grammar, when expanded, works perfectly fine!

start: _strconst+
_strconst: STRCONST_SQ
         | STRCONST_BQ
         | STRCONST_DQ
STRCONST_SQ: "'"   /.*?/s /(?<!\\)(\\\\)*?/   "'"
STRCONST_BQ: "`"   /.*?/s /(?<!\\)(\\\\)*?/   "`"
STRCONST_DQ: "\""  /.*?/s /(?<!\\)(\\\\)*?/   "\""

MegaIng commented 6 months ago

_string is a rule template, which can't be used in a terminal. Currently there are no terminal templates. (Also, the expansion you wrote is not in fact equivalent: rule templates generate new rules with impossible-to-recreate names instead of inlining.)

erezsh commented 6 months ago

Thanks for letting us know. This is indeed an incorrect grammar, but we should throw a better error message for the user.

subnut commented 6 months ago

> _string is a rule template, that can't be used in a terminal. Currently there are no terminal templates.

Then that should be mentioned at https://lark-parser.readthedocs.io/en/stable/grammar.html#templates

MegaIng commented 6 months ago

Yep, that documentation is wrong: templates aren't "expanded", they are instantiated. We had considered adding expanding templates (probably with __ prefix), but haven't gotten around to that.
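
Until something like expanding templates exists, one possible workaround is to do the expansion outside Lark, with ordinary Python string formatting, and hand Lark the already-expanded grammar text. The sketch below is only an illustration under that assumption: `STRING_BODY`, `QUOTES`, and `build_grammar` are hypothetical names, not part of Lark, and the output is simply the hand-expanded grammar from the original report.

```python
# Hypothetical workaround: emulate a "terminal template" by expanding it
# with plain string substitution before the grammar text reaches Lark.

# Body shared by all three string terminals; {q} stands for the quote literal.
# Raw string so the regex backslashes are kept exactly as written.
STRING_BODY = r'{q} /.*?/s /(?<!\\)(\\\\)*?/ {q}'

# Terminal name -> quote literal, written as it must appear in the grammar.
QUOTES = {
    "STRCONST_SQ": '"\'"',   # "'"
    "STRCONST_BQ": '"`"',    # "`"
    "STRCONST_DQ": r'"\""',  # "\""
}


def build_grammar() -> str:
    """Return the grammar text with every string terminal written out in full."""
    lines = [
        "start: _strconst+",
        "_strconst: " + "\n         | ".join(QUOTES),
    ]
    for name, quote in QUOTES.items():
        lines.append(f"{name}: " + STRING_BODY.format(q=quote))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_grammar())
    # With lark installed, the text could then be passed to the parser,
    # e.g. Lark(build_grammar()); that step is left out of this sketch.
```

Because the substitution happens before Lark sees the grammar, each terminal is an ordinary terminal definition and no rule template is involved.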