lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License
4.75k stars 401 forks source link

lark.exceptions.VisitError: Error trying to process rule "expansion": #1295

Closed zdebel closed 10 months ago

zdebel commented 1 year ago

I've got an exception like in the title, I can't find where I went wrong, attached is the grammar file, I'm running lark using a simple parse call:

tree = verilog_parser.parse(open("test_data/verilog/inverter.v").read())

//used https://www.verilog.com/VerilogBNF.html as reference, this is Verilog 1995

//1. Source Text

//<source_text>
start: description*

//<description>
description: module

//<module>
module: "module" name_of_module list_of_ports? ";" module_item* "endmodule"

//<name of module>
name_of_module: IDENTIFIER

//<list_of_ports>
list_of_ports: "(" port ("," port)* ")"

//<port> simplified for the moment, not according to ref
port: IDENTIFIER

//<module_item>
module_item: input_declaration
    | output_declaration
    | inout_declaration
    | continuous_assign

//2. Declarations

//<input_declaration>
input_declaration: "input" range? list_of_variables ";"

//<output_declaration>
output_declaration: "output" range? list_of_variables ";"

//<inout_declaration>
inout_declaration: "inout" range? list_of_variables ";"

//<list_of_variables>
list_of_variables: NETTYPE drive_strength? expandrange? delay? list_of_assignments ";"

//<NETTYPE>
NETTYPE: "wire"
    | "tri"
    | "tri1"
    | "supply0"
    | "wand"
    | "triand"
    | "tri0"
    | "supply1"
    | "wor"
    | "trior"
    | "trireg"

//<expandrange>
expandrange: range
    | "scalared" range
    | "vectored" range

//<continuous_assign>
continuous_assign: "assign" drive_strength? delay? list_of_assignments ";"

//<drive_strength>
drive_strength: "(" STRENGTH0 "," STRENGTH1 ")"
    | "(" STRENGTH1 "," STRENGTH0 ")"

//<STRENGTH>
STRENGTH0: "supply0"
    | "strong0"
    | "pull0"
    | "weak0"
    | "highz0"
    |
//<STRENGTH1>
STRENGTH1: "supply1"
    | "strong1"
    | "pull1"
    | "weak1"
    | "highz1"

//<range>
range: "[" constant_expression ":" constant_expression "]"

//<list_of_assignments>
list_of_assignments: assignment ("," assignment)*

//5. Behavioral statements

//<assignment>
assignment: lvalue "=" expression

//7. Expressions

//<lvalue>
lvalue: identifier

//<constant_expression>
constant_expression: expression

//<expression>
expression: primary

//<primary>
primary: number

//<number>
number: DECIMAL_NUMBER

//<DECIMAL_NUMBER>
DECIMAL_NUMBER: "0".."9" | "_"

//8. General

//<identifier>
identifier: IDENTIFIER ("." IDENTIFIER)*

//<delay>
delay: number

//LARK
IDENTIFIER: CNAME
    | ESCAPED_IDENTIFIER
ESCAPED_IDENTIFIER: /\\([^\s]+)/  
COMMENT: "//" /[^\n]*/ NEWLINE
NEWLINE: "\n"
MULTILINE_COMMENT: /\/\*(\*(?!\/)|[^*])*\*\//        

%import common.CNAME
%import common.ESCAPED_STRING
%import common.SIGNED_NUMBER
%import common.DECI
%import common.WS

%ignore WS
%ignore COMMENT
%ignore MULTILINE_COMMENT
%ignore NEWLINE
MegaIng commented 1 year ago

Can you provide a full example script that includes both a somewhat minimal grammar (just try deleting stuff as long as the error doesn't go away) as well as the input that causes the issue and the full error message?

zdebel commented 1 year ago

Stupid me, it obviously fails before I try to parse something:

verilog_grammar = './grammars/verilog.lark'

verilog_parser = lark.Lark.open(verilog_grammar)

The grammar that worked (as in, didn't throw an internal exception) before the current one in the top post:

//used https://www.verilog.com/VerilogBNF.html as reference, this is Verilog 1995

//1. Source Text

//<source_text>
start: description*

//<description>
description: module

//<module>
module: "module" name_of_module list_of_ports? ";" module_item* "endmodule"

//<name of module>
name_of_module: IDENTIFIER

//<list_of_ports>
list_of_ports: "(" port ("," port)* ")"

//<port> simplified for the moment, not according to ref
port: IDENTIFIER

//<module_item>
module_item: input_declaration
    | output_declaration
    | inout_declaration
    | continuous_assign

//2. Declarations

//<input_declaration>
input_declaration: "input" range? list_of_variables ";"

//<output_declaration>
output_declaration: "output" range? list_of_variables ";"

//<inout_declaration>
inout_declaration: "inout" range? list_of_variables ";"

//<list_of_variables>
list_of_variables: NETTYPE drive_strength? expandrange? delay? list_of_assignments ";"

//<NETTYPE>
NETTYPE: "wire"
    | "tri"
    | "tri1"
    | "supply0"
    | "wand"
    | "triand"
    | "tri0"
    | "supply1"
    | "wor"
    | "trior"
    | "trireg"

//<expandrange>
expandrange: range
    | "scalared" range
    | "vectored" range

//<continuous_assign>
continuous_assign: "assign" drive_strength? delay? list_of_assignments ";"

//<drive_strength>
drive_strength: "(" STRENGTH0 "," STRENGTH1 ")"
    | "(" STRENGTH1 "," STRENGTH0 ")"

//<STRENGTH>
STRENGTH0: "supply0"
    | "strong0"
    | "pull0"
    | "weak0"
    | "highz0"
    |
//<STRENGTH1>
STRENGTH1: "supply1"
    | "strong1"
    | "pull1"
    | "weak1"
    | "highz1"

//<range>
range: "[" constant_expression ":" constant_expression "]"

//<list_of_assignments>
list_of_assignments: assignment ("," assignment)*

//7. Expressions

//<constant_expression>
constant_expression: expression

//<expression>
expression: primary

//<primary>
primary: SIGNED_NUMBER

//LARK
IDENTIFIER: CNAME
    | ESCAPED_IDENTIFIER
ESCAPED_IDENTIFIER: /\\([^\s]+)/  
COMMENT: "//" /[^\n]*/ NEWLINE
NEWLINE: "\n"
MULTILINE_COMMENT: /\/\*(\*(?!\/)|[^*])*\*\//        

%import common.CNAME
%import common.ESCAPED_STRING
%import common.SIGNED_NUMBER
%import common.WS

%ignore WS
%ignore COMMENT
%ignore MULTILINE_COMMENT
%ignore NEWLINE
zdebel commented 1 year ago

Full error of the not working grammar:

Traceback (most recent call last):
  File "/home/zdebel/.local/lib/python3.11/site-packages/lark/visitors.py", line 124, in _call_userfunc
    return f(children)
           ^^^^^^^^^^^
  File "/home/zdebel/.local/lib/python3.11/site-packages/lark/load_grammar.py", line 621, in expansion
    assert items
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zdebel/kodasy/AwesomeHDLSimByzD/main.py", line 5, in <module>
    verilog_parser = lark.Lark.open(verilog_grammar)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zdebel/.local/lib/python3.11/site-packages/lark/lark.py", line 567, in open
    return cls(f, **options)
           ^^^^^^^^^^^^^^^^^
  File "/home/zdebel/.local/lib/python3.11/site-packages/lark/lark.py", line 399, in __init__
    self.terminals, self.rules, self.ignore_tokens = self.grammar.compile(self.options.start, terminals_to_keep)
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zdebel/.local/lib/python3.11/site-packages/lark/load_grammar.py", line 706, in compile
    terminals = [TerminalDef(name, transformer.transform(term_tree), priority)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zdebel/.local/lib/python3.11/site-packages/lark/load_grammar.py", line 706, in <listcomp>
    terminals = [TerminalDef(name, transformer.transform(term_tree), priority)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zdebel/.local/lib/python3.11/site-packages/lark/visitors.py", line 260, in transform
    tree = t.transform(tree)
           ^^^^^^^^^^^^^^^^^
  File "/home/zdebel/.local/lib/python3.11/site-packages/lark/visitors.py", line 314, in transform
    res = self._call_userfunc(x, args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zdebel/.local/lib/python3.11/site-packages/lark/visitors.py", line 128, in _call_userfunc
    raise VisitError(tree.data, tree, e)
lark.exceptions.VisitError: Error trying to process rule "expansion":
MegaIng commented 1 year ago

The problem is that in STRENGTH0 one side of an | is empty. This isn't really supported inside of terminals since terminals can never be zero length. Instead make the terminal itself optional with a ? or rework what it does to always match something.

@erezsh Is there an inherent reason why we have an assert items in there? Instead using if not items: return PatternStr('') seems to work without problems (well, unless it the lexer analysis later says "this is zero length"). but part of a terminal can be zero length without the entire terminal being zero length.

zdebel commented 1 year ago

Ah... Stupid me, thank you for the catch mate! The exception made me think it wasn't my fault (highly doubtful) so I posted this as a question not a bug :)

MegaIng commented 1 year ago

Well, it was your fault in the sense that it's something you can fix. But this error message should be way better.

zdebel commented 1 year ago

Yeah, we can close it now, thanks :)

erezsh commented 1 year ago

@MegaIng I think it's just an oversight on my part. I definitely agree the error should be better, and sounds like what you suggest will do the trick.

Luis-omega commented 10 months ago

I spend 2 hours today debugging this error in a grammar wrote by someone learning lark. So, now, I'm really interested in a fix for this.

I would be happy forbidding them at syntax level :

%ignore |
A: | "a"
B: "b" |
C:  "c" | | "d"
D: [ ]
E: ( )

But I think that is a major work since the rule that allows them is https://github.com/lark-parser/lark/blob/ba5ae311569b3ede4f712808e6dfdfe5162461c2/lark/load_grammar.py#L131C3-L131C43

'_expansion': ['', '_expansion expr'],

And is used to introduce support for empty rules. So, to forbid this to happen inside terminals, a new rule only for terminals is needed and it would modify all the transformations in load_grammar.py

erezsh commented 10 months ago

This issue is now solved in master.

@Luis-omega If you think the solution is insufficient, feel free to explain it here or in a new issue.