tree-sitter / tree-sitter-verilog

SystemVerilog grammar for tree-sitter
MIT License
93 stars 36 forks source link

Parsing error on range of struct field #43

Open zegervdv opened 4 years ago

zegervdv commented 4 years ago

I'm getting a parsing error when trying to parse this construct:

module foobar #(
) (
    input  logic                      i_clk,
    input  logic                      i_rst
);

  always_comb begin : label
    if (bar.baz[15:6] >= boo) begin
      foo = 1'b0;
    end else begin
      foo = 1'b1;
    end
  end : label

endmodule

Parsing result:

    module_or_generate_item [2, 6] - [7, 11])
      always_construct [2, 6] - [7, 11])
        always_keyword [2, 6] - [13, 6])
        statement [14, 6] - [7, 11])
          statement_item [14, 6] - [7, 11])
            seq_block [14, 6] - [7, 11])
              simple_identifier [22, 6] - [27, 6])
              ERROR [4, 7] - [29, 7])
                constant_function_call [8, 7] - [15, 7])
                  function_subroutine_call [8, 7] - [15, 7])
                    subroutine_call [8, 7] - [15, 7])
                      method_call [8, 7] - [15, 7])
                        primary [8, 7] - [11, 7])
                          simple_identifier [8, 7] - [11, 7])
                        method_call_body [12, 7] - [15, 7])
                          method_identifier [12, 7] - [15, 7])
                            method_identifier [12, 7] - [15, 7])
                              simple_identifier [12, 7] - [15, 7])
                data_type_or_implicit1 [15, 7] - [21, 7])
                  implicit_data_type1 [15, 7] - [21, 7])
                    packed_dimension [15, 7] - [21, 7])
                      constant_range [16, 7] - [20, 7])
                        constant_expression [16, 7] - [18, 7])
                          constant_primary [16, 7] - [18, 7])
                            primary_literal [16, 7] - [18, 7])
                              integral_number [16, 7] - [18, 7])
                                decimal_number [16, 7] - [18, 7])
                                  unsigned_number [16, 7] - [18, 7])
                        constant_expression [19, 7] - [20, 7])
                          constant_primary [19, 7] - [20, 7])
                            primary_literal [19, 7] - [20, 7])
                              integral_number [19, 7] - [20, 7])
                                decimal_number [19, 7] - [20, 7])
                                  unsigned_number [19, 7] - [20, 7])
                ERROR [22, 7] - [24, 7])
                simple_identifier [25, 7] - [28, 7])

The error seems to be located at the bar.baz[15:6] part. Removing either the .baz or [15:6] resolves the error.

drom commented 4 years ago

Yes, that is the bug in the grammar. PR is welcome

zegervdv commented 4 years ago

I've been trying to figure it out, but my understanding of tree-sitter / parsing in general is too limited.

I believe the root issue here is that the foo.bar inside the cond_predicate is being parsed as a function_subroutine_call.

You can see this when parsing if (foo.bar > baz):

conditional_statement [7, 4] - [11, 7]
  cond_predicate [7, 8] - [7, 22]
    expression [7, 8] - [7, 22]
      expression [7, 8] - [7, 15]
        primary [7, 8] - [7, 15]
          function_subroutine_call [7, 8] - [7, 15]
            subroutine_call [7, 8] - [7, 15]
              method_call [7, 8] - [7, 15]
                primary [7, 8] - [7, 11]
                  simple_identifier [7, 8] - [7, 11]
                method_call_body [7, 12] - [7, 15]
                  method_identifier [7, 12] - [7, 15]
                    method_identifier [7, 12] - [7, 15]
                      simple_identifier [7, 12] - [7, 15]

When it then encounters the [15:6] it fails to continue.

Note that without the [15:6] it does not error anymore, but it still parses as a function_call.

Digging a bit deeper I found this also happens outside of the if statement.

For example:

// ...
  if (test) begin
      foo.bar = 3;
      foo = bar.baz;
  end
// ...

Parses to:

seq_block [8, 24] - [11, 7]
  statement_or_null [9, 6] - [9, 18]                              // foo.bar = 3;
    statement [9, 6] - [9, 18]
      statement_item [9, 6] - [9, 18]
        blocking_assignment [9, 6] - [9, 17]
          operator_assignment [9, 6] - [9, 17]
            variable_lvalue [9, 6] - [9, 13]                      // variable_lvalue
              simple_identifier [9, 6] - [9, 9]
              select1 [9, 9] - [9, 13]
                member_identifier [9, 10] - [9, 13]               // Correctly detects the member_identifier
                  simple_identifier [9, 10] - [9, 13]
            assignment_operator [9, 14] - [9, 15]
            expression [9, 16] - [9, 17]
              primary [9, 16] - [9, 17]
                primary_literal [9, 16] - [9, 17]
                  integral_number [9, 16] - [9, 17]
                    decimal_number [9, 16] - [9, 17]
                      unsigned_number [9, 16] - [9, 17]
  statement_or_null [10, 6] - [10, 20]                            // foo = bar.baz;
    statement [10, 6] - [10, 20]
      statement_item [10, 6] - [10, 20]
        blocking_assignment [10, 6] - [10, 19]
          operator_assignment [10, 6] - [10, 19]
            variable_lvalue [10, 6] - [10, 9]
              simple_identifier [10, 6] - [10, 9]
            assignment_operator [10, 10] - [10, 11]
            expression [10, 12] - [10, 19]                        // Expression not correctly parsing the member_identifier
              primary [10, 12] - [10, 19]
                function_subroutine_call [10, 12] - [10, 19]
                  subroutine_call [10, 12] - [10, 19]
                    method_call [10, 12] - [10, 19]
                      primary [10, 12] - [10, 15]
                        simple_identifier [10, 12] - [10, 15]
                      method_call_body [10, 16] - [10, 19]
                        method_identifier [10, 16] - [10, 19]
                          method_identifier [10, 16] - [10, 19]
                            simple_identifier [10, 16] - [10, 19]

Any thoughts on how to continue?

drom commented 4 years ago

Thank you for the investigation.

In the following test case.

module mod ();
  always_comb foo = bar.baz;
endmodule

It is quite strange seing bar.baz as function_subroutine_call -> method_call_body here.

(source_file (module_declaration
  (module_header (module_keyword) (simple_identifier))
  (module_nonansi_header (list_of_ports))
  (module_or_generate_item (always_construct (always_keyword)
    (statement (statement_item (blocking_assignment (operator_assignment
      (variable_lvalue (simple_identifier))
      (assignment_operator)
      (expression (primary
        (function_subroutine_call (subroutine_call (method_call             // <- is it function call?
          (primary (simple_identifier))
          (method_call_body (method_identifier (method_identifier (simple_identifier))))
        )))
      ))
    ))))
  ))
))

Where is goes wrong is adding packed dimension to the function call:

module mod ();
  always_comb foo = bar.baz[7:0];
endmodule
(source_file (module_declaration
  (module_header (module_keyword) (simple_identifier))
  (module_nonansi_header (list_of_ports))

  (ERROR
    (always_keyword)
    (variable_lvalue (simple_identifier))
    (assignment_operator)
    (constant_function_call (function_subroutine_call (subroutine_call (method_call (primary (simple_identifier)) (method_call_body (method_identifier (method_identifier (simple_identifier))))))))
    (packed_dimension
      (constant_range
        (constant_expression (constant_primary (primary_literal (integral_number (decimal_number (unsigned_number))))))
        (constant_expression (constant_primary (primary_literal (integral_number (decimal_number (unsigned_number))))))
      )
    )
  )
  (module_or_generate_item (package_or_generate_item_declaration))
))

I have to look at the grammar to see how that should be resolved.