Exploring better ways to handle string concatenation in string-based rules

Issue:

Many string-based rules in bsdl.ebnf (ones that are enclosed in quotes) cannot handle string concatenation between tokens.

This is a consequence of many BSDL attributes being strings...

Example:

We can cause parsing issues by inserting string concatenation in areas where string rules in bsdl.ebnf do not expect them.

INSTRUCTION_OPCODE

Looking at the grako rules for INSTRUCTION_OPCODE, we see the following:

instruction_opcode_stmt = "attribute" "INSTRUCTION_OPCODE" "of" component_name
    colon "entity" "is" @:opcode_table_string semicolon ;
opcode_table_string = "&".{quote (comma).{[@+:opcode_description]} quote} ;
opcode_description = instruction_name:instruction_name left_paren opcode_list:opcode_list right_paren ;
opcode_list = (comma).{[@+:opcode]} ;
opcode = pattern ;

Looking at opcode_description, we see that it does not expect string concatenation to occur between instruction_name and the left parenthesis. So the following would not work in the above rules:

attribute INSTRUCTION_OPCODE of A : entity is
    "IDCODE" &
    "(101001)";

Additionally, opcode_list does not expect string concatenation between opcode patterns. So the following would not work either:

attribute INSTRUCTION_OPCODE of A : entity is
    "IDCODE (101001," &
    "010110)";
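
For reference, both fragments above denote the same logical string once the concatenation is applied, which is why a parser should accept them. Here is a minimal Python sketch of that equivalence. The helper `join_fragments` is hypothetical (it is not part of python-bsdl-parser), and the sketch assumes the string bodies contain no embedded quotes:

```python
import re

# Hypothetical helper for illustration only; not part of python-bsdl-parser.
# Matches: closing quote, optional whitespace, '&', optional whitespace, opening quote.
CONCAT = re.compile(r'"\s*&\s*"')

def join_fragments(text: str) -> str:
    """Collapse VHDL-style string concatenation into a single quoted string."""
    return CONCAT.sub("", text)

split_before_paren = '"IDCODE" &\n    "(101001)"'
split_in_list = '"IDCODE (101001," &\n    "010110)"'

print(join_fragments(split_before_paren))  # "IDCODE(101001)"
print(join_fragments(split_in_list))       # "IDCODE (101001,010110)"
```

Note that this is a preprocessing view of the problem, not the grammar-level fix proposed below.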

Proposed Solution:

For any string-based rule with more than one token, allow string concatenation to occur between tokens.

Treat any expression element between quotes as a string-based rule. In the INSTRUCTION_OPCODE example, opcode_description and opcode_list should be considered string-based rules, since they are enclosed in the quotes from opcode_table_string.

Example solution for INSTRUCTION_OPCODE:

First, let's recognize string concatenation as the pattern " & " and call it end_and_start:

end_and_start = quote '&' quote ;

With this new rule, we can match repeated empty-string concatenation between tokens ("token" & "" & "" & "" & "token") with token {end_and_start} token. (As you will see below, there are many uses for this rule.)
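
The effect of token {end_and_start} token can be approximated with a regular expression. The sketch below is only an illustration using Python's `re` module, not the grako rule itself. The names `END_AND_START` and `between_tokens`, and the sample tokens `IDCODE` and `(`, are assumptions made for the example:

```python
import re

# Rough regex analogue of: end_and_start = quote '&' quote ;
END_AND_START = r'"\s*&\s*"'

# Rough analogue of: token {end_and_start} token
# (whitespace between tokens is skipped, hence the \s* around each piece).
between_tokens = re.compile(rf'IDCODE(?:\s*{END_AND_START})*\s*\(')

samples = [
    '"IDCODE ("',                     # no concatenation
    '"IDCODE" & "("',                 # one concatenation
    '"IDCODE" & "" & "" & "" & "("',  # repeated empty-string concatenation
]
for s in samples:
    print(bool(between_tokens.search(s)))
```

All three samples match, because each `" & "` (including the ones produced by empty strings) is consumed as one repetition of end_and_start.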

Then, for each rule, handle how and where string concatenation may occur as shown below:

instruction_opcode_stmt = "attribute" "INSTRUCTION_OPCODE" "of" component_name
    colon "entity" "is" @:opcode_table_string semicolon ;
opcode_table_string = "&".{
    quote
    {end_and_start}
    ({end_and_start} comma {end_and_start}).{
        [@+:opcode_description]
    }
    {end_and_start}
    quote} ;
opcode_description = instruction_name:instruction_name
    {end_and_start}
    left_paren
    {end_and_start}
    opcode_list:opcode_list
    {end_and_start}
    right_paren ;
opcode_list = ({end_and_start} comma {end_and_start}).{[@+:opcode]} ;
opcode = pattern ;
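
To see how the extra {end_and_start} slots behave on the two failing inputs from earlier, the rules for opcode_description and opcode_list can be mirrored as a regular expression. This is a rough sketch under stated assumptions, not the generated parser: `SEP`, `NAME`, `PATTERN` and `parse_opcode_description` are hypothetical names, and instruction_name and pattern are simplified to an identifier and a run of 0/1/X characters:

```python
import re

# Zero or more end_and_start occurrences, with optional whitespace around them.
SEP = r'\s*(?:"\s*&\s*"\s*)*'
NAME = r'[A-Za-z][A-Za-z0-9_]*'   # simplified stand-in for instruction_name
PATTERN = r'[01Xx]+'              # simplified stand-in for pattern

# opcode_list = ({end_and_start} comma {end_and_start}).{opcode}
OPCODE_LIST = rf'{PATTERN}(?:{SEP},{SEP}{PATTERN})*'

# opcode_description = instruction_name {eas} left_paren {eas} opcode_list {eas} right_paren
OPCODE_DESCRIPTION = re.compile(
    rf'(?P<name>{NAME}){SEP}\({SEP}(?P<opcodes>{OPCODE_LIST}){SEP}\)'
)

def parse_opcode_description(text):
    """Return (instruction_name, [opcodes]) or None if nothing matches."""
    m = OPCODE_DESCRIPTION.search(text)
    if m is None:
        return None
    return m.group('name'), re.findall(PATTERN, m.group('opcodes'))

print(parse_opcode_description('"IDCODE" &\n    "(101001)"'))
print(parse_opcode_description('"IDCODE (101001," &\n    "010110)"'))
```

Both inputs yield the instruction name `IDCODE` with its opcode list, whether the concatenation falls before the parenthesis or inside the list.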

As a bonus, by handling the possible presence of string concatenation, we no longer need gather-optional expressions and can drop the square brackets. This helps improve TatSu compatibility (#2).

...
opcode_table_string = "&".{
    quote
    {end_and_start}
    ({end_and_start} comma {end_and_start}).{
        @+:opcode_description
    }
    {end_and_start}
    quote} ;
...
opcode_list = ({end_and_start} comma {end_and_start}).{@+:opcode} ;
...

Relevant Branches:

testing branch: https://github.com/hansemro/python-bsdl-parser/tree/string_concatenation_dev
prototype branch: https://github.com/hansemro/python-bsdl-parser/tree/tatsu_migration_dev

Tasks:

[ ] Update string-based rules:
- [x] string
- [x] PIN_MAP
  - [x] map_string
  - [x] port_map
  - [x] port
  - [x] pin_list
  - [x] remove handler in BsdlSemantics
- [x] PORT_GROUPING
  - [x] group_table_string
  - [x] group_table
  - [x] twin_group_list
  - [x] twin_group
  - [x] remove handler in BsdlSemantics
- [x] COMPLIANCE_PATTERNS
  - [x] compliance_pattern_string
  - [x] twin_group_entry
- [x] INSTRUCTION_OPCODE
  - [x] opcode_table_string
  - [x] opcode_list
- [x] INSTRUCTION_CAPTURE
  - [x] pattern_list_string
- [x] INSTRUCTION_PRIVATE
  - [x] instruction_list_string
  - [x] instruction_list
- [x] IDCODE_REGISTER
  - [x] idcode_statement
- [x] USERCODE_REGISTER
  - [x] usercode_statement
- [x] thirty_two_bit_pattern_list
- [x] REGISTER_ACCESS
- [x] BOUNDARY_REGISTER
  - [x] cell_table_string
  - [x] cell_table
  - [x] cell_entry
  - [x] cell_info
  - [x] cell_spec
  - [x] disable_spec
- [x] BOUNDARY_SEGMENT
  - [x] boundary_segment_string
  - [x] boundary_segment_list
- [x] RUNBIST_EXECUTION
  - [x] runbist_description
  - [x] runbist_spec
  - [x] wait_spec
  - [x] time_and_clocks
  - [x] clock_cycles_list
  - [x] signature_spec
- [x] INTEST_EXECUTION
  - [x] intest_description
  - [x] intest_spec
- [x] SYSCLOCK_REQUIREMENTS
  - [x] system_clock_description_string
  - [x] system_clock_requirement
  - [x] clocked_instructions
- [ ] REGISTER_MNEMONICS
  - [ ] register_mnemonics_string
  - [ ] mnemonic_definition
  - [ ] mnemonic_list
  - [ ] mnemonic_assignment
- [ ] REGISTER_FIELDS
  - [ ] register_fields_string
  - [ ] register_field_list
  - [ ] register_fields
  - [ ] register_field
  - [ ] extended_field_name
  - [ ] field_length
  - [ ] bit_list_and_options
  - [ ] bit_list
  - [ ] prefix_statement
- [ ] value_assignment
- [ ] mnemonic_association
- [ ] local_reset_assignment
- [ ] domain_assignment
- [ ] REGISTER_ASSEMBLY
  - [ ] register_assembly_string
  - [ ] register_assembly_list
  - [ ] register_assembly_elements
  - [ ] register_element
  - [ ] instance_and_options
  - [ ] instance_definition
  - [ ] array_ident
  - [ ] field_value_assignment
  - [ ] field_reset_assignment
  - [ ] field_domain_assignment
  - [ ] field_ident
  - [ ] array_instances
  - [ ] field_and_options
  - [ ] array_instance
  - [ ] selected_segment_element
  - [ ] field_selection_assignment
  - [ ] selection_field
  - [ ] field_reference
  - [ ] selection_values
  - [ ] segment_selection
  - [ ] broadcast_field
  - [ ] broadcast_values
  - [ ] broadcast_selection
  - [ ] boundary_instance
  - [ ] using_statement
  - [ ] package_hierarchy
- [ ] REGISTER_CONSTRAINTS
  - [ ] constraints_string
  - [ ] constraints_list
  - [ ] constraint_checks
  - [ ] nested_expr
  - [ ] binary_expr
  - [ ] mnemonic_pattern
- [ ] REGISTER_ASSOCIATION
  - [ ] register_association_string
  - [ ] register_association_list
  - [ ] reg_field_or_instance
  - [ ] port_list
  - [ ] port_association_list
  - [ ] info_list
  - [ ] clock_list
  - [ ] user_list
  - [ ] single_word_user_list
  - [ ] multi_word_user_list
  - [ ] unit
  - [ ] unit_definition
- [ ] POWER_PORT_ASSOCIATION
  - [ ] power_port_association_string
  - [ ] power_port_association_list
