hansemro opened 1 year ago
string rule:
Currently, the string literal rule allows optional quotation marks, which is invalid since quotation marks must be present for each string component. We can fix this by moving the quotes outside the square brackets as shown below:
string = "&".{quote [@:?/[A-Za-z0-9\&'\(\)\[\]\*\,\-\+\.\:\;\<\=\>\_\/\t ]+/?] quote} ;
This no longer allows invalid string constructions like the following:
& "test"
test"
test
test &
test" &
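As a rough cross-check of the corrected rule, here is a minimal Python sketch that approximates it with the standard re module instead of grako. It requires at least one component and assumes whitespace may appear around & (grako parsers skip whitespace between tokens by default). The names CONTENT, COMPONENT, STRING and is_valid_string are hypothetical and not part of bsdl.ebnf.

```python
import re

# Characters allowed inside one string component, copied from the character
# class in the corrected `string` rule.
CONTENT = r"[A-Za-z0-9&'()\[\]*,\-+.:;<=>_/\t ]+"

# One component: mandatory opening and closing quotes around optional
# content, mirroring `quote [@:?/.../?] quote`.
COMPONENT = rf'"(?:{CONTENT})?"'

# Components joined by `&`, with optional whitespace around the separator
# (an assumption based on grako's default whitespace skipping).
STRING = re.compile(rf"\s*{COMPONENT}(?:\s*&\s*{COMPONENT})*\s*")


def is_valid_string(text: str) -> bool:
    """True if `text` is one or more quoted components joined by `&`."""
    return STRING.fullmatch(text) is not None


if __name__ == "__main__":
    samples = ['"test"', '"test" & ""', '& "test"', 'test"', "test", "test &", 'test" &']
    for text in samples:
        print(f"{text!r}: {is_valid_string(text)}")
```

Only the first two samples are accepted; the other five are the invalid constructions listed above.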
Issue:
Many string-based rules in bsdl.ebnf (ones that are enclosed in quotes) cannot handle string concatenation between tokens. This is a consequence of many BSDL attributes being strings...
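In VHDL, and therefore in BSDL attribute values, & concatenates string literals, so one logical value can be split at any point. The following Python sketch is an illustration only: it computes the concatenated value of an expression and checks the expression against a hypothetical, simplified single-string rule loosely modelled on an opcode description (a name followed by parenthesized opcode patterns). The regular expressions and function names are assumptions, not the bsdl.ebnf rules, and VHDL's doubled-quote escape is ignored.

```python
import re

# A quoted component of a string expression. Assumption: components contain
# no embedded quotes (VHDL's doubled "" escape is not handled here).
COMPONENT = re.compile(r'"([^"]*)"')

# Hypothetical stand-in for a single-string rule: NAME (pattern, ...) that
# must sit inside ONE pair of quotes, with no concatenation between tokens.
SINGLE_STRING_RULE = re.compile(
    r'"[A-Za-z][A-Za-z0-9_]*\s*\(\s*[01X]+(?:\s*,\s*[01X]+)*\s*\)"'
)


def concatenated_value(expr: str) -> str:
    """Join the contents of every quoted component, as VHDL's & would."""
    return "".join(COMPONENT.findall(expr))


def naive_accepts(expr: str) -> bool:
    """True if the whole expression matches the single-string rule."""
    return SINGLE_STRING_RULE.fullmatch(expr) is not None


if __name__ == "__main__":
    for expr in ['"BYPASS (11111111)"', '"BYPASS" & " (11111111)"',
                 '"SAMPLE (10000001, " & "10000010)"']:
        print(f"{expr!r}: value={concatenated_value(expr)!r}, "
              f"accepted={naive_accepts(expr)}")
```

The first two expressions denote the same value, yet only the unsplit spelling is accepted by the single-string rule.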
Example:
We can cause parsing issues by inserting string concatenation where the string rules in bsdl.ebnf do not expect it.
INSTRUCTION_OPCODE
Looking at the grako rules for INSTRUCTION_OPCODE, we see the following:
Looking at opcode_description, we see that it does not expect string concatenation to occur between instruction_name and the left parenthesis. So the following would not work in the above rules:
Additionally, opcode_list does not expect string concatenation between opcode patterns. So the following would not work either:
Proposed Solution:
For any string-based rules with more than one token, expect string concatenation to happen between tokens.
Treat any expression element between quotes as a string-based rule. In the INSTRUCTION_OPCODE example, opcode_description and opcode_list should be considered string-based rules since they would be enclosed in quotes from opcode_table_string.
Example solution for INSTRUCTION_OPCODE:
First, let's recognize string concatenation as the pattern " & " (a closing quote, an ampersand, and an opening quote) and call it end_and_start:
With this new rule, we can catch repeated empty string concatenation between tokens ("token" & "" & "" & "" & "token") with token {end_and_start} token. (As you will see below, there are many uses for this rule.)
Then, for each rule, handle how and where string concatenation may occur as shown below:
As a bonus, by handling the possible presence of string concatenation, we no longer need gather-optional expressions and can drop the square brackets. This helps improve TatSu compatibility (#2).
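A rough Python approximation of the end_and_start idea, using regular expressions as a stand-in for grako. It assumes end_and_start matches a closing quote, &, and an opening quote with optional whitespace around &. NAME, PATTERN, SEAMS, accepts and the other names are hypothetical simplifications, not the actual bsdl.ebnf rules.

```python
import re

# end_and_start: the `" & "` seam that concatenation leaves between two
# tokens (closing quote, `&`, opening quote). Whitespace around `&` is
# assumed to be insignificant.
END_AND_START = r'"\s*&\s*"'

# `{end_and_start}`: zero or more seams plus optional whitespace, so seams
# (including empty components such as `"a" & "" & "b"`) may appear wherever
# whitespace could.
SEAMS = rf"(?:\s*{END_AND_START})*\s*"

# Hypothetical, simplified token patterns (not the bsdl.ebnf definitions).
NAME = r"[A-Za-z][A-Za-z0-9_]*"
PATTERN = r"[01X]+"

# opcode_list and opcode_description with seams allowed between every token.
OPCODE_LIST = rf"{PATTERN}(?:{SEAMS},{SEAMS}{PATTERN})*"
OPCODE_DESCRIPTION = rf"{NAME}{SEAMS}\({SEAMS}{OPCODE_LIST}{SEAMS}\)"

# The whole description is still enclosed in one outer pair of quotes.
QUOTED_DESCRIPTION = re.compile(rf'"{OPCODE_DESCRIPTION}"')

# `token {end_and_start} token` from the example above.
TOKEN_PAIR = re.compile(rf'"token{SEAMS}token"')


def accepts(rule: "re.Pattern[str]", text: str) -> bool:
    """True if `text` matches `rule` in its entirety."""
    return rule.fullmatch(text) is not None
```

Because SEAMS only appears between tokens, the quotes stay mandatory and a split inside a token (for example "BYPASS (1111" & "1111)") is still rejected in this sketch.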
Relevant Branches:
Tasks:
string
map_string
port_map
port
pin_list
group_table_string
group_table
twin_group_entry
opcode_table_string
opcode_list
pattern_list_string
instruction_list_string
instruction_list
idcode_statement
usercode_statement
thirty_two_bit_pattern_list
cell_table_string
cell_table
cell_entry
cell_info
cell_spec
disable_spec
boundary_segment_string
boundary_segment_list
runbist_description
runbist_spec
wait_spec
time_and_clocks
clock_cycles_list
signature_spec
intest_description
intest_spec
system_clock_description_string
system_clock_requirement
clocked_instructions
register_mnemonics_string
mnemonic_definition
mnemonic_list
mnemonic_assignment
register_fields_string
register_field_list
register_fields
register_field
extended_field_name
field_length
bit_list_and_options
bit_list
prefix_statement
value_assignment
mnemonic_association
local_reset_assignment
domain_assignment
register_assembly_string
register_assembly_list
register_assembly_elements
register_element
instance_and_options
instance_definition
array_ident
field_value_assignment
field_reset_assignment
field_domain_assignment
field_ident
array_instances
field_and_options
array_instance
selected_segment_element
field_selection_assignment
selection_field
field_reference
selection_values
segment_selection
broadcast_field
broadcast_values
broadcast_selection
boundary_instance
using_statement
package_hierarchy
constraints_string
constraints_list
constraint_checks
nested_expr
binary_expr
mnemonic_pattern
register_association_string
register_association_list
reg_field_or_instance
port_list
port_association_list
info_list
clock_list
user_list
single_word_user_list
multi_word_user_list
unit
unit_definition
power_port_association_string
power_port_association_list