sergisiso closed 4 years ago
For background, the lhs of an assignment-stmt is a variable which in turn is a designator ...
R734 assignment-stmt is variable = expr
R601 variable is designator
As @sergisiso says, in the Fortran2003 spec a designator can be an array-element or an array-section (or some other things that are not relevant to this issue):
R603 designator is object-name or array-element or array-section or structure-component or substring
where array-element and array-section are
R616 array-element is data-ref
R617 array-section is data-ref [ ( substring-range ) ]
A substring-range can take the form
R611 substring-range is [ scalar-int-expr ] : [ scalar-int-expr ]
The other relevant rule path is ...
R612 data-ref is part-ref [ % part-ref ] ...
R613 part-ref is part-name [ ( section-subscript-list ) ]
So, if we take the following lhs of an assignment:
a(:)
this matches all of the following rule hierarchies:
1: variable->designator->array-element->data-ref->part-ref->part-name(section-subscript-list)
and
2: variable->designator->array-section->data-ref->part-ref->part-name(section-subscript-list)
and
3: variable->designator->array-section->data-ref(substring-range)
whereas multiple dimensions, e.g. a(:,i), will only match one of the first two as substring-range can only be lhs:rhs. Regarding the first two matches, either match would result in the same fparser2 object hierarchy, so it does not matter which of these two is matched.
substring-range only accepts lhs:rhs, whereas section-subscript supports start:stop:step. Therefore a(2:8:2) will also only match one of the first two irrespective of the rule ordering.
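The overlap described above can be sketched with a toy matcher. This is only an illustration: the regular expressions are crude stand-ins for scalar-int-expr and section-subscript, and the function names are hypothetical, not fparser2 code.

```python
import re

# Crude stand-in for a scalar-int-expr: names, digits and a few operators.
_TOKEN = r"[A-Za-z0-9_+\-*]"
# substring-range: [expr] : [expr]
SUBSTRING_RANGE = re.compile(rf"^{_TOKEN}*:{_TOKEN}*$")
# section-subscript: a plain subscript, or [expr] : [expr] [ : stride ]
SECTION_SUBSCRIPT = re.compile(
    rf"^(?:{_TOKEN}+|{_TOKEN}*:{_TOKEN}*(?::{_TOKEN}+)?)$"
)


def candidate_rules(designator):
    """List which (toy) rules accept the text inside the parentheses."""
    inner = designator[designator.index("(") + 1 : designator.rindex(")")]
    found = []
    # A section-subscript-list is a comma-separated list of section-subscripts.
    if all(SECTION_SUBSCRIPT.match(p.strip()) for p in inner.split(",")):
        found.append("section-subscript-list")
    # A substring-range has no commas and exactly one colon.
    if SUBSTRING_RANGE.match(inner):
        found.append("substring-range")
    return found


for text in ("a(:)", "a(:,i)", "a(2:8:2)"):
    print(text, candidate_rules(text))
```

Only a(:) is accepted by both toy rules; the multi-dimensional and strided forms are accepted by the section-subscript-list rule alone.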
In the current fparser2 implementation the "or" rules are tested in order. As @sergisiso says, at the moment rule R603 is implemented so that array-section is checked before array-element. This means that a(:) will always match with the substring-range rule hierarchy.
The suggestion by @sergisiso is to re-order so that array-element is checked before array-section. That would mean that a(:) would always match the section-subscript-list rule hierarchy.
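A minimal sketch of why the ordering matters when the first matching alternative wins. The matchers below are simplified, hypothetical stand-ins for the two rule paths, not the fparser2 classes themselves.

```python
def _inner(text):
    """Return the text between the outermost parentheses."""
    return text[text.index("(") + 1 : text.rindex(")")]


def via_substring_range(text):
    # Toy check: no commas and exactly one colon, i.e. lhs:rhs.
    s = _inner(text)
    return "," not in s and s.count(":") == 1


def via_section_subscript_list(text):
    # Toy check: each comma-separated part has at most start:stop:step.
    return all(part.count(":") <= 2 for part in _inner(text).split(","))


def first_match(text, alternatives):
    """Return the name of the first alternative that accepts the text."""
    for name, matcher in alternatives:
        if matcher(text):
            return name
    return None


# Assumed orderings: array-section first (current) vs array-element first.
CURRENT_ORDER = [
    ("substring-range", via_substring_range),
    ("section-subscript-list", via_section_subscript_list),
]
PROPOSED_ORDER = list(reversed(CURRENT_ORDER))

print(first_match("a(:)", CURRENT_ORDER))
print(first_match("a(:)", PROPOSED_ORDER))
print(first_match("a(:,i)", CURRENT_ORDER))
```

With these toy matchers the result for a(:) flips with the ordering, while a(:,i) gives the same answer either way.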
The problem with simply changing the ordering is that, whilst non-character arrays would be parsed correctly, character arrays with a substring-range would not. We should probably make this change in any case because most of our codes are array based and make little use of character string manipulation, so most of the time we will get what we expect.
Looking at the bigger issue: it should be possible to run all RHS rules independently and have only one of them match, i.e. the order in which they are tested should not matter. I should probably write a debug mode that tests and validates this at some point.
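One way such a debug check might look, as a sketch only: try every alternative rather than stopping at the first match, and complain if more than one accepts the input. The matchers and the check_unambiguous name are hypothetical and do not exist in fparser2.

```python
def _inner(text):
    """Return the text between the outermost parentheses."""
    return text[text.index("(") + 1 : text.rindex(")")]


def via_substring_range(text):
    # Toy check: no commas and exactly one colon, i.e. lhs:rhs.
    s = _inner(text)
    return "," not in s and s.count(":") == 1


def via_section_subscript_list(text):
    # Toy check: each comma-separated part has at most start:stop:step.
    return all(part.count(":") <= 2 for part in _inner(text).split(","))


ALTERNATIVES = [
    ("substring-range", via_substring_range),
    ("section-subscript-list", via_section_subscript_list),
]


def check_unambiguous(text, alternatives=ALTERNATIVES):
    """Run every alternative; raise if more than one of them matches."""
    matched = [name for name, matcher in alternatives if matcher(text)]
    if len(matched) > 1:
        raise ValueError(f"{text!r} is ambiguous: matches {matched}")
    return matched[0] if matched else None


print(check_unambiguous("a(:,i)"))
try:
    check_unambiguous("a(:)")
except ValueError as err:
    print(err)
```

Because the check does not depend on the order of ALTERNATIVES, it would flag a(:) regardless of which rule is listed first.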
However, how do we distinguish between the two? Constraint C619 sorts this out: we match differently depending on the datatype of a.
C619 (R617) If a substring-range appears, the rightmost part-name shall be of type character.
The problem with this is that the matching is currently done in isolation (all the match will see is 'a(:)') so there is no way to know what datatype a might be.
What I think should be done is that the order should be changed so that we get the behaviour we expect for the codes we are interested in. However, a failing test should be written for slicing character strings. This failing test should then be added to #202 or a similar issue to make sure the problem will be fixed when keeping symbol information.
To clarify, this problem could be addressed if the type of the array, a in our example, could be found, so that the rule could query what datatype it is. In general it will not be possible to do this and we should probably require users to add appropriate datatype information in that case, as well as trying to automatically find the information (via searching through modules).
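As a rough sketch of how type information could feed into the match, the function below applies C619 as a filter over the candidate rule paths. The symbol_types dictionary (lower-case name to type string) and the apply_c619 function are assumptions made for illustration; fparser2 does not provide them.

```python
def apply_c619(candidates, designator, symbol_types):
    """Drop the substring-range candidate unless the name is of type character.

    candidates   -- rule paths that matched syntactically
    designator   -- source text such as "a(:)"
    symbol_types -- hypothetical mapping of lower-case names to type strings
    """
    # Fortran names are case-insensitive, so normalise before the lookup.
    name = designator.split("(", 1)[0].strip().lower()
    if name not in symbol_types:
        # Without type information the ambiguity cannot be resolved here.
        raise LookupError(f"no type information for {name!r}")
    if symbol_types[name] != "character":
        return [c for c in candidates if c != "substring-range"]
    return list(candidates)


both = ["section-subscript-list", "substring-range"]
print(apply_c619(both, "a(:)", {"a": "real"}))
print(apply_c619(both, "a(:)", {"a": "character"}))
```

For a character variable both candidates survive this filter, so C619 on its own only rules the substring-range path out; deciding between the two for character data would presumably need further information such as whether a is an array.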
Created branch 213_element_or_section
I still don't fully understand the difference between them, but in the fparser2 Designator class the subclass_names (line 3898) seem to be in the wrong order (see R603). Changing the order makes the behavior of 1D arrays consistent with that of multi-dimensional arrays.