stfc / fparser

This project maintains and develops a Fortran parser called fparser2 written purely in Python which supports Fortran 2003 and some Fortran 2008. A legacy parser fparser1 is also available but is not supported. The parsers were originally part of the f2py project by Pearu Peterson.
https://fparser.readthedocs.io

array_element vs array_sections #213

Closed sergisiso closed 4 years ago

sergisiso commented 4 years ago

I still don't fully understand the difference between them, but in the fparser2 Designator class the subclass_names (line 3898) seem to be in the wrong order (see R602). Changing the order makes the behavior of 1D arrays consistent with that of multi-dimensional arrays.

rupertford commented 4 years ago

For background, the lhs of an assignment-stmt is a variable which in turn is a designator ...

R734 assignment-stmt is variable = expr
R601 variable is designator

As @sergisiso says, in the Fortran2003 spec a designator can be an array-element or an array-section (or some other things that are not relevant to this issue):

R603 designator is object-name
                or array-element
                or array-section
                or structure-component
                or substring

where array-element and array-section are

R616 array-element is data-ref
R617 array-section is data-ref [ ( substring-range ) ]

A substring-range can take the form

R611 substring-range is [ scalar-int-expr ] : [ scalar-int-expr ]

The other relevant rule path is ...

R612 data-ref is part-ref [ % part-ref ] ...
R613 part-ref is part-name [ ( section-subscript-list ) ]

So, if we take the following lhs of an assignment:

a(:)

this matches all of the following rule hierarchies:

1: variable->designator->array-element->data-ref->part-ref->part-name(section-subscript-list)
2: variable->designator->array-section->data-ref->part-ref->part-name(section-subscript-list)
3: variable->designator->array-section->data-ref(substring-range)

In contrast, a designator with multiple dimensions, e.g. a(:,i), can only match via one of the first two hierarchies, because a substring-range can only take the form lhs:rhs. Either of the first two matches results in the same fparser2 object hierarchy, so it does not matter which of the two is matched.

substring-range only accepts lhs:rhs, whereas section-subscript supports start:stop:step. Therefore a(2:8:2) will also only match one of the first two irrespective of the rule ordering.
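The difference between the two forms can be sketched with a pair of toy patterns. This is only an illustration, not how fparser2 matches: the names SUBSTRING_RANGE, SECTION_SUBSCRIPT and the helper functions are hypothetical, and bounds are assumed to be simple names or integer literals.

```python
import re

# Toy patterns (not fparser2's real matchers): bounds are limited to simple
# names or integer literals so that the example stays short.
ITEM = r"[A-Za-z0-9_]+"
OPT = rf"(?:{ITEM})?"

# R611 substring-range: [ scalar-int-expr ] : [ scalar-int-expr ]
SUBSTRING_RANGE = re.compile(rf"{OPT}:{OPT}")
# section-subscript: a single subscript, or start:stop with an optional :step
SECTION_SUBSCRIPT = re.compile(rf"{ITEM}|{OPT}:{OPT}(?::{ITEM})?")


def bracket_contents(designator):
    """Return the text between the outer brackets, e.g. ':,i' for 'a(:,i)'."""
    return designator[designator.index("(") + 1:designator.rindex(")")]


def matches_substring_range(designator):
    """True if the bracket contents have the lhs:rhs form."""
    return SUBSTRING_RANGE.fullmatch(bracket_contents(designator)) is not None


def matches_section_subscript_list(designator):
    """True if every comma-separated item is a valid toy section-subscript."""
    items = bracket_contents(designator).split(",")
    return all(SECTION_SUBSCRIPT.fullmatch(item.strip()) for item in items)


for text in ("a(:)", "a(:,i)", "a(2:8:2)"):
    print(text, matches_substring_range(text), matches_section_subscript_list(text))
```

With these patterns a(:) satisfies both forms, while a(:,i) and a(2:8:2) satisfy only the section-subscript-list form, so a(:) is the only ambiguous case of the three.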

In the current fparser2 implementation the "or" alternatives of a rule are tested in order. As @sergisiso says, at the moment rule R603 is implemented so that array-section is checked before array-element. This means that a(:) will always match with the substring-range rule hierarchy.

The suggestion by @sergisiso is to re-order so that array-element is checked before array-section. That would mean that a(:) would always match the section-subscript-list rule hierarchy.
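The effect of the ordering can be sketched with a minimal first-match loop. This is a toy model under stated assumptions, not fparser2's code: the functions data_ref, array_element, array_section and first_match are hypothetical stand-ins, and brackets are assumed not to be nested.

```python
import re

NAME = r"[A-Za-z_]\w*"


def data_ref(text):
    """Toy data-ref: part-name(section-subscript-list), brackets not nested."""
    if re.fullmatch(rf"{NAME}\([^()]+\)", text):
        return "section-subscript-list"
    return None


def array_element(text):
    """R616: array-element is data-ref."""
    return data_ref(text)


def array_section(text):
    """R617: array-section is data-ref [ ( substring-range ) ]."""
    # Assumption: the substring-range form is tried before the plain data-ref.
    if re.fullmatch(rf"{NAME}\(\w*:\w*\)", text):
        return "substring-range"
    return data_ref(text)


def first_match(text, alternatives):
    """Try each alternative in order and return the first successful result."""
    for rule in alternatives:
        result = rule(text)
        if result is not None:
            return result
    return None


section_first = [array_section, array_element]  # ordering described as current
element_first = [array_element, array_section]  # ordering proposed in this issue

print(first_match("a(:)", section_first))    # substring-range
print(first_match("a(:)", element_first))    # section-subscript-list
print(first_match("a(:,i)", section_first))  # section-subscript-list
```

Only a(:) changes its result when the list is reordered; a(:,i) and a(2:8:2) give the section-subscript-list result with either ordering.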

rupertford commented 4 years ago

The problem with simply changing the ordering is that, whilst non-character arrays would be parsed correctly, character arrays with a substring-range would not. We should probably make this change in any case because most of our codes are array based and make little use of character string manipulation, so most of the time we will get what we expect.

Taking a look at the bigger issue: it should be possible to run all RHS rules independently and only one should match, i.e. the order in which they are tested should not matter. I should probably write a debug mode that tests and validates this at some point.
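A debug check of that kind might look roughly like the sketch below. It is not an existing fparser2 feature: matching_alternatives, check_unambiguous and the ALTERNATIVES table are hypothetical names, and the two matchers are simplified stand-ins for the rule paths discussed above.

```python
import re


def matching_alternatives(text, alternatives):
    """Return the names of every alternative whose matcher accepts text."""
    return [name for name, matcher in alternatives.items() if matcher(text)]


def check_unambiguous(text, alternatives):
    """Raise ValueError unless exactly one alternative matches text."""
    matched = matching_alternatives(text, alternatives)
    if len(matched) != 1:
        raise ValueError(f"{text!r} matched {len(matched)} alternatives: {matched}")
    return matched[0]


# Hypothetical stand-ins for two of the rule paths discussed above.
ALTERNATIVES = {
    "substring-range": lambda t: re.fullmatch(r"\w+\(\w*:\w*\)", t) is not None,
    "section-subscript-list": lambda t: re.fullmatch(r"\w+\([^()]+\)", t) is not None,
}

print(check_unambiguous("a(:,i)", ALTERNATIVES))
print(matching_alternatives("a(:)", ALTERNATIVES))
```

Running every alternative rather than stopping at the first match is what exposes a(:) as ambiguous: check_unambiguous("a(:)", ALTERNATIVES) raises ValueError because both stand-in matchers accept it.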

However, how do we distinguish between them? Constraint C619 sorts this out: we match differently depending on the datatype of a.

C619 (R617) If a substring-range appears, the rightmost part-name shall be of type character.

The problem with this is that the matching is currently done in isolation (all the match will see is 'a(:)') so there is no way to know what datatype a might be.

rupertford commented 4 years ago

What I think should be done is that the order should be changed so that we get the behaviour we expect for the codes we are interested in. However, a failing test should be written for slicing character strings. This failing test should then be added to #202 or a similar issue to make sure the problem will be fixed when symbol information is kept.

rupertford commented 4 years ago

To clarify, this problem could be addressed if the type of the array (a in our example) could be found, so that the rule could query what datatype it is. In general it will not be possible to do this, and we should probably require users to add appropriate datatype information in that case, as well as trying to find the information automatically (via searching through modules).
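As a rough sketch of that idea, a rule could consult a name-to-datatype mapping before choosing a path. Everything here is hypothetical: choose_rule_path and the symbol_types dictionary are not fparser2 APIs, and the sketch assumes that the datatype alone decides the match, as suggested in this thread (a real implementation might need further information, such as rank).

```python
def choose_rule_path(designator, symbol_types):
    """Pick a rule path for 'name(...)' using a name -> datatype mapping."""
    name, _, rest = designator.partition("(")
    inside = rest.rsplit(")", 1)[0]
    # Only the lhs:rhs form (one colon, no comma) can be a substring-range.
    if inside.count(":") != 1 or "," in inside:
        return "section-subscript-list"
    # Fortran names are case insensitive, so look the name up in lower case.
    datatype = symbol_types.get(name.strip().lower())
    if datatype is None:
        # No type information available: the caller has to supply it.
        raise LookupError(f"no datatype known for {name.strip()!r}")
    # C619: a substring-range requires the part-name to be of type character.
    if datatype == "character":
        return "substring-range"
    return "section-subscript-list"


print(choose_rule_path("a(:)", {"a": "real"}))
print(choose_rule_path("a(:)", {"a": "character"}))
print(choose_rule_path("a(:,i)", {}))
```

The LookupError branch corresponds to the case where the type cannot be found automatically and the user would have to provide it.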

rupertford commented 4 years ago

Created branch 213_element_or_section

arporter commented 4 years ago

#201 and #202 capture the bigger problem of identifying symbol types, and #238 is merged, so closing this one.