improve specification of non-constant array shape declarations and pdt initialization

sigfig commented 3 years ago

fortran as specified has some semantic gaps in the interpretation of the restricted scalar expressions used in procedure interfaces, the interpretation of automatic array declarations using the dimension attribute, and the interpretation of parameterized derived type initialization in type declarations. this seems to be a substantial conformance issue, gfortran ifort and cuda fortran all have noticeable differences in how these language features are interpreted. i've found a few cases in gfortran and ifort where fortran 2003 compliant code will cause crashes during compilation. this is especially noticeable when using type parameters, pdts with integer, len parameters are largely unusable in programs where they would otherwise provide a lot of benefit, due to incomplete implementations and lack of testing. recent specification changes are no help here, after reviewing the semantics in fortran 2018, i have no idea how to implement these features myself without ignoring most of the rules.

on the fortran language spec side i believe there's a straightforward remedy for making compliance feasible:

split the roles of specification expressions, further constraining the expressions that can be used in "shape expressions"
merge all the existing rules for non-constant array shapes, coarray shapes, and integer, len type parameters into one section on resolving dynamic and assumed shapes
introduce an operational interpretation of dynamic shape expressions into the spec

part 1 only entails removing the already unusable primaries, intrinisics, and inquiries from usage in shape expressions. part 2 is a spec simplification, no semantics are changed, they're just deduplicated and the intended interpretation is made more explicit, without adding any new concepts.

part 3 is the substantial change, to be useful it requires a non-trivial amount of reinterpretation and the addition of many new rules.

my proposal to make this easy in both specification and implementation is to introduce just one new semantic construction that corresponds to an integer constraint interpretation of the scalar expressions used in procedure interfaces. this is opposed to the current imperative expression evaluation interpretation of the same syntax.

this addition would require adding a notion of a "shape invariant", a few different classes of shape invariants, and resolution rules distinct from expression evaluation.

that seems to be a much broader scope of changes than other proposals, but it does seem like a constraint interpretation fits a lot closer to the existing semantics of shape expressions in fortran. as a trivial example, a matmul interface:

function f(x, y, z, l, r) result(o)
  integer :: x, y, z
  real, dimension(x, y) :: l
  real, dimension(y, z) :: r
  real, dimension(x, z) :: o
end function f

the required shape validation checks could be evaluated from top to bottom, from bottom to top, or in parallel, with no meaningful differences. we don't really have expression reduction available here either, either the interface is satisfiable or it isn't, there are no intermediate states in which a subset of this interface could be meaningfully satisfiable. if we introduced data dependencies or inequalities, we would arrive at an ordering, but it won't necessarily be an ordering over a set of statements, or over expressions. any ordering induced by an interface is an arbitrary lattice over variable relations. these look a lot more like simultaneous constraints than sequential statements.

this reinterpretation has a substantial upside with respect to specification effort: pdt initialization constraints can always be transcluded into the constraint set of the interface that declares the type, no extra rules are needed. additionally, the admissable intrinsic functions and requirements on specification functions admit easy constraint propagation. on the implementation side, the consequences of this sort of change unquestionably adds complexity to a compiler. the material benefit for compiler projects is that pdts and statement expressions are no longer a murky area, and users can successfully write a parameterized matrix type without filing bug reports.

this presentation is eliding over many caveats you may be able to notice. i've drafted a proposal document that elaborates on the shape invariant representation, static semantics, and implementation strategies, but it's super long so i didn't want to bomb the proposals repo before getting initial feedback. fortran makes this approach mostly easy but it does throw in some quirks, let me know if any of you have better ideas.

sigfig commented 3 years ago

after rereading this post seems a bit more ambiguous compared to my other notes, so to be explicit, this proposal would introduce constraints as a semantic concept only in the language specification, not a programming construct accessible to fortran programmers. constraints are only intended to highlight the complexity class of resolving dynamic array shapes in fortran, and to make it more obvious how to write a conforming compiler. adopting this proposal would only entail reorganizing and rewording the rules related to shape expressions in the spec.

certik commented 3 years ago

Thanks @sigfig for the proposal. Do you think a Fortran standard conformance testsuite (#57) would help here? It seems it would help at least identify the bugs in the compiler regarding some feature, such as array bounds checking in a subroutine interface, or the PDT. As well as to "specify" in detailed examples what it means to have a certain feature.

The other part of your proposal is to reorganize the Fortran standard document to make it easier to write a Fortran compiler.

sigfig commented 3 years ago

if i was implementing a new compiler i'd absolutely want a canonical test suite for this set of features. in fortran evaluating these shape expressions constitutes a pretty big chunk of the dynamic semantics regarding memory management, and implementing them requires adding a lot of state tracking inside of a compiler, so good test coverage is really important. i haven't tried to enumerate the cases i'd want supported as unit tests, but this seems a bit difficult. the generality of the expression class used, scoping, and interaction with other rules for derived types produces a pretty sprawling tree of cases to test and provide canonical interpretations for. gfortran has the most publicly available test cases for these features, there's about 50 of them, but their coverage is not even close to guaranteeing any of it works as specified.

to make conformance really easy i'd go for property-based testing measured against a canonical interpreter. i've had success with this before for typed dsl projects, but i'm not sure if it's appropriate for fortran. to write the canonical interpreter for testing, the rules for shape expressions definitely need to be made more explicit than they are in fortran 2018. the constraint representation i'm proposing here achieves that (it's adapted from a previous dsl project of mine), but there are lots of other options if this proposal isn't ideal.

certik commented 3 years ago

Regarding the test suite, I think everybody agrees we need a lot more thorough compiler independent test suite for all Fortran features. That should be further discussed at #57.

We can keep discussing the other parts of your proposals here.

j3-fortran / fortran_proposals

improve specification of non-constant array shape declarations and pdt initialization #200