j3-fortran / fortran_proposals

Proposals for the Fortran Standard Committee
175 stars 14 forks source link

Allow ASSOCIATE to reference previous names in the same ASSOCIATE constuct, e.g. ASSOCIATE(b=>a, c=>b) #321

Open kbauer opened 7 months ago

kbauer commented 7 months ago

The ASSOCIATE construct is useful to introduce names for expressions in a scoped manner, but its utility is limited the frequent use for nesting. Utility would be significantly improved, if referencing previously associated names in the same ASSOCIATE block would be allowed, i.e.

ASSOCIATE(b => a, c => b)

and similar.

1. Related suggestions

1.1. LET semantics

There are suggestions for a more general construct, usually referring to the concept as let, e.g. “A shorthand for immutability #221” and “LET statement concept #279”.

If such suggestions would gain traction, they would cover this use-case.

However, they are much larger in scale. Extending the standard-compliant behavior of the ASSOCIATE construct seems more likely to be accepted into the standard:

  1. It doesn't introduce a new keyword.
  2. It makes previously invalid code valid, but doesn't change the behavior of previously valid code.
  3. It is already allowed by Intel Fortran.
  4. It should be much easier to implement by other compilers, resulting in likely faster availability for usage in real-world projects.:₋)

1.2. Variable initialization without SAVE attribute

A similar purpose with more explicit typing would be fulfilled by “Syntax to initialize variables in declarations without SAVE #234” and “Deprecate and remove implicit save behavior #40”.

This suggestion still has the advantage of being independent of backward-compatibility considerations such as arise with disabling the implicit SAVE attribute.

Compared to #234, this suggestion allows a more terse syntax, where types are inferred, using an already-existing mechanism.

2. Real world example

For example, the following block-construct is an anonymized version of code I wrote in my workplace project:

  ! Previously defined variables: iFreq0, frequencies, df, Omega, iStart, data
  BLOCK
    ! locals
    REAL :: frequency
    INTEGER :: iFreq, iFreqPlusOffset, iFreqMinusOffset
    INTEGER :: iOffset, iOffsetMinusFreq
    COMPLEX :: A(3,3)
    ! body
    frequency = frequencies(iFreq0)
    iFreq = freq2Index(frequency)
    iFreqPlusOffset = freq2Index(frequency + Omega)
    iFreqMinusOffset = freq2Index(frequency - Omega)
    iOffset = freq2Index(Omega)
    iOffsetMinusFreq = freq2Index(Omega - frequency)
    A(:,:) = diag(data(iFreq, iStart:iStart+2))

    ! ...
  END BLOCK

Ideally, this BLOCK could be replaced with

  ASSOCIATE( &
       frequency => frequencies(iFreq0), &
       iFreq => freq2Index(frequency), &
       iFreqPlusOffset => freq2Index(frequency + Omega), &
       iFreqMinusOffset => freq2Index(frequency - Omega), &
       iOffset => freq2Index(Omega), &
       iOffsetMinusFreq => freq2Index(Omega - frequency), &
       A => diag(data(iFreq, iStart:iStart+2)))

    ! ...

  END ASSOCIATE

This is allowed by Intel Fortran, with the expected behavior. It is rejected by, for instance, gfortran, and does indeed not seem to be standard conforming. See e.g. discussion in this stackoverflow.com post.

In order to be allowed across compilers, currently nesting is required, where each associate construct can only see names associated in an outer scope.

  ASSOCIATE(frequency => frequencies(iFreq0))
    ASSOCIATE( &
         iFreq => freq2Index(frequency), &
         iFreqPlusOffset => freq2Index(frequency + Omega), &
         iFreqMinusOffset => freq2Index(frequency - Omega), &
         iOffset => freq2Index(Omega), &
         iOffsetMinusFreq => freq2Index(Omega - frequency))
      ASSOCIATE(A => diag(data(iFreq, iStart:iStart+2)))
        ! ...
      END ASSOCIATE
    END ASSOCIATE
  END ASSOCIATE

  ! ...
END SUBROUTINE demonstration

The increase in nesting level hinders code readability and effectively discourages use of the ASSOCIATE construct.

In practice most code in the project does not use either BLOCK or ASSOCIATE, and instead defines all local variables at the subroutine level, which leads to problems with refactoring, as values assigned in one section of large subroutines by default leak to the next section, regardless of whether the value, or only the variable name, is reused.

3. Limitations

Between this suggestion and those linked in “1. Related Suggestions”, none address the very much real-world scenario of needing to explicitly declare output variables for subroutines.

Since there is currently no efficient way of returning large data structures (vectors, matrices, ...) from a function, and due to the use of long-existing APIs like LAPACK, it is common in Fortran to see results returned by output parameters.

For these, more convenient declaration of variables close to the code line, where they are first needed, would be discussed, e.g. implicit blocks, where

DO iDM = 1, n
    TYPE(DataManagerType) :: dataManager
    CALL getDataManager(iDM, dataManager)
    ! ...
    REAL :: matrix(3,3)
    CALL dataManager%getTransformationMatrix(matrix)
    ! ...
END DO

would be equivalent to

DO iDM = 1, n
    BLOCK
        TYPE(DataManagerType) :: dataManager
        CALL getDataManager(iDM, dataManager)
        ! ...
        BLOCK
            REAL :: matrix(3,3)
            CALL dataManager%getTransformationMatrix(matrix)
            ! ...
        END BLOCK
    END BLOCK
END DO
klausler commented 7 months ago

J3 definitely got this case wrong, but it's too late to change the scoping of ASSOCIATE and related constructs without changing the current interpretation of existing programs like this one:

program main
  integer :: foo = 1, bar = 2
  associate (foo => bar, baz => foo)
    print *, baz
  end associate
end

The construct entity baz is associated with the outer foo; changing the semantics would cause baz to associate with bar by way of the inner foo.

kbauer commented 7 months ago

I missed that case :/ So in the end, an extension of associate would be more problematic than expected after all.

However. It raises a bit the question of whether "reusing a name in ASSOCIATE is actually a real-world concern.

As I understand, the standard says that baz should be assigned the outer value of foo = 1.

Using ifort (IFORT) 2021.2.0 20210228, the behavior already is strange.

  1. When compiling with -warn all, the compiler says

    >> ifort -warn all a.f90 -o a.bin 
    a.f90(2): remark #7712: This variable has not been used.   [FOO]
      integer :: foo = 1, bar = 2
    -------------^

    but outputs 1, which contradicts the warning.

  2. When removing foo = 1, there is no warning, and the output is 2, corresponding to the suggested extension.

  3. When removing foo = 1, and adding implicit none, the compiler balks with

    >>> ifort -warn all a.f90 -o a.bin 
    a.f90(4): error #8843: An associate name defined in an ASSOCIATE statement must not be subsequently used as a selector in the same statement.   [FOO]
      associate (foo => bar, baz => foo)
    --------------------------------^

This inconsistency hints at real-world programs already having to avoid this scenario, if they want to work with an Intel compiler.

By contrast, gofrtran-4.8 and gfortran-11 both accept the program without warning, output the expected 1, and when deleting foo = 1, reject the program with

>>> gfortran-11 -Wall a.f90 -o a.bin && ./a.bin
a.f90:4:35:

    4 |   associate (foo => bar, baz => foo)
      |                                   1
Error: Symbol ‘foo’ at (1) has no IMPLICIT type
a.f90:4:35:

    4 |   associate (foo => bar, baz => foo)
      |                                   1
Error: Symbol ‘baz’ at (1) has no IMPLICIT type

Possible solution

The behavior of Intel (albeit apparently unintended) hints at a possible solution of backwards compatibility.

  1. If the name is defined in the outer scope,
    1. Retain the old behavior, i.e. the outer value is used.
    2. Warn about a possible conflict.
  2. If the name is not defined in the outer scope, allow the new behavior.

Precedent for backwards-incompatible changes

Note that changing the behavior of pre-existing code is already not new to Fortran.

program main
  implicit none
  integer, allocatable :: a(:)
  allocate(a(5))
  a = -1
  a = (/ 1, 2, 3 /)
  print *, a
end

Using gfortran with -std=, the output is

-std=f95:      1    2    3   -1   -1
-std=f2003:    1    2    3

Arguably, the old behavior is unsafe, as it allows assignments between different-size arrays, that are quite possibly unintended, with neither compile-time nor run-time warning or error.

The “realloc lhs” behavior already changes the behavior of existing statements. (Though honestly, I consider that to be an issue, as it can change the behavior of a program without proper warning, so I'd rather be in favor of not changing the behavior of previously standard-conforming programs.)

klausler commented 7 months ago

Fortran doesn't require any declaration or use of a name in an outer scope to make that name exist in that scope. This already comes up in the context of module & inner procedures.

But even if that were not a concern, the change you suggest would present a danger to code that might declare a name in the other scope later.

The F'2003 semantics for assignments to allocatables did not invalidate any working conforming code. (F'2023 has a regrettable change that does change behavior in working conforming code, but that is a mistake.)

FortranFan commented 7 months ago

@kbauer commented Nov. 21, 2023 03:43 AM EST:

I missed that case :/ So in the end, an extension of associate would be more problematic than expected after all.

@kbauer ,

I generally support the sentiment behind your request for an enhancement to the ASSOCIATE construct.

As noted, what you suggest in the original post here, if left as is, will most likely be DoA with the standard committee.

So why don't you then consider adapting to the situation, say you propose some clause(s) to go with ASSOCIATE like so - the note the syntax therein is illustrative for the purpose here:

  integer :: foo = 1, bar = 2
  associate ( chain=yes ) ( foo => bar, baz => foo )
    print *, baz
  end associate
end

So then with the clause as in ( chain = yes ), the selector in foo is not the object in the hosting scope, rather it's the associate-name in the existing construct. With such semantics, the program response shall be 2. Without the clause, the existing semantics shall apply and thus the program response shall be 1.

The employment of such a clause is one way I can think of to get around the problematic situation due to existing semantics. Those with Fortran compiler development background and experience might be able to come up with better alternatives. The key is the mindset, how to develop solutions that enhance the practice of Fortran: an improved ASSOCIATE facility overall relative to what's in the current standard (2023) very much belongs in this category.

ashe2 commented 7 months ago

I suggest this for the new semantics:

ASSOCIATE (b => a) (c => b)

That is, the selectors in a parenthesized associate-list can reference names defined in earlier lists.

Also, it would be helpful for compilers to warn about the original ASSOCIATE(b => a, c => b) If the user really did not mean the second b to refer to the first one, they could make it clear by writing: ASSOCIATE(c => b, b => a)

KHUSHIJAIN910 commented 7 months ago

would like to be part of these journey

PierUgit commented 7 months ago

Note that changing the behavior of pre-existing code is already not new to Fortran.

program main
  implicit none
  integer, allocatable :: a(:)
  allocate(a(5))
  a = -1
  a = (/ 1, 2, 3 /)
  print *, a
end

This code was invalid and with an undefined behavior before Fortran 2003 (incompatible shapes of lhs and rhs)

PierUgit commented 7 months ago

I suggest this for the new semantics:

ASSOCIATE (b => a) (c => b)

I like this one above. It's not verbose and it has a clear interpretation, i.e. equivalent to

ASSOCIATE (b => a)
   ASSOCIATE (c => b)
kbauer commented 7 months ago

I suggest this for the new semantics:

ASSOCIATE (b => a) (c => b)

I like this one above. It's not verbose and it has a clear interpretation, i.e. equivalent to

ASSOCIATE (b => a)
   ASSOCIATE (c => b)

It also works well with continuation lines.

ASSOCIATE(element => elements(ielement)) &
         (lastindex => element%firstindex + element%size - 1) &
         (elementvector => statevector(element%firstindex:lastindex))
klausler commented 7 months ago

Note that changing the behavior of pre-existing code is already not new to Fortran.

program main
  implicit none
  integer, allocatable :: a(:)
  allocate(a(5))
  a = -1
  a = (/ 1, 2, 3 /)
  print *, a
end

This code was invalid and with an undefined behavior before Fortran 2003 (incompatible shapes of lhs and rhs)

There is a world of difference between a change that only affects buggy non-conforming code and.a change that alters the behavior of correct conforming code. The F'03 change to assignment to allocatables affecting existing applications mildly -- some code slowed down until A=B got rewritten to A(:)=B to prevent the runtime shape conformance check -- but didn't affect results. Changes that silently affect results of working code should never happen in the Fortran standard. (F'23 has one like this and it's a mistake that I'm unlikely to implement.)

PierUgit commented 7 months ago

Changes that silently affect results of working code should never happen in the Fortran standard.

I fully agree, and this was actually my point in the message you have quoted.

F'23 has one like this and it's a mistake that I'm unlikely to implement.

Which one ?

KHUSHIJAIN910 commented 7 months ago

Changes that silently affect results of working code should never happen in the Fortran standard.

I fully agree, and this was actually my point in the message you have quoted.

F'23 has one like this and it's a mistake that I'm unlikely to implement.

Which one ?

hey guys will be happy to be part of this exciting conversation :)

kbauer commented 7 months ago

Note that changing the behavior of pre-existing code is already not new to Fortran.

program main
  implicit none
  integer, allocatable :: a(:)
  allocate(a(5))
  a = -1
  a = (/ 1, 2, 3 /)
  print *, a
end

This code was invalid and with an undefined behavior before Fortran 2003 (incompatible shapes of lhs and rhs)

There is a world of difference between a change that only affects buggy non-conforming code and.a change that alters the behavior of correct conforming code. The F'03 change to assignment to allocatables affecting existing applications mildly -- some code slowed down until A=B got rewritten to A(:)=B to prevent the runtime shape conformance check -- but didn't affect results. Changes that silently affect results of working code should never happen in the Fortran standard. (F'23 has one like this and it's a mistake that I'm unlikely to implement.)

Honestly, I was simply not aware, that the old behavior was not standard-conforming. It seems to have been implemented by at least Intel Fortran and GFortran in this manner though.