j3-fortran / fortran_proposals

Proposals for the Fortran Standard Committee

[start:end:inc] array constructor #329

Status: Open. PierUgit opened 4 months ago

PierUgit commented 4 months ago

It's quite common to need an array forming an arithmetic sequence, and the current way to get one is to use an implied-DO loop in an array constructor: a = [(i,i=start,end,inc)]

Instead, we could have a MATLAB-like notation: a = [start:end:inc]

It's more readable, and there's no need for an index variable.
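For reference, here is a minimal sketch of the current standard-conforming form wrapped in a function. The module and function names (implied_do_demo, seq, seq_size) are hypothetical, and last stands for end. The element count follows the usual DO-loop iteration count, max((last - start + inc)/inc, 0), which is presumably also what [start:end:inc] would produce.

```fortran
module implied_do_demo
  implicit none
contains
  ! Builds start, start+inc, ..., not going past last, using the current
  ! standard syntax: an implied-DO inside an array constructor, which
  ! requires a declared index variable (i).
  pure function seq(start, last, inc) result(res)
    integer, intent(in) :: start, last, inc
    integer, allocatable :: res(:)
    integer :: i
    res = [(i, i = start, last, inc)]
  end function seq

  ! Number of elements produced by seq: the same iteration count as a
  ! DO loop, max((last - start + inc) / inc, 0). Requires inc /= 0.
  pure integer function seq_size(start, last, inc)
    integer, intent(in) :: start, last, inc
    seq_size = max((last - start + inc) / inc, 0)
  end function seq_size
end module implied_do_demo
```

For example, seq(1, 10, 3) gives [1, 4, 7, 10], seq(10, 1, -3) gives [10, 7, 4, 1], and seq(5, 1, 1) is a zero-size array.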

My initial idea was just a = start:end:inc, without the square brackets, but I think the brackets are needed to avoid any ambiguity with array sections: array(1:10:2) and array([1:10:2]) would return the same values but would be fundamentally different, the former being a classical array section and the latter an indexing with a vector subscript. That said, maybe compilers would face no real ambiguity, since 1:10:2 in array(1:10:2) should always be interpreted as a section triplet, never as an array constructor.
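One concrete consequence of the difference is definability. A minimal sketch (the module and subroutine names are hypothetical): passing the section array(1:10:2) to an INTENT(INOUT) dummy argument is fine, whereas passing array([(i, i = 1, 10, 2)]), today's equivalent of array([1:10:2]), is not allowed, because a dummy argument associated with a vector-subscripted actual argument is not definable.

```fortran
module section_demo
  implicit none
contains
  ! Sets every element of x to zero. Because x is INTENT(INOUT), the actual
  ! argument must be definable: an array section such as array(1:10:2) is,
  ! but a vector-subscripted one such as array([(i, i = 1, 10, 2)]) is not.
  subroutine zero_out(x)
    integer, intent(inout) :: x(:)
    x = 0
  end subroutine zero_out
end module section_demo
```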

In any case, requiring the square brackets has another advantage: there is no need to define a precedence between : and the arithmetic operators. 2 * start:end:inc could mean 2*(start:end:inc) or (2*start):end:inc, whereas 2 * [start:end:inc] is unambiguous.
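The two possible readings of 2 * start:end:inc can be spelled out with today's implied-DO syntax. A minimal sketch; the module and function names are hypothetical, and last stands for end:

```fortran
module precedence_demo
  implicit none
contains
  ! Reading 1: 2*(start:last:inc), every element of the sequence is doubled.
  pure function scaled_sequence(start, last, inc) result(res)
    integer, intent(in) :: start, last, inc
    integer, allocatable :: res(:)
    integer :: i
    res = 2 * [(i, i = start, last, inc)]
  end function scaled_sequence

  ! Reading 2: (2*start):last:inc, only the lower bound is doubled.
  pure function shifted_sequence(start, last, inc) result(res)
    integer, intent(in) :: start, last, inc
    integer, allocatable :: res(:)
    integer :: i
    res = [(i, i = 2 * start, last, inc)]
  end function shifted_sequence
end module precedence_demo
```

With start=1, end=10 and inc=3, the first reading gives [2, 8, 14, 20] and the second gives [2, 5, 8].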

w6ws commented 4 months ago

Another way would be to add an intrinsic function to do index generation.

I've written several variants of this in the past, which I typically call "iota", after the APL iota operator. This version uses the HPF sum_prefix function for entertainment:

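  ! Returns n integers: base, base+stride, base+2*stride, ...
  ! (base and stride default to 1). sum_prefix is the prefix-sum routine
  ! from the High Performance Fortran (HPF) library, not standard Fortran.
  pure function iota (n, base, stride) result (res)
    integer, intent(in) :: n
    integer, intent(in), optional :: base
    integer, intent(in), optional :: stride
    integer :: res(n)

    ! Zero-size result: nothing to fill (res(1) below would be out of bounds)
    if (n < 1) return

    ! Start from a constant array of increments...
    if (present (stride)) then
      res = stride
    else
      res = 1
    end if

    ! ...then replace the first element with the starting value
    if (present (base)) then
      res(1) = base
    else
      res(1) = 1
    end if

    ! The running sum turns [base, stride, stride, ...] into the sequence
    res = sum_prefix (res)

  end function iota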
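For comparison, a minimal sketch of the same iota interface in standard Fortran, without HPF's sum_prefix: element k is base + (k-1)*stride, built with an implied-DO. The module name iota_mod is hypothetical.

```fortran
module iota_mod
  implicit none
contains
  ! Returns n integers: base, base+stride, ..., base+(n-1)*stride.
  ! base and stride default to 1; n <= 0 gives a zero-size result.
  pure function iota(n, base, stride) result(res)
    integer, intent(in) :: n
    integer, intent(in), optional :: base
    integer, intent(in), optional :: stride
    integer :: res(n)
    integer :: b, s, i

    b = 1
    if (present(base)) b = base
    s = 1
    if (present(stride)) s = stride

    ! Closed form of each element, so no prefix sum is needed
    res = [(b + (i - 1) * s, i = 1, n)]
  end function iota
end module iota_mod
```

For example, iota(4, 1, 3) gives [1, 4, 7, 10], the same values as [1:10:3].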
PierUgit commented 4 months ago

In contrast to many other languages, Fortran already has an intuitive syntax to express a range of indices in array section notation. It seems natural to me to generalize this syntax to other contexts (as in MATLAB).

IMO there is less added value in defining a new intrinsic function, because everybody can (and often does) write such a function, whereas the start:end:inc syntax cannot be emulated.

PierUgit commented 1 month ago

I have just discovered that the Intel compiler already accepts this syntax: https://fortran-lang.discourse.group/t/implied-do-array-constructor-type-specs-and-differences-between-gfortran-intel-and-lfortran/8047/23?u=pieru (and has done so since at least ifort 19).

Worth submitting a proposal IMO.

PierUgit commented 1 month ago

Here is a proposal that I plan to submit to the J3 committee: https://github.com/PierUgit/fortran_proposals/blob/array-constructor/proposals/array_constructor.txt

Comments are welcome.

klausler commented 1 month ago

What's the "unexpected behavior" from flang-new?

PierUgit commented 1 month ago

[1:10:3] returns a kind=8 integer array, although it is written with default integers (kind=4) only. I'm not sure whether it's intentional, but it doesn't look like a good idea to me...

PierUgit commented 1 month ago

That said, the behavior of ifx is somewhat unexpected too: the kind of the result array is determined only by the kind of the lower bound:

print*, [1:10:3]

print*, kind( [1  :10  :3  ] )
print*, kind( [1_8:10  :3  ] )
print*, kind( [1  :10_8:3  ] )
print*, kind( [1  :10  :3_8] )
print*, kind( [1  :10_8:3_8] )
print*, kind( [1_8:10_8:3_8] )

end

ifx 2024 output:

           1           4           7          10
           4
           8
           4
           4
           4
           8

flang-new output:

 1 4 7 10
 8
 8
 8
 8
 8
 8
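Independently of how compilers type the [start:end:inc] extension, standard Fortran (since Fortran 2003) already lets the element kind be pinned explicitly with a type-spec in the array constructor. A minimal sketch, assuming the int64 kind from iso_fortran_env; the module and function names are hypothetical:

```fortran
module typespec_demo
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! The type-spec "integer(int64) ::" fixes the element kind of the
  ! constructor, regardless of the kind of the implied-DO variable i.
  pure function seq64(start, last, inc) result(res)
    integer, intent(in) :: start, last, inc
    integer(int64), allocatable :: res(:)
    integer :: i
    res = [integer(int64) :: (i, i = start, last, inc)]
  end function seq64
end module typespec_demo
```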

In the proposal I have written that the integer expressions start, stop, and inc shall be of the same kind, but maybe that is unnecessarily restrictive. They could be allowed to be of different kinds, with implicit conversion to the kind of the expression having the largest range, and the resulting values would be of that kind.

klausler commented 1 month ago

Thanks for the explanation. I use 64-bit integers for all loop counts and subscripts, but neglected to convert them in this instance to the result type of the triplet, which should be the type that lower+upper+stride would have. Will fix.
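The rule described here, where the elements take the kind that lower+upper+stride would have, agrees with the suggestion above, since mixed-kind integer arithmetic yields the kind with the larger decimal exponent range. It can be emulated in standard Fortran. A minimal sketch, assuming int32 start and inc with an int64 upper bound; mixed_triplet and its module are hypothetical names:

```fortran
module triplet_kind_demo
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none
contains
  ! Emulates [start:last:inc] for an int32 start and inc and an int64 last.
  ! In standard mixed-kind arithmetic, start + last + inc is int64 (the kind
  ! with the larger range), so the elements are built as int64.
  pure function mixed_triplet(start, last, inc) result(res)
    integer(int32), intent(in) :: start, inc
    integer(int64), intent(in) :: last
    integer(int64), allocatable :: res(:)
    integer(int64) :: i
    res = [(i, i = int(start, int64), last, int(inc, int64))]
  end function mixed_triplet
end module triplet_kind_demo
```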

klausler commented 1 month ago

https://github.com/llvm/llvm-project/pull/92970 fixes this bug. Thanks again for noticing it.