j3-fortran / fortran_proposals

Proposals for the Fortran Standard Committee

Unpacking variables from arrays #186

Open 14NGiestas opened 3 years ago

14NGiestas commented 3 years ago

I usually unpack variables from arrays when it's meaningful, e.g.:


subroutine cool_stuff(self, coords)
    integer(INT32) :: coords(3)
    ...
    x = coords(1)
    y = coords(2)
    z = coords(3)
    self % coolest = [x**2, y**2, z**2]
end subroutine

It would be nice if I could unpack variables from vectors, maybe in the args:

subroutine cool_stuff(self, [x, y, z])
    integer(INT32) :: x, y, z
    self % coolest = [x**2, y**2, z**2]
end subroutine

Or even something like this:

subroutine cool_stuff(self, coords)
    integer(INT32) :: coords(3)
    integer(INT32) :: x, y, z
    [x, y, z] = coords
end subroutine

everythingfunctional commented 3 years ago

This is a subset of a feature that other languages call pattern matching, and it is incredibly convenient and useful. I too would like to see it added.

certik commented 3 years ago

This is a feature I have been missing also, and I use it frequently in Python. Perhaps something like this, following the Python approach with tuples:

subroutine cool_stuff(self, coords)
    integer(INT32) :: coords(3)
    integer(INT32) :: x, y, z
    (x, y, z) = coords
end subroutine

everythingfunctional commented 3 years ago

To move the discussion forward a bit, I want to discuss a little bit about how this would probably work in Fortran. First, Fortran is a statically typed, compiled language, so for the simple use case - like deconstructing an array (or maybe deconstructing an object of known type), with a simple assignment statement, you would probably want to ensure at compile time that the pattern match cannot fail. I.e. for the statement [x, y, z] = coords, coords must be declared with size 3, and the same type and kind parameters as x, y, and z.

Next, one would want to have a branching construct for possible matches. The most natural syntax would be (I think) to reuse the select construct like:

select case (coords)
case (Point_2D_t(x, y))
  ! can now do stuff with x and y
case (Point_3D_t(x, y, z))
  ! can now do stuff with x, y and z
case default
  ! coords is still in scope, but we don't necessarily know its type or components
end select

I don't think Fortran's type system is suited to check at compile time whether all possible matches have been accounted for, so I think a default branch would be required. Or, it could be that if one isn't specified, the processor must automatically insert one of the form:

case default
  error stop "Pattern match not satisfied for <select-expr>"

Since Fortran is an imperative language, there's also the possibility of just letting it fall through, so that none of the branches are executed. This is the way existing select case and select type blocks work, so it would be consistent. However, if a default branch is required, one could relax the restriction that the assignment statement form be guaranteed to succeed, and instead treat it more like syntactic sugar for the expanded form, where:

[x, y, z] = coords

is equivalent to

select case (coords)
case ([x, y, z])
   ! all following statements
case default ! inserted at end of scope
  error stop "Pattern match not satisfied for 'coords'"
end select

14NGiestas commented 3 years ago

At compile time the size is known, so this case is "easily" covered. At runtime we could just ensure that rhs has as many elements as lhs variables to unpack. Then one could do the following:

select case (size(coords))
case(1)
[x] = coords
case(2)
[x, y] = coords
case(3)
[x, y, z] = coords
case default ! Extra: a pop-like behaviour
[x, y, z, coords] = coords ! now the first three were unpacked and the remaining kept in the array (realloc is prob. needed here)
end select
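
For comparison, here is a rough sketch of how the same size-based dispatch can be written in today's Fortran, with each proposed unpack statement expanded by hand. The program name and sample values are made up for illustration. It also assumes that the pop-like branch should only apply to four or more elements, so an empty array goes to case default instead of being indexed out of bounds.

```fortran
program unpack_by_size
    implicit none
    integer, allocatable :: coords(:)
    integer :: x, y, z

    coords = [10, 20, 30, 40, 50]   ! illustrative data
    x = 0; y = 0; z = 0

    select case (size(coords))
    case (1)
        x = coords(1)
    case (2)
        x = coords(1); y = coords(2)
    case (3)
        x = coords(1); y = coords(2); z = coords(3)
    case (4:)
        ! Pop-like behaviour: unpack the first three, keep the rest.
        x = coords(1); y = coords(2); z = coords(3)
        coords = coords(4:)   ! reallocation on assignment (Fortran 2003)
    case default
        ! size(coords) == 0: nothing to unpack
    end select

    print *, x, y, z
    print *, coords
end program unpack_by_size
```

With the sample data above, x, y and z receive 10, 20 and 30, and coords is left holding 40 and 50.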

certik commented 3 years ago

Why can't the following:

subroutine cool_stuff(coords)
integer, intent(in) :: coords(:)
integer :: x, y, z
(x, y, z) = coords
...
end subroutine

just be syntactic sugar for:

subroutine cool_stuff(coords)
integer, intent(in) :: coords(:)
integer :: x, y, z
integer :: tmp(3)
tmp = coords
x = tmp(1)
y = tmp(2)
z = tmp(3)
...
end subroutine

With exactly the same semantics?

So if the size of coords is only known at runtime as in the above example, then tmp = coords would behave according to the current Fortran standard. And if coords is known at compile time, then tmp = coords would again behave according to the current Fortran standard.
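
A self-contained version of that expansion might look like the sketch below. The module name, the explicit size check and the print statement are additions for illustration. The check is there because intrinsic assignment to a non-allocatable array requires conforming shapes, and a processor is not required to diagnose a mismatch.

```fortran
module unpack_demo
    implicit none
contains
    subroutine cool_stuff(coords)
        integer, intent(in) :: coords(:)
        integer :: x, y, z
        integer :: tmp(3)

        ! Assumption: fail loudly rather than rely on the compiler
        ! to catch a shape mismatch in "tmp = coords".
        if (size(coords) /= size(tmp)) error stop "cool_stuff: expected 3 elements"

        tmp = coords
        x = tmp(1)
        y = tmp(2)
        z = tmp(3)
        print *, x**2, y**2, z**2
    end subroutine cool_stuff
end module unpack_demo

program main
    use unpack_demo
    implicit none
    call cool_stuff([1, 2, 3])
end program main
```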

14NGiestas commented 3 years ago

Indeed. I'd rather use square brackets than parentheses, since there's no concept of tuples in Fortran (AFAIK). I believe the tmp variable is not needed with the current Fortran standard, since [x, y, z] = coords can be rewritten as x = coords(1); y = coords(2); z = coords(3). We could also drop the brackets entirely, turning x, y, z = coords into x = coords(1); y = coords(2); z = coords(3), and let the compiler do the rest.

certik commented 3 years ago

@14NGiestas good point. With square brackets and your "optimization", this:

subroutine cool_stuff(coords)
integer, intent(in) :: coords(:)
integer :: x, y, z
[x, y, z] = coords
...
end subroutine

should be equivalent to:

subroutine cool_stuff(coords)
integer, intent(in) :: coords(:)
integer :: x, y, z
x = coords(1); y = coords(2); z = coords(3)
...
end subroutine

everythingfunctional commented 3 years ago

I was thinking something like the following should also be possible:

subroutine do_some_things(stuff)
class(base_t), intent(in) :: stuff
integer :: thing1, thing2
real :: foo, bar, baz

select pattern(stuff)
pattern (child_1(x = thing1, y = thing2))
  ! do stuff with thing1 and thing2
pattern (child_2(a = foo, b = bar, c = baz))
  ! do stuff with foo, bar and baz
pattern default
  ! if nothing matched
end select
end subroutine

which I guess would be equivalent to

subroutine do_some_things(stuff)
class(base_t), intent(in) :: stuff
integer :: thing1, thing2
real :: foo, bar, baz

select type(stuff)
type is (child_1)
  child_1(x = thing1, y = thing2) = stuff
  ! do stuff with thing1 and thing2
type is (child_2)
  child_2(a = foo, b = bar, c = baz) = stuff
  ! do stuff with foo, bar and baz
class default
  ! if nothing matched
end select
end subroutine

which would be equivalent to

subroutine do_some_things(stuff)
class(base_t), intent(in) :: stuff
integer :: thing1, thing2
real :: foo, bar, baz

select type(stuff)
type is (child_1)
  thing1 = stuff%x; thing2 = stuff%y
  ! do stuff with thing1 and thing2
type is (child_2)
  foo = stuff%a; bar = stuff%b; baz = stuff%c
  ! do stuff with foo, bar and baz
class default
  ! if nothing matched
end select
end subroutine

everythingfunctional commented 3 years ago

And then, here's where things can get interesting, if you want to match specific values:

subroutine do_some_things(stuff)
class(base_t), intent(in) :: stuff
integer :: thing1, thing2
real :: foo, bar, baz

select pattern(stuff)
pattern (child_1(x = thing1, y = 1))
  ! do stuff with thing1 knowing y was 1
pattern (child_1(x = thing1, y = 2))
  ! do stuff with thing1 knowing y was 2
pattern (child_1(x = thing1, y = thing2))
  ! do stuff with thing1 and thing2
pattern (child_2(a = foo, b = bar, c = baz))
  ! do stuff with foo, bar and baz
pattern default
  ! if nothing matched
end select
end subroutine

similarly becomes

subroutine do_some_things(stuff)
class(base_t), intent(in) :: stuff
integer :: thing1, thing2
real :: foo, bar, baz

select type(stuff)
type is (child_1)
  if (stuff%y == 1) then
    thing1 = stuff%x
    ! do stuff with thing1 knowing y was 1
  else if (stuff%y == 2) then
    thing1 = stuff%x
    ! do stuff with thing1 knowing y was 2
  else
    thing1 = stuff%x; thing2 = stuff%y
    ! do stuff with thing1 and thing2
  end if
type is (child_2)
  foo = stuff%a; bar = stuff%b; baz = stuff%c
  ! do stuff with foo, bar and baz
class default
  ! if nothing matched
end select
end subroutine

everythingfunctional commented 3 years ago

Actually, I'd rather not have to declare the types of the destructured variables, and have it be more like

subroutine do_some_things(stuff)
class(base_t), intent(in) :: stuff

select type(stuff)
type is (child_1)
  if (stuff%y == 1) then
    associate(thing1 => (stuff%x))
      ! do stuff with thing1 knowing y was 1
    end associate
  else if (stuff%y == 2) then
    associate(thing1 => (stuff%x))
      ! do stuff with thing1 knowing y was 2
    end associate
  else
    associate(thing1 => (stuff%x), thing2 => (stuff%y))
      ! do stuff with thing1 and thing2
    end associate
  end if
type is (child_2)
  associate(foo => (stuff%a), bar => (stuff%b), baz => (stuff%c))
    ! do stuff with foo, bar and baz
  end associate
class default
  ! if nothing matched
end select
end subroutine

Also note that I placed the parens around each value, making them expressions, and thus the destructured variables are immutable.
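
To make the expanded form concrete, here is a compilable sketch in current Fortran. The definitions of base_t, child_1 and child_2, their component types, and the print statements are assumptions added for illustration; the thread never defines them. The y == 2 branch is omitted to keep it short.

```fortran
module shapes
    implicit none

    ! Hypothetical type hierarchy; component names follow the thread.
    type, abstract :: base_t
    end type base_t

    type, extends(base_t) :: child_1
        integer :: x, y
    end type child_1

    type, extends(base_t) :: child_2
        real :: a, b, c
    end type child_2

contains

    subroutine do_some_things(stuff)
        class(base_t), intent(in) :: stuff

        select type (stuff)
        type is (child_1)
            if (stuff%y == 1) then
                ! Parenthesized selector: thing1 is bound to an
                ! expression, so it cannot be assigned to.
                associate (thing1 => (stuff%x))
                    print *, "y was 1, x =", thing1
                end associate
            else
                associate (thing1 => (stuff%x), thing2 => (stuff%y))
                    print *, "x, y =", thing1, thing2
                end associate
            end if
        type is (child_2)
            associate (foo => (stuff%a), bar => (stuff%b), baz => (stuff%c))
                print *, "a, b, c =", foo, bar, baz
            end associate
        class default
            print *, "nothing matched"
        end select
    end subroutine do_some_things

end module shapes

program main
    use shapes
    implicit none
    call do_some_things(child_1(x=5, y=1))
    call do_some_things(child_1(x=5, y=7))
    call do_some_things(child_2(a=1.0, b=2.0, c=3.0))
end program main
```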

14NGiestas commented 3 years ago

@everythingfunctional I think this is beyond the scope of this specific request, but it really is a pain to access object members in some cases, and this overpowered select-case could indeed shorten a lot of code logic. It deserves its own issue thread, however. Personally, I miss some features to help access members, like a for-each iterator (do auto :: element in array), avoiding the declaration of integer size variables and nested arrays of objects in loops (context variables in general).

@certik We could also allow allocatable arrays in the unpack syntax. This way, one could easily implement a poor man's queue and stack.

..., allocatable :: array(:)
!! Some special unpacks
[pop_front, array] = array
[array, pop_back] = array
!! It would act like the "inverse operation" of the current standard:
array = [array, push_back]
array = [push_front, array]
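
For reference, the two "special unpacks" can already be spelled out by hand in current Fortran, relying on reallocation on assignment. The sketch below is only an illustration with made-up values. It assumes the array is non-empty before each pop and does not check for that.

```fortran
program queue_stack_demo
    implicit none
    integer, allocatable :: array(:)
    integer :: pop_front, pop_back

    array = [1, 2, 3, 4, 5]

    ! Proposed: [pop_front, array] = array
    pop_front = array(1)
    array = array(2:)

    ! Proposed: [array, pop_back] = array
    pop_back = array(size(array))
    array = array(:size(array) - 1)

    ! The existing "inverse" operations from the standard:
    array = [array, 6]   ! push back
    array = [0, array]   ! push front

    print *, pop_front, pop_back   ! values: 1 and 5
    print *, array                 ! values: 0 2 3 4 6
end program queue_stack_demo
```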

milancurcic commented 3 years ago

> At compile time the size is known, so this case is easily covered. At runtime we could just ensure that rhs has as many elements as lhs variables to unpack.

What happens if you're unpacking the result of a function that returns an allocatable array? Or would this not be allowed?

14NGiestas commented 3 years ago

> > At compile time the size is known, so this case is easily covered. At runtime we could just ensure that rhs has as many elements as lhs variables to unpack.
>
> What happens if you're unpacking the result of a function that returns an allocatable array? Or would this not be allowed?

That would require a temporary :/

[x, y, z] = self % get_coords()
[pop, array] = self % get_array()

..., allocatable :: tmp(:)
tmp = self % get_coords()
x = tmp(1);  y = tmp(2);  z = tmp(3)

..., allocatable :: tmp(:)
tmp = self % get_array()
pop = tmp(1);  array = tmp(2:)
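
In today's Fortran, one way to avoid declaring the temporary is to bind the function result with associate. The sketch below assumes a plain module function get_coords (a stand-in for the type-bound self % get_coords(), which the thread does not define) and adds an explicit size check.

```fortran
module coords_mod
    implicit none
contains
    ! Hypothetical stand-in for self % get_coords()
    function get_coords() result(c)
        integer, allocatable :: c(:)
        c = [1, 2, 3]
    end function get_coords
end module coords_mod

program main
    use coords_mod
    implicit none
    integer :: x, y, z

    ! tmp names the function result for the duration of the block,
    ! so no separate allocatable variable has to be declared.
    associate (tmp => get_coords())
        if (size(tmp) /= 3) error stop "expected 3 elements"
        x = tmp(1); y = tmp(2); z = tmp(3)
    end associate

    print *, x, y, z
end program main
```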

everythingfunctional commented 3 years ago

> That would require a temporary :/

Well, technically it already requires a temporary, you just don't see it because the compiler takes care of it (although it can be optimized away in many cases). And if it's a language feature, the compiler can do what makes sense anyway. As a user, I'd be surprised if it didn't work.

As far as the additional pattern matching features being above and beyond the scope of the initial request, you're probably right, but hey, let's aim big.

urbanjost commented 2 years ago

In some cases it seems to be assumed that the array and scalars share the same storage (i.e. EQUIVALENCE or maybe ASSOCIATE); in others it seems to be an assignment to a declared variable. In the original post, I was assuming that if this were called with an array, then on exit the third element of the array would be changed in the calling procedure?

 subroutine ( [x, y, z])
    real :: x, y, z
    write(*,*)x**2, sqrt(y), (x+y)/z
    z=x*y
end subroutine

When I was first experimenting with polymorphic variables, I made a little procedure that assigns values from an array to a bunch of scalar variables, provided they are of common numeric intrinsic types. It had a tedious amount of duplicated code in it, but was interesting in that two class(*) variables appeared in the same expression. It is used like this:

   program demo_set
   use,intrinsic :: iso_fortran_env, only : int8, int16, int32, int64
   use,intrinsic :: iso_fortran_env, only : real32, real64, real128
   use M_msg, only : set
   implicit none
   real(kind=real32)    :: a; namelist /all/a
   real(kind=real64)    :: b; namelist /all/b
   real(kind=real128)   :: c; namelist /all/c
   integer(kind=int8)   :: i; namelist /all/i
   integer(kind=int16)  :: j; namelist /all/j
   integer(kind=int32)  :: k; namelist /all/k
   integer(kind=int64)  :: l; namelist /all/l
   integer              :: iarr(7)=[1,2,3,40,50,600,700]
      call set(iarr,a,b,c,i,j,k,l)
      write(*,nml=all)
      call set([(123456789.0123456789d0,i=1,4)],a,b,c,l)
      write(*,nml=all)
   end program demo_set

Output

 &ALL
 A       =   1.000000    ,
 B       =   2.00000000000000     ,
 C       =   3.00000000000000000000000000000000      ,
 I       =    40,
 J       =      50,
 K       =           600,
 L       =                     700
 /
 &ALL
 A       =  1.2345679E+08,
 B       =   123456789.012346     ,
 C       =   123456789.012345671653747558593750      ,
 I       =    40,
 J       =      50,
 K       =           600,
 L       =             123456789
 /

That was interesting. I might revisit that now that more compilers work with it and I have used the feature more and maybe make it simpler.
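
A much-reduced sketch of the general idea follows. It is not M_msg's actual set implementation: the names set_sketch, put and set3 are hypothetical, it takes only integer input, and it handles four destination types. Each class(*) destination is resolved with select type and the value is converted to the matching kind.

```fortran
module set_sketch
    use, intrinsic :: iso_fortran_env, only : int32, int64, real32, real64
    implicit none
contains

    ! Hypothetical helper: store one integer value in a scalar of
    ! whichever supported numeric type the caller passed in.
    subroutine put(val, dest)
        integer, intent(in)   :: val
        class(*), intent(out) :: dest
        select type (dest)
        type is (integer(int32))
            dest = int(val, int32)
        type is (integer(int64))
            dest = int(val, int64)
        type is (real(real32))
            dest = real(val, real32)
        type is (real(real64))
            dest = real(val, real64)
        class default
            error stop "put: unsupported destination type"
        end select
    end subroutine put

    ! Hypothetical fixed-arity unpack: arr(1:3) -> a, b, c
    subroutine set3(arr, a, b, c)
        integer, intent(in)   :: arr(:)
        class(*), intent(out) :: a, b, c
        if (size(arr) < 3) error stop "set3: need at least 3 values"
        call put(arr(1), a)
        call put(arr(2), b)
        call put(arr(3), c)
    end subroutine set3

end module set_sketch

program demo_set_sketch
    use set_sketch
    implicit none
    real(real32)   :: a
    real(real64)   :: b
    integer(int64) :: l

    call set3([1, 2, 700], a, b, l)
    print *, a, b, l
end program demo_set_sketch
```

The duplicated type guards mentioned above are visible even at this scale: every supported destination type needs its own branch.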