sourceryinstitute / OpenCoarrays

A parallel application binary interface for Fortran 2018 compilers.
http://www.opencoarrays.org
BSD 3-Clause "New" or "Revised" License

runtime failure using derived types and multidimensional cobounds #488

Closed t-bltg closed 6 years ago

t-bltg commented 6 years ago

Defect/Bug Report

Observed Behavior

In the following example, run on 8 images, scattering from image 1 fails when the coarray is a component of a derived type, whereas we observe no issue with a plain coarray.

log

# cafrun -np 8 ./scatter false
global size           8           6 
local size           2           3
 all close ? T

# cafrun -np 8 ./scatter true
 global size           8           6 
local size           2           3
Fortran runtime error on image 1: libcaf_mpi::caf_send_by_ref(): unable to allocate memory on remote image.

Expected Behavior

The derived-type run should complete without error, just as the plain-coarray run does.

Steps to Reproduce

script

#!/usr/bin/env bash
caf -o scatter scatter.f90

cafrun -np 8 ./scatter false
cafrun -np 8 ./scatter true

scatter.f90

program main
   implicit none

   type co_arr
      real, allocatable :: a(:, :)[:, :]
   end type

   type(co_arr) :: co1 ! derived type with coarray
   real, allocatable :: co2(:, :)[:, :] ! pure coarray
   real, allocatable :: glob(:, :), buf2d(:, :)
   integer, parameter :: nx = 2, ny = 3 ! local chunk, same over all images
   integer :: me, nimg, mx, my, i, j, k
   character(len = *), parameter :: tab = char(9), nl = char(10)
   character(len = 10) :: arg
   logical :: switch

   call get_command_argument(1, arg) ! standard replacement for the getarg extension
   read (arg, '(L1)') switch

   me = this_image()
   nimg = num_images()
   mx = nimg / 2 ! 2d grid distribution
   my = nimg / mx

   if (nimg /= 8) then
      stop 'example for 8 images'
   end if

   allocate(co1 % a(nx, ny)[mx, *])
   allocate(co2(nx, ny)[mx, *])
   allocate(buf2d(nx, ny), glob(mx * nx, my * ny))

   if (me == 1) print *, 'global size', [(mx * nx), (my * ny)], nl, 'local size', [nx, ny]

   ! call random_number(glob)
   glob = reshape([(i, i = 1, (mx * nx) * (my * ny))], shape(glob))

   sync all ! separate segments
   if (me == 1) then
      ! scatter glob from root (1st image)

      ! local to local
      co1 % a = glob(:nx, :ny)
      co2 = glob(:nx, :ny)

      ! loop over all other images
      do i = 1, mx
         do j = 1, my
            k = image_index(co2, [i, j])
            if (k /= 1) then
               buf2d = glob((i - 1) * nx + 1:i * nx, (j - 1) * ny + 1:j * ny) ! send buffer
               ! print *, 'filling up image #', k, '[', [i, j], ']', nl, tab, '==>', buf2d, nl
               if (switch) then
                  co1 % a(:nx, :ny)[i, j] = buf2d ! <= failure
               else
                  co2(:nx, :ny)[i, j] = buf2d ! ok
               end if
            end if
         end do
      end do
   end if
   sync all

   ! output local arrays after scattering
   ! print *, me, 'co1', co1 % a, nl, me, 'co2', co2

   ! collect results on rank 1
   call co_sum(co1 % a, result_image = 1)
   call co_sum(co2, result_image = 1)

   sync all
   if (me == 1) print *, 'all close ?', abs(sum(glob) - sum(merge(co1 % a, co2, switch))) < epsilon(0.)

   deallocate(co1 % a, co2, glob)
end program
vehre commented 6 years ago

This usually happens when the size of the array on the lhs, co1 % a(:nx, :ny), differs from that of the right-hand side of the assignment (buf2d in this case). Try adding the lower bound to the coarray indexing, e.g. %a(1:nx, 1:ny) or the like.

t-bltg commented 6 years ago

Hi @vehre, thanks for the quick reply. I added explicit bounds, but it fails with the same error.

I added -DEXTRA_DEBUG_OUTPUT to the build; here is the output:

1/8: Entering send_by_ref(may_require_tmp = 0).
1/8: _gfortran_caf_send_by_ref() offset = 0, remote_mem = 0x7f9686bdc000
1/8: _gfortran_caf_send_by_ref() remote desc rank: 2 (ref_rank: 2)
1/8: _gfortran_caf_send_by_ref() remote desc dim[0] = (lb = 1, ub = 2, stride = 1)
1/8: _gfortran_caf_send_by_ref() remote desc dim[1] = (lb = 1, ub = 3, stride = 2)
1/8: _gfortran_caf_send_by_ref() extent(dst): 2 != delta: 3.
Fortran runtime error on image 1: libcaf_mpi::caf_send_by_ref(): unable to allocate memory on remote image.

Lower and upper bounds seem correct, and extent(dst) is 2 for dim == 0, but what is delta?

vehre commented 6 years ago

delta is the number of elements on the rhs, which is questionable in this case, because it says nothing about the dimension currently being operated on. Unfortunately I am at work and can't take a more in-depth look right now.

t-bltg commented 6 years ago

I patched mpi_caf.c: the error arises at src_cur_dim = 0, so the number of elements on the rhs should theoretically be 2, not 3.

Thanks for having a look into this; I'm continuing to investigate.

t-bltg commented 6 years ago

@vehre

There seems to be an issue in the loop over the dimensions.

If delta is non-scalar, then in_array_ref has to be set to true so that the current dimension counter is advanced (++src_cur_dim).

The following patch seems to correct the issue for the example above, with all tests passing. I also ran the test suite at https://github.com/uhhpctools/caf-testsuite without any failure.

--- src/mpi/mpi_caf.c   2018-01-29 17:38:05.000000000 +0100
+++ src/mpi/mpi_caf_.c  2018-01-31 19:16:55.803766107 +0100
@@ -5371,7 +5371,6 @@
     bool extent_mismatch = false;
     /* Set when the first non-scalar array reference is encountered.  */
     bool in_array_ref = false;
-    bool array_extent_fixed = false;
     /* Set when remote data is to be accessed through the global dynamic window. */
     bool access_data_through_global_win = false;
     /* Set when the remote descriptor is to accessed through the global window. */
@@ -5571,6 +5570,7 @@
                     /* Do further checks, when the source is not scalar.  */
                     else if (delta != 1)
                       {
+                        in_array_ref = true;
                         /* When the realloc is required, then no extent may have
                            been set.  */
                         extent_mismatch = GFC_DESCRIPTOR_EXTENT (dst,
@@ -5612,10 +5612,7 @@
                 size *= (ptrdiff_t)delta;
               }
             if (in_array_ref)
-              {
-                array_extent_fixed = true;
-                in_array_ref = false;
-              }
+              in_array_ref = false;
             break;
           case CAF_REF_STATIC_ARRAY:
             for (i = 0; riter->u.a.mode[i] != CAF_ARR_REF_NONE; ++i)
@@ -5698,6 +5695,7 @@
                     /* Do further checks, when the source is not scalar.  */
                     else if (delta != 1)
                       {
+                        in_array_ref = true;
                         /* When the realloc is required, then no extent may have
                            been set.  */
                         extent_mismatch = GFC_DESCRIPTOR_EXTENT (dst,
@@ -5718,10 +5716,7 @@
                 size *= (ptrdiff_t)delta;
               }
             if (in_array_ref)
-              {
-                array_extent_fixed = true;
-                in_array_ref = false;
-              }
+              in_array_ref = false;
             break;
           default:
             caf_runtime_error (unknownreftype, stat, NULL, 0);

Please correct me if the reasoning is wrong !

zbeekman commented 6 years ago

@neok-m4700:

First of all, please let me say THANK YOU for the amazing bug report and sleuthing! We REALLY appreciate thorough and detailed bug reports like this! 💯 🙇 🥇

Second, may we adapt the code you posted above as a regression test for OpenCoarrays?

Third, I will look into applying your patch to the library, if you consent to it, and if I can verify its correctness.

Thanks once again, and sorry I was slow to get to this, work has been nuts recently!

t-bltg commented 6 years ago

Thanks, @zbeekman !

Yes sure, please use and modify the test case for non-regression.

OC seems mature enough for a research code, so I'm trying to port my own CFD code from a shared-memory approach in C to Fortran for HPC runs under MPI. Coarrays seem to provide a smooth transition for PGAS-like programs while hiding the ugly bits of communication (cleaner code = better code).

So I might annoy you again with (hopefully only a few) bug reports in the following months :)

Especially the annoying

co % lhs = co % rhs
! Warning ! sendget_by_ref() is mostly unfunctional due to a design error

which has to be split into something like

allocate(buf(...))
buf = co % rhs
co % lhs = buf
deallocate(buf)

It's ok for now, as long as the program runs ...

Feel free to close the issue if you think it's resolved in master.

zbeekman commented 6 years ago

We are thrilled that using CAF in production research code appeals to you and will do everything in our power to try to make its adoption as painless as possible. I certainly agree it is much more pleasant to use than MPI.

Just a word of caution: some issues (mostly compiler-side) persist with indirect addressing (vector subscripts), so structured grids should be fairly doable, but some issues relevant to unstructured grids may be a source of frustration. We're addressing them as quickly as possible. Also, coarray objects with pointer or allocatable components are a WIP, and you may find some rough edges there.

Also, please keep in mind that much OC development is done by volunteers or funded by monetary donations from Sourcery Institute and a few others. Currently, however, development is unfunded. We welcome suggestions for funding opportunities as well as contributions in the form of pull requests. I think I can speak for @rouson and others when I say that we would be happy to jointly author grant proposals etc. to better support your research through funding additional OC development.

Lack of resources aside, we will be as responsive as possible but certain issues could take some time to resolve without additional funding or source code contributions.

Cheers, Izaak “Zaak” Beekman


rouson commented 6 years ago

Yes.