Closed: t-bltg closed this issue 6 years ago
This usually happens when the size of the array on the lhs (`co1 % a(:nx, :ny)`) differs from that of the right-hand side of the assignment (`buf2d` in this case). Try adding the lower bound to the coarray indexing, à la `% a(1:nx, 1:ny)` or the like.
Hi @vehre , thanks for the quick reply, I added explicit bounds, but it fails with the same issue.
I added `-DEXTRA_DEBUG_OUTPUT` to the build; here is the output:
```
1/8: Entering send_by_ref(may_require_tmp = 0).
1/8: _gfortran_caf_send_by_ref() offset = 0, remote_mem = 0x7f9686bdc000
1/8: _gfortran_caf_send_by_ref() remote desc rank: 2 (ref_rank: 2)
1/8: _gfortran_caf_send_by_ref() remote desc dim[0] = (lb = 1, ub = 2, stride = 1)
1/8: _gfortran_caf_send_by_ref() remote desc dim[1] = (lb = 1, ub = 3, stride = 2)
1/8: _gfortran_caf_send_by_ref() extent(dst): 2 != delta: 3.
Fortran runtime error on image 1: libcaf_mpi::caf_send_by_ref(): unable to allocate memory on remote image.
```
Lower and upper bounds seem correct, and extent(dst) is 2 for dim == 0, but what is delta?
delta is the number of elements on the rhs, which is questionable in this case, because it says nothing about the dimension currently being operated on. Unfortunately I am at work and can't take a more in-depth look right now.
I patched mpi_caf.c; the error arises at src_cur_dim = 0, so the number of elements on the rhs should theoretically be 2, not 3.
Thanks for having a look into this; I'll keep investigating.
@vehre
There seems to be an issue when looping over the dimensions: if delta is not 1 (i.e. the reference is non-scalar), then `in_array_ref` has to be set to true in order to increment the current dimension (`++src_cur_dim`).
The following patch seems to correct the issue for the aforementioned example, with all tests passing. I also ran the test suite https://github.com/uhhpctools/caf-testsuite without any issues.
```diff
--- src/mpi/mpi_caf.c 2018-01-29 17:38:05.000000000 +0100
+++ src/mpi/mpi_caf_.c 2018-01-31 19:16:55.803766107 +0100
@@ -5371,7 +5371,6 @@
   bool extent_mismatch = false;
   /* Set when the first non-scalar array reference is encountered. */
   bool in_array_ref = false;
-  bool array_extent_fixed = false;
   /* Set when remote data is to be accessed through the global dynamic window. */
   bool access_data_through_global_win = false;
   /* Set when the remote descriptor is to accessed through the global window. */
@@ -5571,6 +5570,7 @@
           /* Do further checks, when the source is not scalar. */
           else if (delta != 1)
             {
+              in_array_ref = true;
               /* When the realloc is required, then no extent may have
                  been set. */
               extent_mismatch = GFC_DESCRIPTOR_EXTENT (dst,
@@ -5612,10 +5612,7 @@
           size *= (ptrdiff_t)delta;
         }
       if (in_array_ref)
-        {
-          array_extent_fixed = true;
-          in_array_ref = false;
-        }
+        in_array_ref = false;
       break;
     case CAF_REF_STATIC_ARRAY:
       for (i = 0; riter->u.a.mode[i] != CAF_ARR_REF_NONE; ++i)
@@ -5698,6 +5695,7 @@
           /* Do further checks, when the source is not scalar. */
           else if (delta != 1)
             {
+              in_array_ref = true;
               /* When the realloc is required, then no extent may have
                  been set. */
               extent_mismatch = GFC_DESCRIPTOR_EXTENT (dst,
@@ -5718,10 +5716,7 @@
           size *= (ptrdiff_t)delta;
         }
       if (in_array_ref)
-        {
-          array_extent_fixed = true;
-          in_array_ref = false;
-        }
+        in_array_ref = false;
       break;
     default:
       caf_runtime_error (unknownreftype, stat, NULL, 0);
```
Please correct me if the reasoning is wrong!
@neok-m4700:
First of all, please let me say THANK YOU for the amazing bug report and sleuthing! We REALLY appreciate thorough and detailed bug reports like this! 💯 🙇 🥇
Second, may we adapt the code you posted above as a regression test for OpenCoarrays?
Third, I will look into applying your patch to the library, if you consent to it, and if I can verify its correctness.
Thanks once again, and sorry I was slow to get to this, work has been nuts recently!
Thanks, @zbeekman !
Yes sure, please use and modify the test case for non-regression.
OC seems mature enough for a research code, so I'm porting my own CFD code from a shared-memory approach in C to Fortran for HPC runs under MPI; coarrays seem to provide a smooth transition for PGAS-like programs while covering the ugly bits of communication (cleaner code = better code).
So I might annoy you again with (hopefully only a few) bug reports in the following months :)
Especially the annoying

```fortran
co % lhs = co % rhs
! Warning ! sendget_by_ref() is mostly unfunctional due to a design error
```

which has to be split into something like

```fortran
allocate(buf(...))
buf = co % rhs
co % lhs = buf
deallocate(buf)
```

It's ok for now, as long as the program runs ...
Feel free to close the issue if you think it's resolved in master.
We are thrilled that using CAF in production research code appeals to you and will do everything in our power to try to make its adoption as painless as possible. I certainly agree it is much more pleasant to use than MPI.
Just a word of caution: some issues (mostly on the compiler side) persist with indirect addressing (vector subscripts), so structured grids should be fairly doable, but some issues relevant to unstructured grids may be a source of frustration. We're addressing these as quickly as possible. Also, coarray objects with pointer or allocatable components are a WIP, and you may find some rough edges there.
Also, please keep in mind that much OC development is done by volunteers or thanks to monetary donations by Sourcery Institute and a few others. Currently, however, development is unfunded. We welcome suggestions for funding opportunities and contributions in the form of pull requests. I think I can speak for @rouson and others when I say that we would be happy to jointly author grant proposals etc. to better support your research through funding additional OC development.
Lack of resources aside, we will be as responsive as possible but certain issues could take some time to resolve without additional funding or source code contributions.
Cheers, Izaak “Zaak” Beekman
Yes.
Defect/Bug Report

- OpenCoarrays version: 4cc41bf4f301190c (1.9.3+)
- Fortran compiler: GNU Fortran (GCC) 7.3.0 (release)
- C compiler: gcc (GCC) 7.3.0 (release)
- Installation method: `cmake ..; make -j; make install`
- Output of `uname -a`: Linux <...> 4.13.0-32-generic #35-Ubuntu SMP Thu Jan 25 09:13:46 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
- MPI library: mpich 3.2.1 (release), OpenMPI 3.0.0 (release)
- Machine architecture: x86_64, 12 cores
- CMake version: 3.10.2 (release)
Observed Behavior
In the following example, run on 8 images, the scattering from rank 1 fails when using a coarray in a derived type, whereas we observe no issue with pure coarrays.
log
Expected Behavior
The following sequence should not fail.
Steps to Reproduce
script
scatter.f90