Open Bellahra opened 2 months ago
There are some major issues with this code; let me highlight two:

1. You use nonblocking communications (`MPI_Isend` and `MPI_Irecv`) but you never check if the communications completed (`MPI_Wait*` or `MPI_Test*`). Until they are completed you are not supposed to use (for the receiver) or alter (for the sender) the buffers used in nonblocking communications.
2. This exchange could be done with a single call to `MPI_Sendrecv_replace`.

Additional suggestions for improving this code: post `MPI_Irecv` followed by `MPI_Isend`, to make sure all communications are expected on the receiver side.

@bosilca Thank you very much for your kind and helpful reply. The original code works well after adding the `MPI_Wait` operations. I have a question about the difference between the combination of `MPI_Isend` & `MPI_Irecv` and `MPI_Sendrecv_replace`. In the above code, since different ranks own different values of one variable, the replacement is done by using the same buffer and then overwriting it with the `MPI_Irecv` operation. I am not clear whether there is any risk in doing so, or how it differs from using `MPI_Sendrecv_replace`. Looking forward to your reply.
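The difference between the two approaches can be sketched as follows. This is only an illustration, not the actual code from the issue: `double precision` stands in for the derived type, and all names (`e_send`, `e_recv`, `e`, `n`, `other`) are invented. The key point is that with `MPI_Isend`/`MPI_Irecv` the send and receive buffers must be distinct and left untouched until completion, whereas `MPI_Sendrecv_replace` is allowed to overwrite its single buffer because the library stages the outgoing data internally.

```fortran
program exchange_sketch
   use mpi
   implicit none
   ! Sketch only: in the real code the buffers hold the derived type and
   ! MPI_EFIELD is the committed datatype; double precision is used here
   ! just to keep the illustration self-contained. Run with 2 ranks.
   integer, parameter :: n = 4
   double precision :: e_send(n), e_recv(n), e(n)
   integer :: ierr, rank, other, reqs(2)

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   other = 1 - rank                 ! partner among ranks 0 and 1
   e_send = rank
   e      = rank

   ! Variant A: nonblocking, DISTINCT buffers, explicit completion.
   ! Posting the receive first guarantees the message is expected.
   call MPI_Irecv(e_recv, n, MPI_DOUBLE_PRECISION, other, 0, &
                  MPI_COMM_WORLD, reqs(1), ierr)
   call MPI_Isend(e_send, n, MPI_DOUBLE_PRECISION, other, 0, &
                  MPI_COMM_WORLD, reqs(2), ierr)
   ! Neither buffer may be read or altered before this returns:
   call MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE, ierr)

   ! Variant B: ONE buffer, blocking; the library buffers the outgoing
   ! data, so overwriting "e" with the received values is well defined.
   call MPI_Sendrecv_replace(e, n, MPI_DOUBLE_PRECISION, other, 0, &
                             other, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)

   call MPI_Finalize(ierr)
end program exchange_sketch
```

Reusing the same buffer for a pending `MPI_Isend` and a pending `MPI_Irecv` at the same time, by contrast, is erroneous under the MPI standard even if it appears to work on a given implementation.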
Even if the issues reported by @bosilca are addressed, I do not think this can work: since subarrays are passed to `MPI_Isend()` and `MPI_Irecv()`, temporary flattened arrays are allocated by the Fortran runtime and deallocated when these subroutines return, which typically occurs before the data is actually sent or received; hence undefined behavior, which can be a crash.
Bottom line: subarrays should not be used with nonblocking communications for now.
Note the MPI standard defines the `MPI_SUBARRAYS_SUPPORTED` and `MPI_ASYNC_PROTECTS_NONBLOCKING` "macros", and they are currently both `.false.` under Open MPI.
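A minimal sketch of the pitfall and one workaround — shapes, ranks, and the use of `double precision` in place of the derived type are all assumed for illustration. Because `MPI_SUBARRAYS_SUPPORTED` is `.false.`, a non-contiguous section such as `e(1,1:100)` is passed through a compiler-generated temporary that may be freed as soon as `MPI_Isend` returns; staging the section in a buffer that outlives the communication avoids this.

```fortran
program subarray_sketch
   use mpi
   implicit none
   ! Sketch only: double precision stands in for the derived type,
   ! and the shapes mirror the example in the issue. Run with 2 ranks.
   double precision :: e(200,200), tmp(100), rbuf(100)
   integer :: ierr, rank, other, reqs(2)

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   other = 1 - rank
   e = rank

   ! UNSAFE while MPI_SUBARRAYS_SUPPORTED is .false.:
   ! e(1,1:100) is non-contiguous (Fortran is column-major), so the
   ! compiler may pass a temporary copy that is deallocated as soon as
   ! MPI_Isend returns -- before the data is actually sent:
   !   call MPI_Isend(e(1,1:100), 100, MPI_DOUBLE_PRECISION, other, 0, &
   !                  MPI_COMM_WORLD, reqs(2), ierr)

   ! Safer: copy the section into a contiguous buffer that stays alive
   ! until the nonblocking operations are completed.
   tmp = e(1,1:100)
   call MPI_Irecv(rbuf, 100, MPI_DOUBLE_PRECISION, other, 0, &
                  MPI_COMM_WORLD, reqs(1), ierr)
   call MPI_Isend(tmp, 100, MPI_DOUBLE_PRECISION, other, 0, &
                  MPI_COMM_WORLD, reqs(2), ierr)
   call MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE, ierr) ! tmp must live until here
   e(1,1:100) = rbuf

   call MPI_Finalize(ierr)
end program subarray_sketch
```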
@ggouaillardet Thank you for your suggestion. It really helped me understand the issue better.
Please submit all the information below so that we can understand the working environment that is the context for your question.
Background information
I want to exchange some data of a derived data type between several ranks. When the sent data is a small array, the data is sent and received successfully. But when I changed the array from e(2,2) to e(200:200) and sent e(1,1:100), it showed errors. I did not revise any other part, just the dimension of the array, which is strange. I also tested whether this problem occurs when the data type is double precision, and found that it did not.
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
v4.0.3
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
source
If you are building/installing from a git clone, please copy-n-paste the output from `git submodule status`.
Please describe the system on which you are running
Details of the problem
The derived data type is `efield`, and `MPI_EFIELD` is the corresponding MPI datatype. I use `MPI_Isend` and `MPI_Irecv` to exchange the derived data `e` between rank 0 and rank 1. It works well when I send a small array, like e(2,2). However, when I handled a larger array, e(200:200), and sent e(1,1:100), it ran into errors and it seemed that the data were not exchanged. The first is the example code for the small array, i.e., e(2,2), followed by its output:
This is the second code, where I only changed the dimensions of e and the count of sent data, followed by its output:
I also tested other cases where the data type is `double precision`, and they worked well. So I wonder what the reason for this is and how I could solve the problem.
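The definition of `efield` is not shown above, so the following is only a hypothetical sketch of how such a derived type is typically registered; the type layout and every name here are invented for illustration. One common reason a derived-type array fails where plain `double precision` works is that the committed datatype's extent does not match the padded size of the Fortran type, which `MPI_Type_create_resized` can correct.

```fortran
program efield_type_sketch
   use mpi
   implicit none
   ! Hypothetical: the real "efield" definition is not shown in the
   ! issue; this layout (one double, one integer) is illustrative only.
   type efield
      double precision :: v
      integer          :: id
   end type efield

   type(efield) :: sample(2)
   integer :: MPI_EFIELD, tmp_type, ierr
   integer :: blocklens(2), types(2)
   integer(kind=MPI_ADDRESS_KIND) :: displs(2), base, next, lb

   call MPI_Init(ierr)

   blocklens = (/ 1, 1 /)
   types     = (/ MPI_DOUBLE_PRECISION, MPI_INTEGER /)
   call MPI_Get_address(sample(1)%v,  displs(1), ierr)
   call MPI_Get_address(sample(1)%id, displs(2), ierr)
   call MPI_Get_address(sample(2)%v,  next, ierr)
   base   = displs(1)
   displs = displs - base        ! displacements relative to the start

   call MPI_Type_create_struct(2, blocklens, displs, types, tmp_type, ierr)

   ! Resize so the extent equals the true stride between consecutive
   ! array elements, including any compiler-inserted padding; a wrong
   ! extent corrupts every element after the first when sending arrays.
   lb = 0
   call MPI_Type_create_resized(tmp_type, lb, next - base, MPI_EFIELD, ierr)
   call MPI_Type_commit(MPI_EFIELD, ierr)   ! required before use

   call MPI_Type_free(MPI_EFIELD, ierr)
   call MPI_Type_free(tmp_type, ierr)
   call MPI_Finalize(ierr)
end program efield_type_sketch
```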