Open mpiforumbot opened 8 years ago
Originally by jdinan on 2012-06-06 13:44:48 -0500
Attachment added: reduce_user_dt_and_op.c
(1.3 KiB)
Test case, which demonstrates memory consumption problem.
Originally by jhammond on 2014-09-09 04:57:17 -0500
We should also try to support MPI_IN_PLACE in user-defined reductions with this ticket. I'll add the text later.
Originally by jdinan on 2012-06-06 13:44:07 -0500
Description
Currently, using a derived datatype with an MPI reduction operation also requires the use of a user-defined MPI_Op. The function that implements an MPI_Op has the following C prototype:
void op_fcn(void *in, void *inout, int *count, MPI_Datatype *dtype)
Note that user-defined operations accept two buffers, but only one count and one datatype. Because of this, both buffers must have the layout described by that count and datatype.
Consider a reduction on a column of a large row-major array. We can easily do a reduce operation directly on the column using an MPI vector datatype. Because this is not a built-in datatype, we must also provide a user-defined op to the reduction operation. The user-defined op expects all data to have the same layout because it takes only one datatype/count. Thus, MPI must reconstruct the sender's entire array before invoking the user-defined op, resulting in severe space inefficiency for this operation.
A test case is attached to the ticket that demonstrates the memory consumption issue.
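To put rough numbers on the column example, here is a minimal sketch in plain C (no MPI calls) that compares the bytes of actual data in one column with the bytes spanned by that column in its strided layout. It assumes an n x n row-major array of doubles, which would correspond to a datatype built with MPI_Type_vector(n, 1, n, MPI_DOUBLE). The function names are hypothetical and are not taken from the attached test case.

```c
#include <stddef.h>

/* Hypothetical helper: bytes of actual data in one column of an
 * n x n row-major array of doubles (the packed size). */
size_t column_packed_bytes(size_t n)
{
    return n * sizeof(double);
}

/* Hypothetical helper: bytes spanned by the same column in its strided
 * layout, from the first element to the end of the last element.
 * A buffer holding the column in its original layout must be at least
 * this large. */
size_t column_strided_bytes(size_t n)
{
    if (n == 0)
        return 0;
    /* n elements, each n doubles apart, plus the last element itself */
    return ((n - 1) * n + 1) * sizeof(double);
}
```

Under these assumptions the strided footprint grows quadratically with n while the packed data grows only linearly, which is the space overhead the ticket describes.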
Extended Scope
none.
History
none.
Proposed Solution
Define an MPI_Op that accepts one datatype for each buffer:
void op_fcn(void *in, int *count_in, MPI_Datatype *dtype_in, void *inout, int *count_inout, MPI_Datatype *dtype_inout)
This would allow MPI to pass one buffer in its packed form rather than recreating its layout at the source.
Such an op could be challenging for users to implement, so mechanisms to simplify the task should be investigated. One possibility is to define an op that takes two datatypes and one count; the MPI implementation would then transform one or both datatypes so that their individual units are congruent. This seems feasible for reductions, since all processes must pass the same datatype.
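The difference between the two callback shapes can be sketched in plain C without MPI. In this illustration a simple element stride stands in for a datatype, and all function and parameter names are hypothetical; a real user-defined op would have to interpret MPI_Datatype handles instead.

```c
#include <stddef.h>

/* Current model (sketch): one layout describes both buffers, so the
 * incoming data must already sit in the same strided layout as inout. */
void sum_same_layout(const double *in, double *inout,
                     size_t count, size_t stride)
{
    for (size_t i = 0; i < count; i++)
        inout[i * stride] += in[i * stride];
}

/* Proposed model (sketch): each buffer carries its own layout, so the
 * incoming data can stay packed (stride_in == 1) while inout keeps its
 * strided, in-array layout. */
void sum_two_layouts(const double *in, size_t stride_in,
                     double *inout, size_t stride_inout,
                     size_t count)
{
    for (size_t i = 0; i < count; i++)
        inout[i * stride_inout] += in[i * stride_in];
}
```

With the second form, a packed receive buffer of count elements is enough for the incoming contribution; only the local array needs its full strided extent.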
Impact on Implementations
Impact on Applications and Users
Currently, reductions with derived datatypes are extremely space-inefficient, as described above. Fixing this issue would provide a significant performance enhancement.
Alternative Solutions
Several alternative solutions are possible: