Open softwaretraff opened 2 years ago
Actually, Open MPI (OMPI) does provide a similar capability at a lower level than the MPI API. I mention this in our discussion there; you can find the link above.
Hi George,
that's what I assumed. However, this doesn't help the application programmer. Supporting a 3-argument MPI_Reduce_locals operation should take very little implementation effort.
Jesper
Hi again,
actually, the discussion in your link is great, and seems to support having a 3-argument MPI_Reduce_locals?
Jesper
Do we need a new MPI_Op_create for this to comply with the C function type?
Good point. It would be possible to do without one, but that would then come at a cost for user-defined ops, so perhaps yes.
The current MPI_OP_CREATE defines callbacks only for (invec, inoutvec), with the elementwise semantics inoutvec[i] = invec[i] o inoutvec[i]. For a user-defined operation, the proposed MPI_REDUCE_LOCALS without MPI_IN_PLACE therefore always requires a copy, inoutbuf = argbuf, followed by user_defined_operation(inbuf, inoutbuf), which contradicts the performance goals of MPI. This may be resolved by providing the new API only for predefined operations. But then a new inquiry function would be needed: MPI_OP_USERDEFINED(IN op, OUT userdefined).
Then the user can implement two different algorithms for predefined and user-defined operations, the first one using the new MPI_REDUCE_LOCALS and the second one using the old MPI_REDUCE_LOCAL.
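To make the copy cost concrete, here is a minimal sketch in plain C (no MPI calls) of how a three-argument reduction would have to be emulated on top of a two-argument user callback. The names `affine`, `compose2` and `reduce_locals_emulated` are illustrative assumptions, and the callback signature is simplified compared with the real MPI user function type, which takes `int *len` and `MPI_Datatype *datatype`.

```c
#include <assert.h>
#include <string.h>

/* An affine map x -> m*x + c. Composition of such maps is associative
   but not commutative, which makes it a convenient test operation. */
typedef struct { long m, c; } affine;

/* Two-argument callback in the style of an MPI user function:
   inoutvec[i] = invec[i] o inoutvec[i]. Simplified, hypothetical signature. */
static void compose2(const void *invec, void *inoutvec, int len)
{
    const affine *in = invec;
    affine *inout = inoutvec;
    for (int i = 0; i < len; i++) {
        affine r = { in[i].m * inout[i].m, in[i].m * inout[i].c + in[i].c };
        inout[i] = r;
    }
}

/* Hypothetical emulation of the three-argument form for a user-defined op:
   outbuf[i] = inbuf[i] o argbuf[i]. The two-argument callback forces a
   copy of argbuf into outbuf before it can be applied. */
static void reduce_locals_emulated(const affine *inbuf, const affine *argbuf,
                                   affine *outbuf, int count)
{
    assert(count >= 0);
    memmove(outbuf, argbuf, (size_t)count * sizeof *outbuf); /* the extra copy */
    compose2(inbuf, outbuf, count);
}
```

The `memmove` is exactly the `inoutbuf = argbuf` step described above; an implementation could avoid it only for operations it can evaluate directly from three buffers.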
Dear Forum,
I still think a 3-argument MPI_Reduce_locals as outlined would be tremendously useful for those writing their own reduction-like library functions or collectives. The 2-argument MPI_Reduce_local in many cases forces unnecessary copying, especially if the operation is not commutative or commutativity is not to be exploited. The proposal above should be extended by an MPI_Op_create for 3-argument user functions as well. I can provide a proposal/text if there is interest in taking this to MPI 4.1.
Jesper
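As an illustration of what a 3-argument user function could look like, the following plain C sketch defines a callback that reads two inputs and writes a separate output. The type name `user_fn3`, its signature, and the `affine` example operation are assumptions made for this sketch only; the thread does not specify a signature, and the real one would presumably also carry a datatype argument.

```c
#include <assert.h>

/* An affine map x -> m*x + c; composition is associative, not commutative. */
typedef struct { long m, c; } affine;

/* Hypothetical three-argument callback type:
   outvec[i] = invec[i] o argvec[i]. Not part of any MPI standard. */
typedef void user_fn3(const void *invec, const void *argvec,
                      void *outvec, int len);

static void compose3(const void *invec, const void *argvec,
                     void *outvec, int len)
{
    const affine *a = invec;
    const affine *b = argvec;
    affine *out = outvec;
    assert(len >= 0);
    for (int i = 0; i < len; i++) {
        /* Build the result in a temporary so that outvec may alias
           either input without corrupting the computation. */
        affine r = { a[i].m * b[i].m, a[i].m * b[i].c + a[i].c };
        out[i] = r;
    }
}
```

With such a callback, both operand orders can be evaluated by swapping the first two arguments, and neither input needs to be copied beforehand.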
Problem
The current MPI_Reduce_local operation (Section 6.9.7 of MPI-4.0) has severely restricted functionality: it "adds" an in-argument to an inout-argument, in that order. It is thus not possible to directly "add" two different in-arguments with the result stored in an out-argument, nor is it possible to add two arguments in the order inout-argument first, in-argument second. This limits the effectiveness of MPI_Reduce_local for implementing one's own collective reduction operations, since it often makes it necessary to copy arguments around.
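The ordering restriction can be sketched in plain C without MPI. Below, `reduce_local_model` is a stand-in that mimics the two-argument semantics (inout = in o inout) for a non-commutative example operation, and `accumulate_right` shows the copies needed to compute acc = acc o next instead. All names and the scratch-buffer parameter are assumptions for illustration.

```c
#include <assert.h>
#include <string.h>

/* An affine map x -> m*x + c; composition is associative, not commutative. */
typedef struct { long m, c; } affine;

/* Model of the two-argument semantics: inout[i] = in[i] o inout[i]. */
static void reduce_local_model(const affine *in, affine *inout, int count)
{
    for (int i = 0; i < count; i++) {
        affine r = { in[i].m * inout[i].m, in[i].m * inout[i].c + in[i].c };
        inout[i] = r;
    }
}

/* Computes acc[i] = acc[i] o next[i], i.e. the inout-argument comes first.
   With only the two-argument form this needs a scratch buffer tmp of
   count elements and two copies. */
static void accumulate_right(affine *acc, const affine *next,
                             affine *tmp, int count)
{
    assert(count >= 0);
    size_t bytes = (size_t)count * sizeof *tmp;
    memcpy(tmp, next, bytes);            /* copy 1: protect the input  */
    reduce_local_model(acc, tmp, count); /* tmp = acc o next           */
    memcpy(acc, tmp, bytes);             /* copy 2: move result to acc */
}
```

For a commutative operation the arguments could simply be swapped, which is why the cost shows up mainly when commutativity cannot be assumed.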
Proposal
It is proposed to add a 3-argument MPI_Reduce_locals to the standard which, by permitting the use of MPI_IN_PLACE, provides the full flexibility desirable for implementing one's own collective reduction operations.
Changes to the Text
MPI_REDUCE_LOCALS(inbuf, argbuf, inoutbuf, count, datatype, op)
  IN     inbuf     input buffer (choice)
  IN     argbuf    input buffer (choice)
  INOUT  inoutbuf  combined input and output buffer (choice)
  IN     count     number of elements in inbuf, argbuf and inoutbuf buffers (nonnegative integer)
  IN     datatype  data type of elements of inbuf, argbuf and inoutbuf buffers (handle)
  IN     op        operation (handle)
int MPI_Reduce_locals(const void* inbuf, const void* argbuf, void* inoutbuf, int count, MPI_Datatype datatype, MPI_Op op)
MPI_Reduce_locals(inbuf, argbuf, inoutbuf, count, datatype, op, ierror)
    TYPE(*), DIMENSION(..), INTENT(IN) :: inbuf
    TYPE(*), DIMENSION(..), INTENT(IN) :: argbuf
    TYPE(*), DIMENSION(..) :: inoutbuf
    INTEGER, INTENT(IN) :: count
    TYPE(MPI_Datatype), INTENT(IN) :: datatype
    TYPE(MPI_Op), INTENT(IN) :: op
    INTEGER, OPTIONAL, INTENT(OUT) :: ierror
MPI_REDUCE_LOCALS(INBUF, ARGBUF, INOUTBUF, COUNT, DATATYPE, OP, IERROR)