mpi-forum / mpi-issues

Tickets for the MPI Forum
http://www.mpi-forum.org/

Predefined datatypes for MPI_Count and friends #109

Closed jdinan closed 5 years ago

jdinan commented 5 years ago

Problem

We don't have MPI predefined datatypes for MPI_Count, MPI_Aint, and the other opaque types (see pg. 683 of MPI 3.1).

Proposal

Add predefined datatypes for these MPI types so they can be communicated.

Changes to the Text

TBD

Impact on Implementations

Should be easy to implement.

Impact on Users

Happiness.

References

dholmes-epcc-ed-ac-uk commented 5 years ago

potentially others

Such as MPI_Group, MPI_Comm, MPI_Request? Interesting use-cases for serialising MPI opaque handles, sending them to other ranks, and de-serialising them into a working object.

I particularly like the idea of triggering a message transfer (e.g. Bcast) between some remote MPI processes using the MPI_Request handle for a persistent collective operation created by one of those MPI processes (e.g. root of Bcast) and sent to the local MPI process. I can't immediately think of a compelling use-case but standardising datatypes for "others" can only lead to good things.

dholmes-epcc-ed-ac-uk commented 5 years ago

The comment on #107 also applies to this issue: if MPI_COUNT and MPI_AINT are replaced with size_t and ptrdiff_t (and we get the datatype naming rule intended to cover the FP16 case) then no new pre-defined types are needed.

jeffhammond commented 5 years ago

Such as MPI_Group, MPI_Comm, MPI_Request? Interesting use-cases for serialising MPI opaque handles, sending them to other ranks, and de-serialising them into a working object.

I vigorously object to the intimation that these opaque handles have any meaning outside of the process where they were created.

If you take the Open MPI model, where these handles are pointers to the underlying state, they cannot have meaning in a different address space unless you use a symmetric heap allocator for all of these handles, which is absurd.

Opaque means opaque. These cannot be serialized and deserialized.

tonyskjellum commented 5 years ago

Alas, I wanted to serialize and deserialize all objects back in MPI-1. If we could, we could simplify many kinds of distributed computing problems on the runtime side. It was not well received in 1993, but why not revisit? In fact, the rule that you can't hold a communicator to which you don't belong arose from that very line of discussion ...

Example of an ancient use case, at the highest level: I serialize a group and send it to you out of band, or through a third party, and then we find a way to communicate once you deserialize it ... sounds like it might work alongside sessions? Useful for do-it-yourself collective work? Useful for making spawn more componentized?

Another example: I want to serialize and deserialize MPI objects as part of the MPI Stages model of fault tolerance; we could then make this an extension ...


mhoemmen commented 5 years ago

Wait, hold on, I know I'm a newbie here, but I see a big difference between serializing a group (really just a finite set of integers) and serializing a comm. Group is to comm like (path, offset) is to FILE* in C. FILE* doesn't just point to a path and an offset; it has local things like buffers that aren't useful to serialize.

I see this distinction between "unrealized potential thing" (e.g., group) and "realized active thing" (e.g., comm) as useful for fault tolerance. Realization queries availability of claimed resources. Opening a file asks whether the file is still there. Creating a comm means that MPI is in a sane enough state that it can turn a group into a comm.

bosilca commented 5 years ago

Objects for which serialization/deserialization makes sense already have such support (datatypes), or there is a way to achieve it under special circumstances (such as MPI_Group, where one can always translate ranks to a common underlying communicator such as MPI_COMM_WORLD, or, for dynamic cases, some user-level "universe" communicator). For everything else I agree with @jeffhammond: the MPI handles should remain opaque to users.

jeffhammond commented 5 years ago

I agree that MPI_Group can be (de)serialized because it is really just an array of ranks, but why then does MPI need to provide a method for this? Users who want to (de)serialize these things can easily do so manually. Following the usual prescription, I'd like to see a use case where this is required and a comparison of the implementation of (de)serialization above and below MPI before making any decision about whether this is worthwhile.

jeffhammond commented 5 years ago

@jdinan Dumb question but is this proposal hetero-safe? Is MPI_Aint guaranteed to be the same size everywhere even when running on a collection of 32- and 64-bit systems?

tonyskjellum commented 5 years ago

Jeff, no, it is not a dumb question. But let me say that, moving forward, we should allow for features that only work on homogeneous systems, as a choice. The great majority of MPI systems run homogeneously, as you know, so we don't want to exclude features that benefit such systems altogether. As we look at potentially breaking backward compatibility in MPI-4 for the benefit of future capability, we could also allow a class of implementation that is homogeneous, or homogeneous per group, or similar ... Now, I know we will have heterogeneous elements at exascale, but that still doesn't mean heterogeneity should force closer coupling than exists between intercommunicators. In short, anything that is specified per process space, like this, is not clean to work with heterogeneously, imho, as you know :-)


jeffhammond commented 5 years ago

@tonyskjellum My use of "dumb question but..." is often a rhetorical device rather than an actual dumb question 😉

In any case, I am open to the idea of deprecating heterogeneous support in MPI but we should start by doing that rather than introducing features that would break it.

jdinan commented 5 years ago

Opaque types listed on pg. 683 of MPI 3.1. I should know better than to file open-ended issues. 🤦‍♂️

raffenet commented 5 years ago

Just to be clear, MPI_AINT, MPI_COUNT, and MPI_OFFSET are already defined for both C and Fortran. See pp. 673-674 of MPI 3.1.

jdinan commented 5 years ago

@raffenet You're right. I'm not sure how we missed this in Barcelona. Sounds like this issue can be closed?