When you look at real-world MPI codes (say, PETSc, or in my case deal.II), one finds that, maybe surprisingly, the number of MPI calls isn't actually very large. For example, the 500k lines of deal.II contain only ~100 MPI calls. A consequence of this is that the pain involved in using lower-level interfaces such as the MPI C bindings is not too large. Conversely, one would not gain all that much by using higher-level interfaces.
Still, oftentimes each project implements its own wrapper around those MPI calls it needs. MPI bugs are sometimes hard to catch (overlapping send and recv buffers). And sometimes you end up writing less efficient code because you're too lazy to do it right, since doing it right is too much typing (a dense all-to-all instead of a sparse one). We provide additional features on top of plain MPI (e.g., sparse all-to-all).
My second observation is that many systems have multiple MPI libraries installed (different MPI implementations, or different versions). This poses a significant difficulty if you want to use, say, boost::mpi, which doesn't just consist of header files: either there need to be multiple installations of that package as well, or one has to build it as part of the project that uses boost::mpi (which is a problem in itself, given that Boost uses its own build system, unlike anything else).
We're header only and tested with multiple MPI versions.
It should be generic. Having to specify the data type of a variable is decidedly not C++-like. Of course, it also leads to errors. Elemental's MpiMap class would already be a nice first step (though I can't figure out why the heck the MpiMap::type variable isn't static const, so that it can be accessed without creating an object).
Check ✓ TODO: Does Elemental's MpiMap class implement something we could use?
It should have facilities for streaming arbitrary data types.
We don't have streaming yet ❌
Operations that require an MPI_Op argument (e.g., reductions) should integrate nicely with C++'s std::function interface, so that it's easy to just pass a function pointer (or a lambda!) rather than having to clumsily register something.
Check ✓
be header only
Check ✓
without any dependencies but <mpi.h> and the standard library,
Well ...
be generic and extensible
Check ✓
be non-blocking only (if you want to block, then block explicitly, not by default)
That's quite a change compared to the standard MPI behavior. I'd vote against it.
allow continuation-based chaining of non-blocking operations (chains of .then().then())
auto buffer = some_t{no_ranks};
auto future =
    gather(comm, root(comm), my_offsets, buffer)
        .then([&]() {
            /* when the gather is finished, this lambda will execute at
               the root node and perform an expensive operation there
               asynchronously (compute data required for load
               redistribution) whose result is broadcast to the rest of
               the communicator */
            return broadcast(comm, root(comm), buffer);
        })
        .then([&]() {
            /* when the broadcast is finished, this lambda executes on
               all processes in the communicator, performing an expensive
               operation asynchronously (redistribute the load, maybe
               using non-blocking point-to-point communication) */
            return do_something_with(buffer);
        })
        .then([&](auto result) {
            /* finally perform a reduction on the result to check that
               everything went fine */
            return all_reduce(comm, root(comm), result,
                              [](auto acc, auto v) { return acc && v; });
        })
        .then([&](auto result) {
            /* check the result at every process */
            if (result) {
                return; /* we are done */
            } else {
                root_only([]() { write_some_error_log(); });
                throw some_exception{};
            }
        });

/* Here nothing has happened yet! */

/* ... lots and lots of unrelated code that can execute concurrently
   and overlap with communication ... */

/* When we now call future.get() we will block on the whole chain
   (which might have finished by then!). */
future.get();
I actually think that would be quite cool and a good selling point. It's a non-trivial abstraction above the MPI C-interface which simplifies the code written by the user substantially. An example usecase would be encoding/decoding of the data sent.
have zero abstraction penalty (i.e. be at least as fast as the C interface)
Check ✓
support extensible and efficient serialization (Boost.Fusion like, such that it works with RMA)
We don't have this yet.
have a strong DEBUG mode with tons of assertions
Check ✓
extremely type-safe (no more ints/void* for everything, heck I want tags to be types!)
What do they mean by tags should be types? That each tag should be its own type? Or that there should be a type kamping::tag?
it should work with lambdas (e.g. all reduce + lambda)
Check ✓
use exceptions consistently as error-reporting and error-handling mechanism (no more error codes! no more function output arguments!)
Kind of? :D We decided that you cannot recover from MPI errors, right? Are there really none a user could sensibly recover from?
MPI-IO should offer a non-blocking I/O interface in the style of Boost.AFIO
We don't have MPI-IO support ❌
and just follow good modern C++ interface design practices (define regular types, non-member non-friend functions, play well with move semantics, support range operations, ...)
Check ✓ with post-modern named parameters :D
[abstract away buffer ownership]
Check ✓
It seems for a C++ user, allowing an interface that accepts C++20 ranges[*] could be quite useful (not using ranges from std:: but implementing it while keeping the interface). But this would require 'hiding' (hence maintaining) derived datatypes, so again I don't know if passing this responsibility to the C++ API is appropriate performance-wise (it may require extra copies during scope transitions).
Check ✓
The interface should be able to eliminate redundant or unnecessary arguments, e.g. MPI_IN_PLACE
We don't have this specific example yet, but we do have default arguments.
As for Boost.MPI enhancements, adding support for nonblocking collectives, Mprobe/Mrecv, and neighborhood collectives is both important and straightforward.
We don't have this yet.
[Allow the user to specify that only parts of a class/struct should be serialized.]
→ Serialization not implemented yet
[Automatic serialization could be a footgun]
→ Make serialization at least somehow explicit. (Tag the respective class, provide a function like for cereal)
RAII for requests, communicators, etc. with unique ownership and move semantics. This also encompasses non-blocking semantics by having the destructor of a request wait on the request. Ignoring a returned request is equivalent to calling a blocking function.
Be careful, MPI_Finalized could already be called when an object gets destructed.
Some (most) of the above points were mentioned multiple times by different users, especially the handling of serialization and asynchronous communication with futures.
Just dumping my own notes here after reading the whole thread:
generic: map variables to datatype
streaming of arbitrary datatypes
MPI_Op via std::function
problem of Boost.MPI: needs to be recompiled for each MPI implementation -> header only
extensible (whatever that means)
non-blocking only (blocking only explicitly)
chaining of non-blocking operations
extensible and efficient serialization
zero abstraction penalty
should be safe (e.g. destructor of non-ready future called -> std::terminate)
have a strong DEBUG mode with tons of assertions
type-safety
lambda-support for reduce
use exceptions consistently -> no more error codes, no output arguments
non-blocking MPI-IO
modern C++ (e.g. play well with move semantics, support range operations, ..)
use STL/Boost naming conventions
Personally, I don't really mind calling long C-style functions for the exact
reason Wolfgang mentioned; there are really few places you need to call them
and even then, they almost always get wrapped around by some higher-level
code.
This [reflections] would probably work for most POD / Trivial / StandardLayout types, but
isn't portable to types that don't need all members serialized. I think most
high-level C++-based APIs (thinking Charm++ and STAPL here, for instance) use
user-provided pack/unpack routines to do serialization. If we can find a
mechanism that allows users to easily select which fields of a class must be
serialized, that would probably be the way to go.
All this suggests that an interface that hides the details of MPI asynchrony might not be a zero-overhead abstraction.
return by value
C++ programs being what they are, they often want to send around unstructured
data of variable size. As I learned recently in deal.II, unstructured
some-to-some operations are not all that easy to implement. In essence, I'd
like to do something along the lines of sending a
std::map</* target_rank = */ int, /* target_message = */ T> around and receive a
std::map</* sender_rank = */ int, /* sender_message = */ T> back. I would already be
happy if that could be done with a C interface where T is one of the usual
MPI-supported data types or arrays thereof. So this is not really a C++
question as much as a request for a C interface where sender/receiver ranks
could be given as arrays and the messages through an array of pointers (or
maybe offsets into a long array of type T[]). But it surely would be nice to
get std::map objects.
constexpr datatypes
serialization is fundamental
More seriously, I think returning values (or expected) does not reflect what MPI
communication ultimately is: I/O. In the I/O picture, the object exists (maybe in
an unspecified but valid state) before communication. Returning values forces an
allocation even in cases where it is obvious that none is needed (think of
receiving into a vector that already has enough capacity for the number of
elements sent).
https://github.com/mpi-forum/mpi-issues/issues/288
Projects mentioned in the thread: