When you look at real-world MPI codes (say, PETSc, or in my case deal.II), one finds that, maybe surprisingly, the number of MPI calls isn't actually very large. For example, the 500k lines of deal.II contain only ~100 MPI calls. A consequence of this is that the pain involved in using lower-level interfaces such as the MPI C bindings is not too large. Conversely, one would not gain all that much by using higher-level interfaces.
Still, oftentimes each project implements its own wrapper around those MPI calls it needs. MPI bugs are sometimes hard to catch (overlapping send and recv buffers). And sometimes you end up writing less efficient code because you're too lazy to do it right, since doing it right is too much typing (a dense all-to-all instead of a sparse one). We provide additional features on top of plain MPI (e.g., sparse all-to-all).
My second observation is that many systems have multiple MPI libraries installed (different MPI implementations, or different versions). This poses a significant difficulty if you want to use, say, boost::mpi, which doesn't just consist of header files: either there need to be multiple installations of that package as well, or one has to build it as part of the project that uses boost::mpi (which is a problem in itself, given that Boost uses its own build system, unlike anything else).
We're header only and tested with multiple MPI versions.
It should be generic. Having to specify the data type of a variable is decidedly not C++-like. Of course, it also leads to errors. Elemental's MpiMap class would already be a nice first step (though I can't figure out why the heck the MpiMap::type variable isn't static const, so that it can be accessed without creating an object).
Check ✓ TODO: Does Elemental's MpiMap class implement something we could use?
It should have facilities for streaming arbitrary data types.
We don't have streaming yet ❌
Operations that require an MPI_Op argument (e.g., reductions) should integrate nicely with C++'s std::function interface, so that it's easy to just pass a function pointer (or a lambda!) rather than having to clumsily register something.
Check ✓
be header only
Check ✓
without any dependencies but <mpi.h> and the standard library,
Well ...
be generic and extensible
Check ✓
be non-blocking only (if you want to block, then block explicitly, not by default)
That's quite a change compared to the standard MPI behavior. I'd vote against it.
allow continuation-based chaining of non-blocking operations (chains of .then().then())
auto buffer = some_t{no_ranks};
auto future =
    gather(comm, root(comm), my_offsets, buffer)
        .then([&]() {
            /* when the gather is finished, this lambda will execute at
               the root node and perform an expensive operation there
               asynchronously (compute data required for load
               redistribution) whose result is broadcast to the rest of
               the communicator */
            return broadcast(comm, root(comm), buffer);
        })
        .then([&]() {
            /* when the broadcast is finished, this lambda executes on
               all processes in the communicator, performing an expensive
               operation asynchronously (redistribute the load, maybe
               using non-blocking point-to-point communication) */
            return do_something_with(buffer);
        })
        .then([&](auto result) {
            /* finally perform a reduction on the result to check that
               everything went fine */
            return all_reduce(comm, root(comm), result,
                              [](auto acc, auto v) { return acc && v; });
        })
        .then([&](auto result) {
            /* check the result at every process */
            if (result) {
                return; /* we are done */
            } else {
                root_only([]() { write_some_error_log(); });
                throw some_exception{};
            }
        });

/* Here nothing has happened yet! */

/* ... lots and lots of unrelated code that can execute concurrently
   and overlap with communication ... */

/* When we now call future.get() we will block on the whole chain
   (which might have finished by then!). */
future.get();
I actually think that would be quite cool and a good selling point. It's a non-trivial abstraction above the MPI C-interface which simplifies the code written by the user substantially. An example usecase would be encoding/decoding of the data sent.
have zero abstraction penalty (i.e. be at least as fast as the C interface)
Check ✓
support extensible and efficient serialization (Boost.Fusion like, such that it works with RMA)
We don't have this yet.
have a strong DEBUG mode with tons of assertions
Check ✓
extremely type-safe (no more ints/void* for everything, heck I want tags to be types!)
What do they mean by tags should be types? That each tag should be its own type? Or that there should be a type kamping::tag?
it should work with lambdas (e.g. all reduce + lambda)
Check ✓
use exceptions consistently as error-reporting and error-handling mechanism (no more error codes! no more function output arguments!)
Kind of? :D We decided that you cannot recover from MPI errors, right? Are there really none a user could sensibly recover from?
MPI-IO should offer a non-blocking I/O interface in the style of Boost.AFIO
We don't have MPI-IO support ❌
and just follow good modern C++ interface design practices (define regular types, non-member non-friend functions, play well with move semantics, support range operations, ...)
Check ✓ with post-modern named parameters :D
[abstract away buffer ownership]
Check ✓
It seems for a C++ user, allowing an interface that accepts C++20 ranges[*] could be quite useful (not using ranges from std:: but implementing it while keeping the interface). But this would require 'hiding' (hence maintaining) derived datatypes, so again I don't know if passing this responsibility to the C++ API is appropriate performance-wise (it may require extra copies during scope transitions).
Check ✓
The interface should be able to eliminate redundant or unnecessary arguments, e.g. MPI_IN_PLACE
We don't have this specific example yet, but we do have default arguments.
As for Boost.MPI enhancements, adding support for nonblocking collectives, Mprobe/Mrecv, and neighborhood collectives is both important and straightforward.
We don't have this yet.
[Allow the user to specify that only parts of a class/struct should be serialized.]
→ Serialization not implemented yet
[Automatic serialization could be a footgun]
→ Make serialization at least somehow explicit. (Tag the respective class, provide a function like for cereal)
RAII for requests, communicators, etc. with unique ownership and move semantics. This also encompasses non-blocking semantics by having the destructor of a request wait on the request. Ignoring a returned request is equivalent to calling a blocking function.
Be careful, MPI_Finalized could already be called when an object gets destructed.
Some (most) of the above points were mentioned multiple times by different users, especially the handling of serialization and asynchronous communication with futures.
Just dumping my own notes here after reading the whole thread:
generic: map variables to datatype
streaming of arbitrary datatypes
MPI_Op via std::function
problem of Boost.MPI: needs to be recompiled for each MPI implementation -> header only
extensible (whatever that means)
non-blocking only (blocking only explicitly)
chaining of non-blocking operations
extensible and efficient serialization
zero abstraction penalty
should be safe (e.g. destructor of non-ready future called -> std::terminate)
have a strong DEBUG mode with tons of assertions
type-safety
lambda-support for reduce
use exceptions consistently -> no more error codes, no output arguments
non-blocking MPI-IO
modern C++ (e.g. play well with move semantics, support range operations, ..)
use STL/Boost naming conventions
Personally, I don't really mind calling long C-style functions for the exact
reason Wolfgang mentioned; there are really few places you need to call them
and even then, they almost always get wrapped around by some higher-level
code.
This [reflections] would probably work for most POD / Trivial / StandardLayout types, but
isn't portable to types that don't need all members serialized. I think most
high-level C++-based APIs (thinking Charm++ and STAPL here, for instance) use
user-provided pack/unpack routines to do serialization. If we can find a
mechanism that allows users to easily select which fields of a class must be
serialized, that would probably be the way to go.
All this suggests that an interface that hides the details of MPI asynchrony might not be a zero-overhead abstraction.
return by value
C++ programs being what they are, they often want to send around unstructured
data of variable size. As I learned recently in deal.II, unstructured
some-to-some operations are not all that easy to implement. In essence, I'd
like to do something along the lines of sending a
std::map</* target_rank = */ int, /* target_message = */ T> around and receive a
std::map</* sender_rank = */ int, /* sender_message = */ T> back. I would already be
happy if that could be done with a C interface where T is one of the usual
MPI-supported data types or arrays thereof. So this is not really a C++
question as much as a request for a C interface where sender/receiver ranks
could be given as arrays and the messages through an array of pointers (or
maybe offsets into a long array of type T[]). But it surely would be nice to
get std::map objects.
constexpr datatypes
serialization is fundamental
More seriously, I think returning values (or expected) does not reflect what MPI
communication ultimately is: I/O. In the I/O picture, the object exists (maybe in
an unspecified but valid state) before communication. Returning values forces an
allocation even in cases where it is obvious that none is needed (think of
receiving into a vector that already has enough capacity for the number of
elements sent).
https://github.com/mpi-forum/mpi-issues/issues/288
Projects mentioned in the thread: