kamping-site / kamping

KaMPIng: (Near) zero-overhead MPI wrapper for modern C++
https://kamping-site.github.io/kamping/
GNU Lesser General Public License v3.0

What do users want in a C++ MPI Interface? #537

lukashuebner opened this issue 11 months ago

lukashuebner commented 11 months ago

https://github.com/mpi-forum/mpi-issues/issues/288

When you look at real-world MPI codes (say, PETSc, or in my case deal.II), one finds that, maybe surprisingly, the number of MPI calls isn't actually very large. For example, in the 500k lines of deal.II, there are only ~100 MPI calls. A consequence of this is that the pain involved in using lower-level interfaces such as the MPI C bindings is not too large. Conversely, one would not gain all that much by using higher-level interfaces.

Still, oftentimes each project implements its own wrapper around those MPI calls it needs. MPI bugs are sometimes hard to catch (e.g., overlapping send and recv buffers). Sometimes you'll end up writing less efficient code because doing it right is too much typing (sparse all-to-all vs. dense all-to-all). We provide additional features on top of plain MPI (e.g., sparse all-to-all).

My second observation is that many systems have multiple MPI libraries installed (different MPI implementations, or different versions). This poses a significant difficulty if you wanted to use, say, boost::mpi, which doesn't just consist of header files: either there need to be multiple installations of this package as well, or one needs to build it as part of the project that uses boost::mpi (but that's a problem in itself again, given that boost uses its own build system, which is unlike anything else).

We're header-only and tested with multiple MPI versions.

It should be generic. Having to specify the data type of a variable is decidedly not C++-like. Of course, it also leads to errors. Elemental's MpiMap class would already be a nice first step (though I can't figure out why the heck the MpiMap::type variable isn't static const, so that it can be accessed without creating an object).

Check ✓ TODO: Does Elemental's MpiMap class implement something we could use?
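
For context, the MpiMap idea is a compile-time mapping from C++ types to MPI datatypes. A minimal sketch with a static accessor (illustrative names only, not KaMPIng's or Elemental's actual API) could look like this:

    #include <mpi.h>

    // Map C++ types to MPI datatypes via template specialization. A static
    // accessor avoids having to create an object just to read the type.
    template <typename T> struct mpi_type_map;
    template <> struct mpi_type_map<int> {
        static MPI_Datatype type() { return MPI_INT; }
    };
    template <> struct mpi_type_map<double> {
        static MPI_Datatype type() { return MPI_DOUBLE; }
    };

    // usage: MPI_Send(buf, n, mpi_type_map<double>::type(), dest, tag, comm);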

It should have facilities for streaming arbitrary data types.

We don't have streaming yet ❌

Operations that require an MPI_Op argument (e.g., reductions) should integrate nicely with C++'s std::function interface, so that it's easy to just pass a function pointer (or a lambda!) rather than having to clumsily register something.

Check ✓
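
A minimal sketch of passing a lambda as a reduction operator, assuming the named-parameter style from KaMPIng's examples (the exact spellings of op(), send_buf(), and ops::commutative should be checked against the documentation):

    #include <kamping/collectives/allreduce.hpp>
    #include <kamping/communicator.hpp>
    #include <kamping/environment.hpp>
    #include <vector>

    int main() {
        kamping::Environment env; // initializes and finalizes MPI
        kamping::Communicator comm;
        std::vector<int> local{static_cast<int>(comm.rank())};
        // Pass the lambda directly instead of registering an MPI_Op by hand;
        // the commutativity tag lets the library choose an efficient reduction.
        auto sum = comm.allreduce(
            kamping::send_buf(local),
            kamping::op([](auto a, auto b) { return a + b; },
                        kamping::ops::commutative));
    }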

be header only

Check ✓

without any dependencies but <mpi.h> and the standard library

Well ...

be generic and extensible

Check ✓

be non-blocking only (if you want to block, then block explicitly, not by default)

That's quite a change compared to the standard MPI behavior. I'd vote against it.

allow continuation-based chaining of non-blocking operations (chains of .then().then())

    auto buffer = some_t{no_ranks};
    auto future = gather(comm, root(comm), my_offsets, buffer)
                  .then([&](){
                    /* when the gather is finished, this lambda will 
                       execute at the root node, and perform an expensive operation
                       there asynchronously (compute data required for load 
                       redistribution) whose result is broadcasted to the rest 
                       of the communicator */
                    return broadcast(comm, root(comm), buffer);
                  }).then([&]() {
                    /* when broadcast is finished, this lambda executes 
                       on all processes in the communicator, performing an expensive
                       operation asynchronously (redistribute the load, 
                       maybe using non-blocking point-to-point communication) */
                     return do_something_with(buffer);
                  }).then([&](auto result) {
                     /* finally perform a reduction on the result to check
                        everything went fine */
                     return all_reduce(comm, root(comm), result, 
                                      [](auto acc, auto v) { return acc && v; }); 
                  }).then([&](auto result) {
                      /* check the result at every process */
                      if (result) { return; /* we are done */ }
                      else {
                        root_only([](){ write_some_error_log(); });
                        throw some_exception{};
                      }
                  });

    /* Here nothing has happened yet! */

    /* ... lots and lots of unrelated code that can execute concurrently 
       and overlaps with communication ... */

    /* When we now call future.get() we will block 
       on the whole chain (which might have finished by then!).
    */

    future.get();

I actually think that would be quite cool and a good selling point. It's a non-trivial abstraction over the MPI C interface that substantially simplifies the code written by the user. An example use case would be encoding/decoding of the data sent.

have zero abstraction penalty (i.e. be at least as fast as the C interface)

Check ✓

support extensible and efficient serialization (Boost.Fusion like, such that it works with RMA)

We don't have this yet.

have a strong DEBUG mode with tons of assertions

Check ✓
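
For illustration, the assertion style looks roughly like this, assuming the KASSERT library that KaMPIng builds on (verify the macro signature against the released version):

    #include <kassert/kassert.hpp>

    int checked_divide(int a, int b) {
        // Checked when the configured assertion level is enabled;
        // compiled out entirely in release builds.
        KASSERT(b != 0, "division by zero");
        return a / b;
    }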

extremely type-safe (no more ints/void* for everything, heck I want tags to be types!)

What do they mean by tags should be types? Should each tag be its own type? Or should there be a type kamping::tag?
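
Both readings can be sketched; the following is purely illustrative, and neither variant is claimed to be the existing API:

    // Reading 1: every tag is its own compile-time type, so mixing up two
    // tags becomes a type error.
    template <int Value>
    struct message_tag {
        static constexpr int value = Value;
    };
    using offsets_tag = message_tag<0>;
    using payload_tag = message_tag<1>;

    // Reading 2: one strong wrapper type holding a runtime value, roughly
    // what a kamping::tag(...) named parameter would construct.
    struct tag_t { int value; };
    inline tag_t tag(int v) { return {v}; }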

it should work with lambdas (e.g. all reduce + lambda)

Check ✓

use exceptions consistently as error-reporting and error-handling mechanism (no more error codes! no more function output arguments!)

Kind of? :D We decided that you cannot recover from MPI errors, right? Are there really none a user could sensibly recover from?

MPI-IO should offer a non-blocking I/O interface in the style of Boost.AFIO

We don't have MPI-IO support ❌

and just follow good modern C++ interface design practices (define regular types, non-member non-friend functions, play well with move semantics, support range operations, ...)

Check ✓ with post-modern named parameters :D
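
For example, a gatherv in the named-parameter style, with parameter names assumed from the project's examples (see the documentation for the exact spellings):

    #include <kamping/collectives/gather.hpp>
    #include <kamping/communicator.hpp>
    #include <kamping/environment.hpp>
    #include <vector>

    int main() {
        kamping::Environment env;
        kamping::Communicator comm;
        // Each rank contributes a different number of elements; the receive
        // counts are handled by the library rather than by the caller.
        std::vector<int> data(comm.rank() + 1, static_cast<int>(comm.rank()));
        // Parameters are named at the call site, so their order is irrelevant
        // and omitted ones fall back to sensible defaults.
        auto gathered = comm.gatherv(kamping::send_buf(data), kamping::root(0));
    }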

[abstract away buffer ownership]

Check ✓
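
A sketch of the two ownership modes, assuming the recv_buf named parameter (the exact resize-policy defaults should be checked against the documentation):

    #include <kamping/collectives/gather.hpp>
    #include <vector>

    template <typename Comm>
    void gather_modes(Comm& comm, std::vector<int> const& data) {
        // (a) library-allocated: the received data is returned by value.
        auto fresh = comm.gather(kamping::send_buf(data), kamping::root(0));

        // (b) caller-owned: write into an existing buffer that can be reused
        // across calls, so the library does not have to allocate each time.
        std::vector<int> reuse;
        comm.gather(kamping::send_buf(data), kamping::root(0),
                    kamping::recv_buf(reuse));
    }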

It seems that for a C++ user, an interface that accepts C++20 ranges could be quite useful (not necessarily using ranges from std::, but keeping an interface compatible with them). But this would require 'hiding' (hence maintaining) derived datatypes, so again I don't know if passing this responsibility to the C++ API is appropriate performance-wise (it may require extra copies during scope transitions).

Check ✓

The interface should be able to eliminate redundant or unnecessary arguments, e.g. MPI_IN_PLACE

We don't have this specific example yet, but we do have default arguments.

As for Boost.MPI enhancements, adding support for nonblocking collectives, Mprobe/Mrecv, and neighborhood collectives is both important and straightforward.

We don't have this yet.

[Allow the user to specify that only parts of a class/struct should be serialized.]

→ Serialization not implemented yet

[Automatic serialization could be a footgun]

→ Make serialization at least somewhat explicit (tag the respective class, or provide an opt-in function as cereal does).
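
For comparison, cereal's opt-in convention looks like this (this is cereal's actual member-function style; whether KaMPIng adopts it is still open):

    struct Particle {
        double x, y, z;
        int id;

        // Only the members listed here are serialized, so serialization
        // stays explicit and opt-in per class.
        template <class Archive>
        void serialize(Archive& ar) {
            ar(x, y, z, id);
        }
    };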

RAII for requests, communicators, etc. with unique ownership and move semantics. This also encompasses non-blocking semantics by having the destructor of a request wait on the request. Ignoring a returned request is equivalent to calling a blocking function.

Be careful: MPI_Finalize may already have been called by the time an object gets destructed.
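
A sketch of such an RAII request under unique ownership (a hypothetical type, not KaMPIng's implementation), including the MPI_Finalize caveat:

    #include <mpi.h>
    #include <utility>

    // Move-only request whose destructor waits, so ignoring a returned
    // request yields blocking semantics.
    class Request {
    public:
        explicit Request(MPI_Request req = MPI_REQUEST_NULL) : req_(req) {}
        Request(Request const&)            = delete;
        Request& operator=(Request const&) = delete;
        Request(Request&& other) noexcept
            : req_(std::exchange(other.req_, MPI_REQUEST_NULL)) {}
        Request& operator=(Request&& other) noexcept {
            wait();
            req_ = std::exchange(other.req_, MPI_REQUEST_NULL);
            return *this;
        }

        void wait() {
            if (req_ != MPI_REQUEST_NULL) {
                MPI_Wait(&req_, MPI_STATUS_IGNORE);
            }
        }

        // The caveat from above: if this destructor runs after MPI_Finalize
        // (e.g., for an object with static lifetime), calling MPI_Wait here
        // is no longer legal.
        ~Request() { wait(); }

    private:
        MPI_Request req_;
    };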

Some (most) of the above points were mentioned multiple times by different users, especially serialization handling and asynchronous communication with futures.

Projects mentioned in the thread:

niklas-uhl commented 4 months ago

Just dumping my own notes here after reading the whole thread:

niklas-uhl commented 4 months ago

It seems like people really want serialization, so we will implement that in #653.