kokkos / kokkos-comm

Unofficial MPI Wrapper for Kokkos
https://kokkos.org/kokkos-comm/
Other
11 stars 9 forks source link

Implement Teuchos MPI operations #10

Open cwpearson opened 3 months ago

cwpearson commented 3 months ago

The motivation here is to prove we have a reasonable implementation by being able to run Trilinos on top of KokkosComm

MPI_... 1D non-contiguous
allreduce
broadcast
gather
scan
cedricchevalier19 commented 3 months ago

Very good starting point.

I have a comment concerning the API. Here we have 3 "blocking" send and only one "immediate". It might make sense to have only one of each and a template parameter to choose between the implementation.

send(snd_buf, comm); // Alias for send<Default>(snd_buff, comm);
send<Ready>(snd_buf, comm);
send<Synchronous>(snd_buf, comm);
send<Buffered>(snd_buf, comm);

This way we can also implement the isend variants. I think we should try to push users to I variants as they are a lot easier to use to make a semantically sound code, and also be ready for more asynchronous computations.

cwpearson commented 3 months ago

That's definitely an option, but since the Immediate version returns KokkosComm::Req and the rest do not, you actually have to do something like:

enum class Mode {
    Blocking,
    Buffered,
    Immediate,
    Ready,
    Synchronous
};

// most modes just return an error code
template <Mode mode>
struct SendReturn {
    using type = int;
}

// Immediate return value includes an MPI_Request
struct IsendResult {
    int err;
    MPI_Request req;
};
template<> struct SendReturn<Mode::Immediate> {
    using type = IsendResult;
}

template <Mode mode>
typename SendReturn<mode>::type send(...);

template<> typename SendReturn<Blocking>::type send<Blocking>(...) {...}
...

I think (though I might be missing something), this basically just replaces symbol lookup with template resolution in the compiler, and I'm not sure it really buys us anything

cwpearson commented 3 months ago

I see that I misunderstood a bit - you were just suggesting doing this for the synchronous versions. I think my point still stands (though we can avoid the funky return type template).

cedricchevalier19 commented 3 months ago

I think send and isend should be separate functions as they are semantically very different. But between variants they are quite interchangeable.

The idea of the template parameter with a default value is to be able to change this value for ease of checking communication pattern correctness, like force synchronous send instead of send to prevent deadlocks that can be hidden by a high eager limit on the test platform.

So basically, in pseudo-code

enum class Mode {
    Bare,
    Blocking,
    Ready,
    Synchronous
};

#ifdef DEBUG_COMM
template<Mode mode=Synchronous>
#else
template<Mode mode=Bare>
#endif
int send(View, Comm);

#ifdef DEBUG_COMM
template<Mode mode=Synchronous>
#else
template<Mode mode=Bare>
#endif
Req isend(View, Comm);

Or we can go crazier and have a CommPolicy object to set the default.

cwpearson commented 3 months ago

Maybe the thing to do right now while we consider this is put some relatively lean wrappers around MPI functions in the Impl namespace, since we'll almost certainly need those anyway.