trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/
Other
1.21k stars 566 forks source link

Tpetra::Details::iallreduce does not work with complex, if MPI_VERSION < 3 #859

Closed mhoemmen closed 5 years ago

mhoemmen commented 7 years ago

@trilinos/tpetra Story: #748

Tpetra::Details::iallreduce currently assumes that, given a Teuchos::EReductionType and any MPI_Datatype, that MPI_SUM and MPI_MAX work for that MPI_Datatype. This is only true if the MPI_Datatype is built in; it's not true if the MPI_Datatype is custom (derived).

We currently conservatively assume that the MPI_C_FLOAT_COMPLEX and MPI_C_DOUBLE_COMPLEX built-in MPI_Datatype only exist if MPI_VERSION >= 3. Tpetra::Details::MpiTypeTraits constructs derived MPI_Datatype for Kokkos::complex otherwise. This means that Tpetra::Details::iallreduce does not work with Kokkos::complex Views, when MPI_VERSION < 3. (MPI reports "invalid MPI_Op.")

As a temporary work-around, we could add a function that constructs the required custom MPI_Op on the fly for each MPI_Allreduce all (if MPI_VERSION >= 3, Tpetra::Details::iallreduce uses MPI_Iallreduce; otherwise, it uses MPI_Allreduce), and destroys it after use. It may make sense to cache the constructed MPI_Op; we could attach it to the MPI_Comm using MPI's (key,value) mechanism. That latter would ensure correct deallocation as well.

This issue only matters if users want to try pipelined Krylov methods with complex Scalar types.

rppawlo commented 7 years ago

@mhoemmen I just did an update of trilinos and building without complex gives me the following error. I think there are some missing ifdefs needed.

In file included from /ascldap/users/rppawlo/junk/Trilinos2/packages/tpetra/core/src/Tpetra_Details_iallreduce.hpp:65:0,
                 from /ascldap/users/rppawlo/junk/Trilinos2/packages/tpetra/core/src/Tpetra_Details_iallreduce.cpp:42:
/ascldap/users/rppawlo/junk/Trilinos2/packages/tpetra/core/src/Tpetra_Details_MpiTypeTraits.hpp: In function ‘MPI_Datatype Teuchos::Details::Impl::computeKokkosComplexMpiDatatype(const Kokkos::complex<RealType>&)’:
/ascldap/users/rppawlo/junk/Trilinos2/packages/tpetra/core/src/Tpetra_Details_MpiTypeTraits.hpp:129:28: error: ‘MyComplex’ was not declared in this scope
     static_assert (sizeof (MyComplex<T>) == sizeof ( ::Kokkos::complex<T>),
                            ^
/ascldap/users/rppawlo/junk/Trilinos2/packages/tpetra/core/src/Tpetra_Details_MpiTypeTraits.hpp:129:39: error: expected primary-expression before ‘>’ token
     static_assert (sizeof (MyComplex<T>) == sizeof ( ::Kokkos::complex<T>),
                                       ^
/ascldap/users/rppawlo/junk/Trilinos2/packages/tpetra/core/src/Tpetra_Details_MpiTypeTraits.hpp:129:40: error: expected primary-expression before ‘)’ token
     static_assert (sizeof (MyComplex<T>) == sizeof ( ::Kokkos::complex<T>),
                                        ^
/ascldap/users/rppawlo/junk/Trilinos2/packages/tpetra/core/src/Tpetra_Details_MpiTypeTraits.hpp:132:5: error: ‘MyComplex’ is not a member of ‘Teuchos::Details::Impl’
     ::Teuchos::Details::Impl::MyComplex<T> z2;
     ^
/ascldap/users/rppawlo/junk/Trilinos2/packages/tpetra/core/src/Tpetra_Details_MpiTypeTraits.hpp:132:42: error: expected primary-expression before ‘>’ token
     ::Teuchos::Details::Impl::MyComplex<T> z2;
                                          ^
/ascldap/users/rppawlo/junk/Trilinos2/packages/tpetra/core/src/Tpetra_Details_MpiTypeTraits.hpp:132:44: error: ‘z2’ was not declared in this scope
     ::Teuchos::Details::Impl::MyComplex<T> z2;
                                            ^
ninja: build stopped: subcommand failed.
[rppawlo@gge BUILD2]$ 
mhoemmen commented 7 years ago

Ah, I think I get what's going on here. I'll fix it.

mhoemmen commented 7 years ago

@rppawlo I just pushed a fix.

rppawlo commented 7 years ago

Thanks @mhoemmen !

mhoemmen commented 7 years ago

More general approach?

template<class Packet>
struct MpiOpTraits {
  static const bool isSpecialized = false;
  static const bool needsFree = false;
  static MPI_Op getSumOp () {
    return MPI_SUM; // replace if specializing & if needed
  }
};

Perhaps this belongs in MpiTypeTraits instead.

Another way to fix this would be to extend MpiTypeTraits so that it could represent Scalar with an MPI_Datatype and a count. For MPI_SUM, we could just treat std::complex (for example) as (MPI_DOUBLE, 2). This would also work for ensemble and automatic differentiation Scalar types. It wouldn't work so well with MPI_MAX, though.

@etphipp , what do you think? If we wanted to use actual MPI_Datatype and not rely on Teuchos::SerializationTraits for your Scalar types, would it be better to create a per-instance custom MPI_Datatype, or to represent the types as (MPI_Datatype, count) pairs? The only place where it really matters is with MPI_Op for reductions.

mhoemmen commented 7 years ago

I'm moving this to the backlog, since we don't have any customers asking for complex-arithmetic Pipelined CG or GMRES. We'll just have to remember to make exception-throwing stubs for the complex arithmetic case, for those solvers. (Belos already has examples for how to do that.)

mhoemmen commented 5 years ago

Trilinos doesn't have tests any more with MPI implementations that have MPI_VERSION < 3.

nmhamster commented 5 years ago

@mhoemmen - what machine was this on?

mhoemmen commented 5 years ago

@nmhamster This was with OpenMPI 1.6 (!), which is quite old and which nobody (not Sierra, not Trilinos) tests any more. I wouldn't worry about it. I was just going through old issues to look for issues related to PR https://github.com/trilinos/Trilinos/pull/5115 (check it out!) and found this one.

nmhamster commented 5 years ago

@mhoemmen - yeah that is old. I’m sure we are still using it somewhere :-). One less thing to worry about (thank you)!