Belos::MultiVecTraits for Tpetra: Avoid using REDUCE_SUM in MvTransMv

mhoemmen commented 8 years ago

@trilinos/belos @amklinv

The Tpetra specialization of Belos::MultiVecTraits::MvTransMv uses Teuchos::REDUCE_SUM. This is the only MultiVecTraits method implementation for Tpetra that uses this enum. Either PETSc or Hypre appears to define a REDUCE_SUM macro, which causes build errors when colliding with this enum. While this is really an issue with PETSc or Hypre not namespacing their macros, we should be able to deal with this in Belos, and simplify the code at the same time.

Keep the branch of the code for the single vector case (Tpetra::MultiVector::multiply might not optimize for that case)
Otherwise, call Tpetra::MultiVector::multiply. View the input SerialDenseMatrix as a MultiVector and use a replicated Map.

Here is how you create the MultiVector to view the SerialDenseMatrix.

Create an empty Kokkos::DualView. Get device_type from Tpetra::MultiVector.
Create an HostSpace Kokkos::View (FIXME: Figure out how to specify at run time that the View is unmanaged, comparable to Teuchos::rcpFromRef -- I think you can just invoke the constructor that takes a raw pointer and dimensions, and everything will be ok).
Set the DualView's h_view field to the new View that you created in Step 2.
Mark the DualView as modified on host (actually you don't have to, because of UVM).
Give the DualView to Tpetra::MultiVector's constructor, along with a globally replicated Map.

In the source code of Tpetra::MultiVector::multiply, this is Case 2 (where C is "local," i.e., globally replicated over the same communicator as the input MultiVectors). Belos does something weird: namely, it calls multiply on a MultiVector with a "SerialComm." Thus, the multiply() method doesn't actually get to do the communication that it knows how to do (it calls "reduce()" internally). Just let MultiVector::multiply do what it knows how to do, by letting C have a globally replicated Map with the same communicator as A and B.

mhoemmen commented 8 years ago

When creating the View of the SerialDenseMatrix, first create a View with the stride (LDA) as the number of rows. Then, create a subview which has the right number of rows. The View should be LayoutLeft (column major).

mhoemmen commented 8 years ago

Advantages of this approach:

It lets Tpetra handle communication (by removing an explicit dependence on Teuchos::Comm)
The existing code copies the result into the SerialDenseMatrix C; this approach would save a copy

There are a few other methods in BelosTpetraAdapter.hpp that refer to Teuchos::Comm explicitly. If we could remove those, we could save some includes in code that users see (because they must include BelosTpetraAdapter.hpp in order to use Belos with Tpetra objects).

amklinv commented 8 years ago

I'm in the process of pushing the patch now. Note that I copied the data from the small replicated MultiVector to the SerialDenseMatrix (as the code was already doing in the sequential case). If this is a problem, we can always make it more efficient later :-)

mhoemmen commented 8 years ago

You shouldn't actually need to do that if the MultiVector C is a view of the SerialDenseMatrix. It will just write directly to the SerialDenseMatrix's storage in that case; you don't need to copy. If you do need to copy, though, be sure to use Kokkos::deep_copy from the MultiVector's data. Also, make sure you're using the right version of the data (device vs. host). Thanks!

trilinos / Trilinos

Belos::MultiVecTraits for Tpetra: Avoid using REDUCE_SUM in MvTransMv #37