When creating the View of the SerialDenseMatrix, first create a View with the stride (LDA) as the number of rows. Then, create a subview which has the right number of rows. The View should be LayoutLeft (column major).
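A minimal sketch of that View-then-subview pattern (assuming Scalar = double and host data; the helper name `viewOfSdm` is made up for illustration):

```c++
#include <Kokkos_Core.hpp>
#include <Teuchos_SerialDenseMatrix.hpp>

// Sketch: wrap a SerialDenseMatrix's storage in a non-owning, column-major
// (LayoutLeft) Kokkos::View, then restrict it to the logical number of rows.
auto viewOfSdm (Teuchos::SerialDenseMatrix<int, double>& sdm)
{
  // First extent is the stride (LDA), not numRows(); constructing a View
  // from a raw pointer makes it non-owning (no allocation, no copy).
  Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::HostSpace>
    raw (sdm.values (), sdm.stride (), sdm.numCols ());

  // Subview with the logical number of rows.  For LayoutLeft, restricting
  // the first dimension to a contiguous range keeps the column stride (LDA).
  return Kokkos::subview (raw,
    Kokkos::make_pair (size_t (0), size_t (sdm.numRows ())),
    Kokkos::ALL ());
}
```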
Advantages of this approach: it avoids Teuchos::REDUCE_SUM (and therefore the macro collision) entirely, and it simplifies the Belos code.
There are a few other methods in BelosTpetraAdapter.hpp that refer to Teuchos::Comm explicitly. If we could remove those, we could save some includes in code that users see (because they must include BelosTpetraAdapter.hpp in order to use Belos with Tpetra objects).
I'm in the process of pushing the patch now. Note that I copied the data from the small replicated MultiVector to the SerialDenseMatrix (as the code was already doing in the sequential case). If this is a problem, we can always make it more efficient later :-)
You shouldn't actually need to do that if the MultiVector C is a view of the SerialDenseMatrix. It will just write directly to the SerialDenseMatrix's storage in that case; you don't need to copy. If you do need to copy, though, be sure to use Kokkos::deep_copy from the MultiVector's data. Also, make sure you're using the right version of the data (device vs. host). Thanks!
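If the copy really is needed, it might look like the following sketch. It assumes a recent Tpetra that provides the getLocalViewHost(Tpetra::Access::ReadOnly) accessor, Scalar = double, and a made-up helper name `copyIntoSdm`:

```c++
#include <Kokkos_Core.hpp>
#include <Tpetra_MultiVector.hpp>
#include <Teuchos_SerialDenseMatrix.hpp>

using MV  = Tpetra::MultiVector<double>;
using SDM = Teuchos::SerialDenseMatrix<int, double>;

// Only needed if C_mv does NOT already view the SerialDenseMatrix's storage.
void copyIntoSdm (const MV& C_mv, SDM& C)
{
  // Read-only host view of the MultiVector's local data; the accessor takes
  // care of syncing from device if the device copy is the most recent one.
  auto src = C_mv.getLocalViewHost (Tpetra::Access::ReadOnly);

  // Non-owning LayoutLeft view of the SDM (stride first, then subview down
  // to the logical number of rows, as in the sketch near the top).
  Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::HostSpace>
    raw (C.values (), C.stride (), C.numCols ());
  auto dst = Kokkos::subview (raw,
    Kokkos::make_pair (size_t (0), size_t (C.numRows ())), Kokkos::ALL ());

  Kokkos::deep_copy (dst, src);  // extents of dst and src must match
}
```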
@trilinos/belos @amklinv
The Tpetra specialization of Belos::MultiVecTraits::MvTransMv uses Teuchos::REDUCE_SUM. This is the only MultiVecTraits method implementation for Tpetra that uses this enum value. Either PETSc or Hypre appears to define a REDUCE_SUM macro, and the macro collides with the enum value, causing build errors. While this is really an issue with PETSc or Hypre not namespacing their macros, we should be able to deal with it in Belos and simplify the code at the same time.
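A tiny, purely hypothetical illustration of why an un-namespaced macro breaks even the namespace-qualified enum value (the macro's actual definition in PETSc or Hypre may differ):

```c++
#include <Teuchos_CommHelpers.hpp>  // makes Teuchos::EReductionType (and REDUCE_SUM) visible

// Hypothetical stand-in for the offending, un-namespaced third-party macro:
#define REDUCE_SUM 0

// The preprocessor rewrites the token even inside qualified names, so the
// line below becomes "Teuchos::EReductionType op = Teuchos::0;" and fails
// to compile.
Teuchos::EReductionType op = Teuchos::REDUCE_SUM;
```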
Here is how you create the MultiVector to view the SerialDenseMatrix.
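The code that originally accompanied this sentence isn't reproduced here; the following is only a rough sketch of the construction. It assumes Scalar = double, a host-only (non-GPU) build so the SerialDenseMatrix's storage can back the MultiVector's view directly, and a made-up helper name `makeMultiVectorViewOfSdm`:

```c++
#include <Tpetra_Core.hpp>
#include <Tpetra_Map.hpp>
#include <Tpetra_MultiVector.hpp>
#include <Teuchos_SerialDenseMatrix.hpp>

using SDM      = Teuchos::SerialDenseMatrix<int, double>;
using MV       = Tpetra::MultiVector<double>;
using map_type = Tpetra::Map<>;

// Sketch: wrap the SerialDenseMatrix C in a MultiVector whose Map is locally
// replicated over the same communicator as A and B.
Teuchos::RCP<MV>
makeMultiVectorViewOfSdm (SDM& C,
                          const Teuchos::RCP<const Teuchos::Comm<int> >& comm)
{
  // Locally replicated Map: every process owns all C.numRows() rows.
  auto map = Teuchos::rcp (new map_type (C.numRows (), 0, comm,
                                         Tpetra::LocallyReplicated));

  // Non-owning LayoutLeft view with the stride (LDA) as the first extent,
  // then a subview with the logical number of rows (see the sketch above).
  MV::dual_view_type::t_host raw (C.values (), C.stride (), C.numCols ());
  auto host = Kokkos::subview (raw,
    Kokkos::make_pair (size_t (0), size_t (C.numRows ())), Kokkos::ALL ());

  // Host-only build assumed: device and host views share the same memory
  // space, so the same data can back both halves of the DualView.
  MV::dual_view_type dv (host, host);
  return Teuchos::rcp (new MV (map, dv));
}
```

With C constructed this way, anything multiply() writes into the MultiVector lands directly in the SerialDenseMatrix's storage, so no copy back is needed.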
In the source code of Tpetra::MultiVector::multiply, this is Case 2 (where C is "local," i.e., globally replicated over the same communicator as the input MultiVectors). Belos does something weird: namely, it calls multiply on a MultiVector with a "SerialComm." Thus, the multiply() method doesn't actually get to do the communication that it knows how to do (it calls "reduce()" internally). Just let MultiVector::multiply do what it knows how to do, by letting C have a globally replicated Map with the same communicator as A and B.
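A usage sketch of that approach (the function name `transposeTimes` is made up; Scalar = double assumed): given distributed MultiVectors A and B, and a C_mv that views the SerialDenseMatrix and has a locally replicated Map over the same communicator, the transpose-times-matrix product and the reduction both happen inside one multiply() call:

```c++
#include <Tpetra_MultiVector.hpp>
#include <Teuchos_BLAS_types.hpp>  // Teuchos::CONJ_TRANS, Teuchos::NO_TRANS

using MV = Tpetra::MultiVector<double>;

// C_mv views the SerialDenseMatrix and has a locally replicated Map over the
// same communicator as A and B.  multiply() recognizes this "Case 2" and does
// the all-reduce itself, so Belos never has to call reduceAll / REDUCE_SUM.
void transposeTimes (const MV& A, const MV& B, MV& C_mv)
{
  // C := 1.0 * A^H * B + 0.0 * C, with C globally replicated.
  C_mv.multiply (Teuchos::CONJ_TRANS, Teuchos::NO_TRANS, 1.0, A, B, 0.0);
}
```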