kokkos / kokkos

Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction
https://kokkos.org
Other
1.86k stars 420 forks source link

Kokkos::DualView subview modify flag question #2514

Open kddevin opened 4 years ago

kddevin commented 4 years ago

I'm trying to understand the semantics of DualView and subview.

Say I have a DualView X. Then I take a subview subX of X.

subX = subview(X, Kokkos::ALL(), range_type(0,10));

My understanding is that X and subX share the same modify flags; is that correct?

If so, what happens when modifications of X and subX are interspersed? Here are two examples:

1.  X.modify_...();  make changes in subX; X.sync_...()
2.  subX.modify_...(); make changes in X; make changes in subX; subX.sync...(); access X in synced location

I am guessing the first case would be safe, although would sync more than necessary.
I am guessing the second case would be wrong; changes made to X outside of subX would be lost. Is there logic for the modify flags that would prevent this usage?

Tpetra takes subviews of DualViews and passes them to the user in non-const Vectors. It is relying on the DualView modify flags to maintain correct copies. But if subviews share modify flags with DualViews, a user holding a non-const MultiVector and a non-const Vector subivewed from it could potentially experience some havoc.

I hope I'm wrong. Thanks for your help.

https://github.com/trilinos/Trilinos/issues/6158

crtrott commented 4 years ago

You are right this is a problem. Subviews of DualViews are pretty problematic. The only really useful scenarios which make sense are:

mhoemmen commented 4 years ago

Belos uses MultiVector as a kind of memory pool for Vectors. In that case, the children have full ownership of their local storage, and can sync and modify however they like.

Ifpack2 and Tpetra functions that take a MultiVector may need to call functions (e.g., third-party libraries) that only take one vector at a time. They may then call functions that work with the whole MultiVector (e.g., DistObject::doImport). If the child vectors aren't consistent with the parent MultiVectors (in terms of modify flags) before doImport, then doImport might clobber what the one-vector-at-a-time functions did. That's true even if the outer functions are "read only," because they might sync to get the data where they need to be read. (Note that Tpetra::MultiVector's methods have been changing over the last couple years to prefer running on device and sync there if necessary, in order to avoid slow host code. @csiefer2 has been working on this. It's not bad but it means that MultiVector methods might sync even in "read-only" methods.)

All this implies we need to know the context in order to know how to handle subviews.

crtrott commented 4 years ago

One option would be for tpetra (or whatever) subviews to always carry the original dual view with it. And use subviews locally for operations, while the syncing happens on the full thing.

mhoemmen commented 4 years ago

I feel like we're focusing a bit too much on how to do something, when we don't really know what we want to do. Let's think a bit about invariants, or behavior we know is wrong. It's bad if a sync clobbers outstanding data. Here are some examples:

  1. If I sync a DualView, then it's incorrect for the DualView to have been modified on the sync target. (There's only one DualView here, no parent or children.)
  2. If I sync a parent, and a child exists, then it's incorrect for the child to have been modified on the sync target. (For example: parent syncs from host to device, but child was modified on device.)
  3. If a parent has two overlapping children, then it's incorrect to sync any one child to $PLACE if either child (or the parent, but that's (3) below) has been modified on $PLACE.
  4. If I sync a child, then it's incorrect for the parent to have been modified on the sync target.
  5. If a parent has two _non_overlapping children, then sync'ing one child to (host,device) should not clobber the other child's modifications on (host,device). (See the Belos use case above.)

The current design of DualView should prevent (1), (2), (3), and (4). This is because all children share the modify flags with their parent. However, the current design of DualView would not prevent (5). I can think of ways to would prevent (5) as well, but they would require each parent (all the way down the tree) to track what subset of its rows and columns have outstanding (Multi)Vector views.

Do the examples above cover all the invariants we want to cover? Are there cases in which we don't care about them?