On the 2019-03-07 Kokkos team meeting, we discussed the consistency of Kokkos::deep_copy behavior with respect to non-trivial copy constructors. There were about as many opinions as there were people in the room, and so I want to try and design a customization structure that has a shot at making everyone happy.
I still believe the default behavior should be to basically work like C++ normally works—if you're given a copy constructor (or move constructor), you should never just silently ignore it and do a memcpy instead. I think the conclusion we reached in the meeting should be the default if no other customizations are provided.
Beyond the default, there were basically two-ish different cases that came up: (1) a type that is not trivially copyable for whatever reason but needs to be treated as "mem-copyable" for the purposes of deep_copy, and (2 or 1b) a slightly more nuanced version of (1) where the copy constructor needs to do something when an extra copy is invoked during its captured by a Kokkos parallel pattern but can be memcpy'd in other circumstances (this is the Kokkos::View use case, and it becomes particularly pertinent if we want to start doing things beyond just turning off reference counting, like resilience as currently proposed). After some reflection, I would add a third case that we didn't discuss before: a type that needs to be treated as trivially copyable within a memory space but needs to do something special when deep copied across memory spaces.
I have a customization structure floating around in my head that should cleanly address all of these use cases. The first case is probably the easiest: a namespace scope trait that defers to an intrusive type alias if present and is otherwise false unless specialized:
The third case is also rather straightforward. In that case, we don't want to hook the copy constructor, but rather provide customization point that gets invoked only by deep copy. Something like:
namespace Kokkos {
namespace Experimental {
namespace Advanced {
template <class T, class DstSpace, class SrcSpace, class ExecSpace = typename DstSpace::execution_space>
T customize_deep_copy(T const&);
}}}
That customization point would do the normal customization point object thing where it delegates to a (in this case, static) member function if present and to the namespace scope function found via ADL otherwise (and does the normal memcpy if neither is found, as expected).
The second (or 1b) case is a bit trickier. We basically need a uniform way of telling any type with reference semantics to some other type (e.g., Kokkos::View<T>) that it is in the midst of a Kokkos kernel capture and should first convey that information to any types it references and then invoke the copy constructors if that type requests this behavior during capture. While this sounds like a complicated thing that only types internal to Kokkos should be thinking about, it is entirely reasonable that an advanced third-party library like RAJA or HPX may need to do similar things when interacting with Kokkos. Currently, we do this by setting a thread-local variable that only applies to tracking (because that's all we change), but it could just as easily (with some slight modifications, maybe) apply to anything else that needs to be done at this time. We'd need to slightly modify the way we implement the parallel_for function itself (so that we set the thread local flag, do a copy, and then unset the flag, rather than setting the flag for the entire duration of the kernel), but otherwise we wouldn't need to make any major changes other than exposing the flag in something like Kokkos::Advanced namespace and making the appropriate modifications to the special case of Kokkos::View<ThingThatCares*>. Opting in to this behavior could be done with a type trait. Long story short, this would provide a way for Kokkos::View<Kokkos::View<T>> to work properly with deep copy, and it would do so in a way that (very) advanced users could understand for their own types, if necessary.
Making this work correctly for an intermediate user who wants Kokkos::View<ThingThatHasViewDataMembers*> to "just work" is a slightly greater challenge. The Kokkos::View constructor would pick up on the special flag, but when it notes that the data type has not opted in to the special type trait, it doesn't call the copy constructor of ThingThatHasViewDataMembers, and thus the copy constructors of its data members are never called. Since normal copy construction of Kokkos::View is shallow, I don't think it's a good idea to call non-trivial copy constructors of the underlying data in any circumstance from the View copy constructor, even if it's just when this special thread-local flag is on. One option is to allow ThingThatHasViewDataMembers to opt in to this behavior using a type trait, intrusive alias, or base class, but none of these options are all that "friendly" for intermediate users.
However we choose to handle this final case, people will do this kind of thing, and so it needs to work, especially if we want to continue recommending that other projects (like the resilience work) use the copy constructor for their things.
On the 2019-03-07 Kokkos team meeting, we discussed the consistency of
Kokkos::deep_copy
behavior with respect to non-trivial copy constructors. There were about as many opinions as there were people in the room, and so I want to try and design a customization structure that has a shot at making everyone happy.I still believe the default behavior should be to basically work like C++ normally works—if you're given a copy constructor (or move constructor), you should never just silently ignore it and do a
memcpy
instead. I think the conclusion we reached in the meeting should be the default if no other customizations are provided.Beyond the default, there were basically two-ish different cases that came up: (1) a type that is not trivially copyable for whatever reason but needs to be treated as "mem-copyable" for the purposes of
deep_copy
, and (2 or 1b) a slightly more nuanced version of (1) where the copy constructor needs to do something when an extra copy is invoked during its captured by a Kokkos parallel pattern but can bememcpy
'd in other circumstances (this is theKokkos::View
use case, and it becomes particularly pertinent if we want to start doing things beyond just turning off reference counting, like resilience as currently proposed). After some reflection, I would add a third case that we didn't discuss before: a type that needs to be treated as trivially copyable within a memory space but needs to do something special when deep copied across memory spaces.I have a customization structure floating around in my head that should cleanly address all of these use cases. The first case is probably the easiest: a namespace scope trait that defers to an intrusive type alias if present and is otherwise false unless specialized:
(Name subject to bikesheding, of course.)
The third case is also rather straightforward. In that case, we don't want to hook the copy constructor, but rather provide customization point that gets invoked only by deep copy. Something like:
That customization point would do the normal customization point object thing where it delegates to a (in this case, static) member function if present and to the namespace scope function found via ADL otherwise (and does the normal
memcpy
if neither is found, as expected).The second (or 1b) case is a bit trickier. We basically need a uniform way of telling any type with reference semantics to some other type (e.g.,
Kokkos::View<T>
) that it is in the midst of a Kokkos kernel capture and should first convey that information to any types it references and then invoke the copy constructors if that type requests this behavior during capture. While this sounds like a complicated thing that only types internal to Kokkos should be thinking about, it is entirely reasonable that an advanced third-party library like RAJA or HPX may need to do similar things when interacting with Kokkos. Currently, we do this by setting a thread-local variable that only applies to tracking (because that's all we change), but it could just as easily (with some slight modifications, maybe) apply to anything else that needs to be done at this time. We'd need to slightly modify the way we implement theparallel_for
function itself (so that we set the thread local flag, do a copy, and then unset the flag, rather than setting the flag for the entire duration of the kernel), but otherwise we wouldn't need to make any major changes other than exposing the flag in something likeKokkos::Advanced
namespace and making the appropriate modifications to the special case ofKokkos::View<ThingThatCares*>
. Opting in to this behavior could be done with a type trait. Long story short, this would provide a way forKokkos::View<Kokkos::View<T>>
to work properly with deep copy, and it would do so in a way that (very) advanced users could understand for their own types, if necessary.Making this work correctly for an intermediate user who wants
Kokkos::View<ThingThatHasViewDataMembers*>
to "just work" is a slightly greater challenge. TheKokkos::View
constructor would pick up on the special flag, but when it notes that the data type has not opted in to the special type trait, it doesn't call the copy constructor ofThingThatHasViewDataMembers
, and thus the copy constructors of its data members are never called. Since normal copy construction ofKokkos::View
is shallow, I don't think it's a good idea to call non-trivial copy constructors of the underlying data in any circumstance from theView
copy constructor, even if it's just when this special thread-local flag is on. One option is to allowThingThatHasViewDataMembers
to opt in to this behavior using a type trait, intrusive alias, or base class, but none of these options are all that "friendly" for intermediate users.However we choose to handle this final case, people will do this kind of thing, and so it needs to work, especially if we want to continue recommending that other projects (like the resilience work) use the copy constructor for their things.