maartenarnst opened this issue 5 months ago (status: open).
Basically we are in somewhat of a bind here. To move data across memory-space boundaries we may have no option other than memcpy, but "trivially copyable" is too harsh a condition. This is the same issue for which SYCL introduced its device-copyable trait, which you can opt into. Maybe we should consider doing this for Kokkos 5, i.e. require it. For now I don't want to guarantee different semantic behavior depending on the memory spaces you copy between, so effectively it is memcpy. We should document this properly, though. There are other, similar thorny issues. For example, the functors we use must be "device copyable" because we effectively have to move them to some place on the GPU. That move is NOT done via a copy constructor. Furthermore, that object is not destructed either.
And while I get the desire to treat this rigorously: practically no one has complained about it in 12 years of Kokkos deployment, and fixing it properly means either a bunch of extra boilerplate everyone has to write or significant performance costs.
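To make the functor point concrete, here is a small C++17 sketch that uses only the standard library. The functor names are hypothetical, and `std::shared_ptr` is only a stand-in for a reference-counted handle such as a `View`: a functor that captures such a handle by value has a non-trivial copy constructor and destructor, so the byte-wise transfer that a kernel launch effectively performs is not sanctioned by the language, even though it works in practice.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical functor capturing a reference-counted handle by value,
// much as a Kokkos functor captures a View. std::shared_ptr is a stand-in.
struct ScaleFunctor {
  std::shared_ptr<double[]> x;  // non-trivial copy constructor and destructor
  double alpha;
  void operator()(int i) const { x[i] *= alpha; }
};

// Hypothetical functor holding only a raw pointer and a scalar.
struct RawScaleFunctor {
  double* x;
  double alpha;
  void operator()(int i) const { x[i] *= alpha; }
};

// Copying ScaleFunctor byte by byte (e.g. into device memory) skips the copy
// constructor and the destructor of `x`; for such a type a memcpy is
// formally undefined behavior.
static_assert(!std::is_trivially_copyable_v<ScaleFunctor>);

// RawScaleFunctor can legitimately be copied byte-wise.
static_assert(std::is_trivially_copyable_v<RawScaleFunctor>);
```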
Here is a sketch for a fix:
- Add a trait `Kokkos::mem_copyable`, which users can specialize or which looks for `Foo::kokkos_mem_copyable`.
- `deep_copy` would use assignment unless that trait is true, using a dual copy through a device-accessible buffer if necessary (i.e. we keep some `HostPinnedSpace` buffer around or so).

All in all, I would expect a 10%-30% slowdown for apps that don't do the proper opt-in, based on what some of these mechanisms cost (e.g. the large-functor launch mechanism) and the extra latency.
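The trait-based dispatch above can be sketched in standard C++17. Everything lives in a hypothetical `sketch` namespace; `mem_copyable`, `copy_elements`, `Particle` and `Vec3` are illustrative names, not existing Kokkos APIs. Plain host arrays stand in for Views, and the staging through a device-accessible buffer is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

namespace sketch {

// Detects an opt-in via a static member `kokkos_mem_copyable` (hypothetical).
template <class T, class = void>
struct has_mem_copyable_member : std::false_type {};

template <class T>
struct has_mem_copyable_member<T, std::void_t<decltype(T::kokkos_mem_copyable)>>
    : std::bool_constant<T::kokkos_mem_copyable> {};

// Hypothetical trait: true for trivially copyable types or member opt-ins;
// users may also specialize it directly.
template <class T>
struct mem_copyable
    : std::bool_constant<std::is_trivially_copyable_v<T> ||
                         has_mem_copyable_member<T>::value> {};

// Stand-in for deep_copy between two host arrays.
template <class T>
void copy_elements(T* dst, const T* src, std::size_t n) {
  if constexpr (mem_copyable<T>::value) {
    // Byte-wise copy. For opted-in types that are not trivially copyable,
    // this relies on the user's promise, not on the language.
    std::memcpy(static_cast<void*>(dst), static_cast<const void*>(src),
                n * sizeof(T));
  } else {
    // No opt-in: use the type's copy assignment operator.
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
  }
}

}  // namespace sketch

// Hypothetical type opting in through a static member.
struct Particle {
  static constexpr bool kokkos_mem_copyable = true;
  double x = 0, y = 0, z = 0;
  Particle() = default;
  Particle(const Particle& o) : x(o.x), y(o.y), z(o.z) {}  // user-provided
  Particle& operator=(const Particle&) = default;
};

// Hypothetical type opting in through a specialization.
struct Vec3 {
  double v[3];
  ~Vec3() {}  // user-provided destructor: not trivially copyable
};

namespace sketch {
template <>
struct mem_copyable<Vec3> : std::true_type {};
}  // namespace sketch

// std::string does not opt in, so copy_elements would use assignment.
static_assert(!sketch::mem_copyable<std::string>::value);
```

The byte-wise branch is taken for opted-in types even though the language does not consider them trivially copyable, which is exactly the trade-off under discussion.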
TODO:
And even with the opt-in, it is still easy to invoke undefined behavior at the language level. Just because a user says a type is memcpy-able doesn't make it so as far as the language is concerned.
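A hypothetical illustration in standard C++17: a user-specializable trait (a simplified stand-in for the proposed opt-in, not an existing Kokkos API) can be set to `true` for a type whose byte-wise copy would be wrong. Here the illustrative type `Ring` keeps a pointer into its own buffer; its copy operations re-target that pointer, which a memcpy would not do. The specialization changes what a library would do, not what the language guarantees.

```cpp
#include <type_traits>

// Hypothetical opt-in trait; defaults to the language's own notion.
template <class T>
struct mem_copyable : std::is_trivially_copyable<T> {};

// Hypothetical type with an internal pointer: `cursor` points into `buf`.
struct Ring {
  int buf[4] = {};
  int* cursor = buf;

  Ring() = default;
  // Copying must re-target `cursor` at the new object's own buffer.
  Ring(const Ring& o) : cursor(buf + (o.cursor - o.buf)) {
    for (int i = 0; i < 4; ++i) buf[i] = o.buf[i];
  }
  Ring& operator=(const Ring& o) {
    for (int i = 0; i < 4; ++i) buf[i] = o.buf[i];
    cursor = buf + (o.cursor - o.buf);
    return *this;
  }
};

// The user's (incorrect) claim: after a memcpy, the copy's `cursor` would
// still point into the source object's buffer.
template <>
struct mem_copyable<Ring> : std::true_type {};

// The opt-in does not make the type trivially copyable.
static_assert(mem_copyable<Ring>::value);
static_assert(!std::is_trivially_copyable_v<Ring>);
```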
Kokkos currently implements `deep_copy` of a `View` as a byte-wise copy under the conditions summarized in this comment:

These conditions are made rigorous by type traits in the code that follows in `Kokkos_CopyViews.hpp`.

Notably, these conditions do not require the value type of the `View` to be trivially copyable. The current behavior of `deep_copy` for non-trivially copyable value types may therefore be surprising. Especially between assignable memory spaces, one may expect the copy constructor or copy assignment operator to be called.

The behavior of `std::copy` is different: it is specified in terms of element-wise assignment, and standard library implementations lower it to a byte-wise copy only for trivially copyable value types. If the current behavior of `deep_copy` is not intended, a solution may be to route non-trivially copyable value types to an implementation that uses the `ViewCopy` functor:

We have tested this behavior using a helper class that can be made non-trivially copyable and that counts how many times its special member functions are called:
The code of the helper class, called `tester_t` in the snippet, is attached:

Joint work with @romintomasetti. Issue created after a brief discussion with @dalg24.
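The issue's `tester_t` is in the attachment and is not reproduced here. As a hypothetical stand-in, the following C++17 sketch defines a `counting_t` type that counts copy assignments, plus a `routed_copy` function (an illustrative name, not a Kokkos API) that mimics the proposed routing on plain host arrays: a byte-wise copy for trivially copyable types, and element-wise assignment, as a `ViewCopy`-style functor would do, otherwise.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical counting helper in the spirit of tester_t.
// The user-provided copy assignment makes it non-trivially copyable.
struct counting_t {
  static inline int copy_assignments = 0;
  int value = 0;
  counting_t() = default;
  counting_t(const counting_t&) = default;
  counting_t& operator=(const counting_t& o) {
    value = o.value;
    ++copy_assignments;
    return *this;
  }
};

// Proposed routing: byte-wise only for trivially copyable value types,
// element-wise assignment otherwise.
template <class T>
void routed_copy(T* dst, const T* src, std::size_t n) {
  if constexpr (std::is_trivially_copyable_v<T>) {
    std::memcpy(dst, src, n * sizeof(T));
  } else {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
  }
}

// Copies with std::copy and returns how many copy assignments it performed.
inline int assignments_done_by_std_copy(const counting_t* src, counting_t* dst,
                                        std::size_t n) {
  counting_t::copy_assignments = 0;
  std::copy(src, src + n, dst);
  return counting_t::copy_assignments;
}
```

With five elements, both `std::copy` and `routed_copy` call the copy assignment operator of `counting_t` five times, whereas a byte-wise copy would bypass it entirely.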