Open stotko opened 4 years ago
In #391, a resource wrapper is introduced to allow for automatic memory management. Since this is closely related to the copy and resize support, future design decisions involving the wrapper will likely also affect this proposal. In particular, the C++ standard seems to introduce several non-owning reference classes (to which our containers are currently very similar):
- C++17: `string_view`
- C++20: `atomic_ref` (implemented by stdgpu), `span`
- C++23: `mdspan`
- C++26: `function_ref` (as of now)

Thus, an alternative, yet similar proposal would be the following:
- `*_ref`/`*_view` reference classes which include all functionality (except the factory functions) of the current containers, which effectively results in a renaming of all containers with an appropriate reference type suffix.
- `device_unique_object` and additionally make them deep-copyable on the host only. Furthermore, add `operator*` and `operator->` functions for convenient conversion to their corresponding reference types.

This would move the API much closer to the C++ standard while clearly communicating that crossing the host-device memory boundary is only safely doable via the similarly named reference types.
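To make the split between owning containers and reference types concrete, here is a minimal host-only sketch. The names `vector_ref`, `owning_vector`, `fill_stand_in`, and `demo_ref_types` are hypothetical and not part of stdgpu; `std::unique_ptr` merely stands in for the resource wrapper from #391, and plain host memory stands in for device memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical non-owning reference type: trivially copyable, so it could be
// passed by value across the host-device boundary.
template <typename T>
class vector_ref
{
public:
    vector_ref(T* data, std::size_t size) : _data(data), _size(size) {}

    T& operator[](std::size_t i) const { return _data[i]; }
    std::size_t size() const { return _size; }

private:
    T* _data;
    std::size_t _size;
};

// Hypothetical owning container: deep-copyable on the host only.
// std::unique_ptr is an assumption standing in for the resource wrapper.
template <typename T>
class owning_vector
{
public:
    explicit owning_vector(std::size_t size)
      : _data(std::make_unique<T[]>(size)), _ref(_data.get(), size) {}

    // Deep copy: allocate new storage and copy the elements.
    owning_vector(const owning_vector& other)
      : _data(std::make_unique<T[]>(other._ref.size())),
        _ref(_data.get(), other._ref.size())
    {
        std::copy_n(other._data.get(), other._ref.size(), _data.get());
    }

    owning_vector& operator=(const owning_vector&) = delete; // omitted in this sketch

    // Convenient conversion to the corresponding reference type.
    vector_ref<T> operator*() const { return _ref; }
    const vector_ref<T>* operator->() const { return &_ref; }

private:
    std::unique_ptr<T[]> _data;
    vector_ref<T> _ref;
};

// Stand-in for a kernel: it only ever sees the reference type, by value.
inline void fill_stand_in(vector_ref<int> v, int value)
{
    for (std::size_t i = 0; i < v.size(); ++i) { v[i] = value; }
}

inline bool demo_ref_types()
{
    owning_vector<int> a(4);
    fill_stand_in(*a, 7);      // operator* yields the reference type

    owning_vector<int> b(a);   // deep copy on the host
    fill_stand_in(*b, 1);      // modifies only b's storage

    return (*a)[0] == 7 && (*b)[0] == 1 && a->size() == 4;
}
```

In this sketch the owning type never crosses the memory boundary itself; only the cheap, similarly named reference type is handed to the kernel stand-in.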
Up to now, the container classes have a fixed capacity and are created using the non-standard `createDeviceObject` factory function. Furthermore, since ease of use in GPU kernels is considered a key feature, the copy constructors are currently restricted to perform only shallow copies rather than deep copies. This behavior makes the containers still feel non-standard and unintuitive to some degree, especially for new users.

In order to fix both issues, the design of the copy operations needs to be revised to match the STL more closely. At first glance, this seems to be an easy task:
- A `reference_wrapper<T>` class which can be used on the GPU.

However, objects (or at least their states) need to be copied from CPU to GPU memory in order to allow for the proper execution of an operation. Since we want to make the containers work for as many backends and use cases as possible, we cannot make any assumptions about how this transfer will be performed or whether it really requires calling the copy constructor. `reference_wrapper<T>` does not solve this problem since it points to the original object, which lives in CPU memory.

Therefore, the current proposal would be:
- A `shallow_copy_wrapper<T>` class (suggestions for a better name are welcome) which wraps the object state. This class is copyable such that the object state can be easily passed to the GPU, similar to `reference_wrapper<T>`. However, if the state of the original object is changed, e.g. due to a resize operation, this change will not be visible or propagated to the wrapper, which invalidates it. Thus, we trade object consistency for GPU support. `shallow_copy_wrapper<T>` is only intended to allow crossing memory boundaries and to enable container usage on the GPU. For CPU usage, `std::reference_wrapper<T>` should be used instead if required.
- Move away from the `createDeviceObject` and `destroyDeviceObject` factory functions.

This change will break existing usage within kernels and thrust algorithms (functors). A reasonable transition strategy would be to introduce `shallow_copy_wrapper<T>` in the last minor release of version 1 (which might be 1.3.0) and provide an option to disable the copy constructor and copy assignment operators. This way, users could start porting to the new copy model and will only need to move away from the factory functions in version 2.0.0.
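The proposed `shallow_copy_wrapper<T>` and its invalidation after a resize can be illustrated with a host-only sketch. Everything except the class name is an assumption: `toy_vector`, its nested `state` struct, `get_state`, and `sum_stand_in` are hypothetical, plain heap memory stands in for device memory, and an actual implementation would likely capture the container state differently.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical container with STL-like deep-copy semantics and resize support.
class toy_vector
{
public:
    // The trivially copyable part of the object that a kernel would need.
    struct state
    {
        int* data;
        std::size_t size;
    };

    explicit toy_vector(std::size_t n) : _storage(n, 0) {}

    void resize(std::size_t n) { _storage.resize(n, 0); }
    std::size_t size() const { return _storage.size(); }
    int& operator[](std::size_t i) { return _storage[i]; }

    state get_state() { return {_storage.data(), _storage.size()}; }

private:
    std::vector<int> _storage;
};

// Sketch of the proposed wrapper: it snapshots the object state at
// construction time and is cheap to copy by value.
template <typename T>
class shallow_copy_wrapper
{
public:
    explicit shallow_copy_wrapper(T& object) : _state(object.get_state()) {}

    typename T::state get() const { return _state; }

private:
    typename T::state _state;
};

static_assert(std::is_trivially_copyable<shallow_copy_wrapper<toy_vector>>::value,
              "the wrapper must be passable by value");

// Stand-in for a kernel: receives the wrapper by value.
inline int sum_stand_in(shallow_copy_wrapper<toy_vector> w)
{
    const toy_vector::state s = w.get();
    int result = 0;
    for (std::size_t i = 0; i < s.size; ++i) { result += s.data[i]; }
    return result;
}

inline bool demo_shallow_copy_wrapper()
{
    toy_vector v(3);
    v[0] = 1; v[1] = 2; v[2] = 3;

    shallow_copy_wrapper<toy_vector> w(v);
    const bool sum_ok = (sum_stand_in(w) == 6);

    // The resize is not propagated: w still reports the old size, and its
    // pointer may dangle, so w must not be dereferenced anymore.
    v.resize(100);
    const bool stale = (w.get().size == 3) && (v.size() == 100);

    // A fresh wrapper has to be created after every state change.
    shallow_copy_wrapper<toy_vector> fresh(v);
    const bool fresh_ok = (fresh.get().size == 100) && (sum_stand_in(fresh) == 6);

    return sum_ok && stale && fresh_ok;
}
```

The stale snapshot after `resize` is exactly the consistency that the proposal gives up in exchange for being able to pass the state across memory boundaries.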