Container: resize and copy support

Currently, the container classes have a fixed capacity and are created using the non-standard createDeviceObject factory function. Furthermore, since ease of use in GPU kernels is considered a key feature, the copy constructors are restricted to performing shallow copies rather than deep copies. This behavior still makes the containers feel non-standard and somewhat unintuitive, especially for new users.
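To make the current semantics concrete, the following host-only C++ sketch mimics a container whose copy constructor performs a shallow copy. The names `shallow_vector`, `create` and `destroy` are hypothetical stand-ins for the real containers and for createDeviceObject/destroyDeviceObject, and plain host memory replaces device memory.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical container mimicking the current shallow-copy semantics:
// all copies share one buffer, which only create()/destroy() manage.
template <typename T>
class shallow_vector
{
public:
    // Stand-in for createDeviceObject (host memory here).
    static shallow_vector create(std::size_t capacity)
    {
        shallow_vector v;
        v._data = new T[capacity]();
        v._capacity = capacity;
        return v;
    }

    // Stand-in for destroyDeviceObject: frees the shared buffer.
    static void destroy(shallow_vector& v)
    {
        delete[] v._data;
        v._data = nullptr;
        v._capacity = 0;
    }

    // The implicitly generated copy constructor copies only the pointer
    // and the capacity, i.e. it performs a shallow copy.

    T& operator[](std::size_t i) const { return _data[i]; }
    std::size_t capacity() const { return _capacity; }
    const T* data() const { return _data; }

private:
    T* _data = nullptr;
    std::size_t _capacity = 0;
};

inline void demo_shallow_copy()
{
    auto a = shallow_vector<int>::create(4);
    shallow_vector<int> b = a;  // cheap copy, e.g. for a kernel argument
    b[0] = 42;
    assert(a[0] == 42);         // the write is visible through a
    assert(a.data() == b.data());
    shallow_vector<int>::destroy(a);  // b now dangles as well
}
```

Copying such an object into a kernel argument is cheap, but `shallow_vector<int> b = a;` aliases the buffer of `a` instead of duplicating it, which is not what users of `std::vector` expect.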

To fix both issues, the design of the copy operations needs to be revised to match the STL more closely. At first glance, this seems to be an easy task:

1. Define the copy constructors and copy assignment operators to perform deep copies.
2. Provide a reference_wrapper<T> class which can be used on the GPU.
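On the host alone, the two steps above can be sketched with standard C++. `deep_vector` and `fill_functor` are hypothetical names; the sketch only shows that copies become independent while `std::reference_wrapper` lets a functor refer to the original object without copying it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical container whose copy operations perform deep copies,
// matching the semantics of std::vector.
template <typename T>
class deep_vector
{
public:
    explicit deep_vector(std::size_t size)
      : _data(std::make_unique<T[]>(size)), _size(size)
    {
    }

    // Deep copy: allocate new storage and copy all elements.
    deep_vector(const deep_vector& other)
      : _data(std::make_unique<T[]>(other._size)), _size(other._size)
    {
        std::copy(other._data.get(), other._data.get() + _size, _data.get());
    }

    // Copy-and-swap keeps the assignment strongly exception safe.
    deep_vector& operator=(const deep_vector& other)
    {
        deep_vector tmp(other);
        std::swap(_data, tmp._data);
        std::swap(_size, tmp._size);
        return *this;
    }

    T& operator[](std::size_t i) { return _data[i]; }
    const T& operator[](std::size_t i) const { return _data[i]; }
    std::size_t size() const { return _size; }

private:
    std::unique_ptr<T[]> _data;
    std::size_t _size;
};

// A functor that holds its container through std::reference_wrapper,
// so no copy is made and writes reach the original object.
struct fill_functor
{
    std::reference_wrapper<deep_vector<int>> target;

    void operator()(int value) const
    {
        for (std::size_t i = 0; i < target.get().size(); ++i)
            target.get()[i] = value;
    }
};

inline void demo_deep_copy()
{
    deep_vector<int> a(3);
    deep_vector<int> b = a;  // independent copy
    b[0] = 7;
    assert(a[0] == 0);       // a is unaffected

    fill_functor f{std::ref(a)};
    f(5);
    assert(a[2] == 5);       // written through the reference
}
```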

However, objects (or at least their states) need to be copied from CPU to GPU memory before an operation can be executed on the GPU. Since we want the containers to work with as many backends and use cases as possible, we cannot make any assumptions about how this transfer is performed or whether it actually invokes the copy constructor. reference_wrapper<T> does not solve this problem since it points to the original object, which lives in CPU memory.
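The limitation of reference_wrapper<T> can be illustrated on the host. In the sketch below, a bytewise `std::memcpy` stands in for a backend's host-to-device transfer (an assumption; real backends may transfer objects differently). The transferred wrapper still holds the host address of the original object, which a kernel generally cannot dereference unless host and device share an address space.

```cpp
#include <cassert>
#include <cstring>
#include <functional>
#include <type_traits>

struct host_object
{
    int value = 1;
};

// std::reference_wrapper is trivially copyable, so a backend could
// transfer it with a plain byte copy.
static_assert(std::is_trivially_copyable<std::reference_wrapper<host_object>>::value,
              "reference_wrapper is expected to be trivially copyable");

inline const host_object* transferred_address(host_object& original)
{
    std::reference_wrapper<host_object> ref(original);

    // Stand-in for the transfer: overwrite another wrapper with the bytes
    // of ref (on a GPU, the destination would live in device memory).
    host_object placeholder;
    std::reference_wrapper<host_object> transferred(placeholder);
    std::memcpy(&transferred, &ref, sizeof(ref));

    // The copied wrapper still refers to the original *host* object.
    return &transferred.get();
}
```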

Therefore, the current proposal would be:

1. Provide a shallow_copy_wrapper<T> class (suggestions for a better name are welcome) which wraps the object state. This class is copyable such that the object state can be easily passed to the GPU, similar to reference_wrapper<T>. However, if the state of the original object changes, e.g. due to a resize operation, this change is neither visible in nor propagated to the wrapper, which is thereby invalidated. Thus, we trade object consistency for GPU support.
2. Define the copy constructors and copy assignment operators to perform deep copies, but restrict them to be callable from the host only.
3. Clearly document that shallow_copy_wrapper<T> is only intended for crossing memory boundaries and enabling container usage on the GPU. For CPU usage, std::reference_wrapper<T> should be used instead if required.
4. Deprecate/remove the createDeviceObject and destroyDeviceObject factory functions.
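A minimal host-only sketch of this proposal, assuming the wrapper simply stores a copy of a trivially copyable `state` struct. `vector_sketch`, `state` and `get_state` are hypothetical names, and the host-only restriction of the copy operations (e.g. marking them `__host__` in CUDA) is only noted in a comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical container following the proposed copy model.
template <typename T>
class vector_sketch
{
public:
    // Plain, trivially copyable description of the object state.
    struct state
    {
        T* data;
        std::size_t size;
    };

    explicit vector_sketch(std::size_t n) : _state{new T[n](), n} {}
    ~vector_sketch() { delete[] _state.data; }

    // Deep copy; in a CUDA build this would be restricted to the host.
    vector_sketch(const vector_sketch& other)
      : _state{new T[other._state.size](), other._state.size}
    {
        std::copy(other._state.data, other._state.data + _state.size, _state.data);
    }

    vector_sketch& operator=(const vector_sketch& other)
    {
        vector_sketch tmp(other);
        std::swap(_state, tmp._state);
        return *this;
    }

    // Resizing reallocates, so previously taken snapshots become stale.
    void resize(std::size_t new_size)
    {
        T* new_data = new T[new_size]();
        std::copy(_state.data, _state.data + std::min(_state.size, new_size), new_data);
        delete[] _state.data;
        _state = state{new_data, new_size};
    }

    std::size_t size() const { return _state.size; }
    const state& get_state() const { return _state; }

private:
    state _state;
};

// Copyable snapshot of the object state that could be passed to kernels.
template <typename Container>
class shallow_copy_wrapper
{
public:
    explicit shallow_copy_wrapper(const Container& c) : _state(c.get_state()) {}

    std::size_t size() const { return _state.size; }
    auto& operator[](std::size_t i) const { return _state.data[i]; }

private:
    typename Container::state _state;
};

inline void demo_stale_wrapper()
{
    vector_sketch<int> v(2);
    shallow_copy_wrapper<vector_sketch<int>> w(v);
    w[0] = 3;               // writes reach the storage of v
    v.resize(5);            // reallocates the buffer of v
    assert(v.size() == 5);
    assert(w.size() == 2);  // w still describes the old, freed buffer
}
```

The last assertion shows the trade-off stated above: the wrapper is cheap to copy across memory boundaries, but it does not follow changes of the original object.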

This change will break existing usage within kernels and thrust algorithms (functors). A reasonable transition strategy would be to introduce shallow_copy_wrapper<T> in the last minor release of version 1 (which might be 1.3.0) and to provide an option to disable the copy constructor and copy assignment operators. This way, users could start porting to the new copy model and would only need to move away from the factory functions in version 2.0.0.
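One possible shape for such an option, assuming a preprocessor switch. The macro `EXAMPLE_DISABLE_CONTAINER_COPY` and the class `transitional_container` are hypothetical and not existing stdgpu names.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical transition switch (not an existing stdgpu option):
// 0 keeps the legacy shallow copies, 1 disables copying entirely.
#ifndef EXAMPLE_DISABLE_CONTAINER_COPY
#define EXAMPLE_DISABLE_CONTAINER_COPY 0
#endif

template <typename T>
class transitional_container
{
public:
    transitional_container() = default;

#if EXAMPLE_DISABLE_CONTAINER_COPY
    // With copying disabled, every remaining copy in kernels or functors
    // becomes a compile error, pointing users to the places to port.
    transitional_container(const transitional_container&) = delete;
    transitional_container& operator=(const transitional_container&) = delete;
#endif
    // Otherwise, the implicit copy operations perform shallow copies
    // of the pointer, as in version 1.

    const T* data() const { return _data; }
    std::size_t size() const { return _size; }

private:
    T* _data = nullptr;
    std::size_t _size = 0;
};

// The copyability of the container follows the option.
static_assert(std::is_copy_constructible<transitional_container<int>>::value ==
                  !EXAMPLE_DISABLE_CONTAINER_COPY,
              "copy support must match the transition option");
```

Compiling with `-DEXAMPLE_DISABLE_CONTAINER_COPY=1` would then turn every remaining copy into a compile error that users can fix by switching to shallow_copy_wrapper<T>.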

In #391, a resource wrapper is introduced to allow for automatic memory management. Since this is closely related to copy and resize support, future design decisions involving the wrapper will likely also affect this proposal. In particular, the C++ standard has been introducing several non-owning reference classes, to which our containers are currently very similar:

- C++17: string_view
- C++20: atomic_ref (implemented by stdgpu), span
- C++23: mdspan
- C++26: function_ref (as of now)
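The owning/non-owning split these types follow can be seen with `std::string` and `std::string_view` (C++17). The sketch only illustrates standard semantics: copying the view copies a pointer and a length, while copying the owner duplicates the characters.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// A view stores only a pointer and a length; copying it never copies
// the referenced characters, and it does not manage their lifetime.
inline void demo_non_owning_view()
{
    std::string owner = "stdgpu";
    std::string_view view = owner;      // non-owning reference
    std::string_view view_copy = view;  // cheap, shallow copy

    assert(view_copy.data() == owner.data());
    assert(view_copy.size() == owner.size());

    owner[0] = 'S';                     // visible through the view
    assert(view_copy[0] == 'S');

    std::string deep_copy = owner;      // the owning type copies deeply
    assert(deep_copy.data() != owner.data());
}
```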

Thus, an alternative, yet similar proposal would be the following:

1. Introduce *_ref/*_view reference classes which include all functionality (except the factory functions) of the current containers. This effectively amounts to renaming all containers with an appropriate reference-type suffix.
2. Make the actual container classes manage their memory automatically, similar to device_unique_object, and additionally make them deep-copyable on the host only. Furthermore, add operator* and operator-> functions for convenient conversion to their corresponding reference types.
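A host-only sketch of this alternative, assuming the owning container caches a view of its own storage and hands it out via operator* and operator->. The names `sketch::vector` and `sketch::vector_ref` are hypothetical, and `std::unique_ptr` stands in for device memory management.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

namespace sketch
{
// Non-owning reference type; trivially copyable, so it could be passed
// to kernels by value.
template <typename T>
class vector_ref
{
public:
    vector_ref(T* data, std::size_t size) : _data(data), _size(size) {}

    T& operator[](std::size_t i) const { return _data[i]; }
    std::size_t size() const { return _size; }

private:
    T* _data;
    std::size_t _size;
};

// Owning container: manages its memory and deep-copies on the host.
template <typename T>
class vector
{
public:
    explicit vector(std::size_t size)
      : _data(std::make_unique<T[]>(size)), _ref(_data.get(), size)
    {
    }

    vector(const vector& other) : vector(other._ref.size())
    {
        std::copy(other._data.get(), other._data.get() + other._ref.size(), _data.get());
    }

    vector& operator=(const vector& other)
    {
        vector tmp(other);
        std::swap(_data, tmp._data);
        std::swap(_ref, tmp._ref);
        return *this;
    }

    // Convenient conversion to the reference type.
    vector_ref<T> operator*() const { return _ref; }
    const vector_ref<T>* operator->() const { return &_ref; }

private:
    std::unique_ptr<T[]> _data;
    vector_ref<T> _ref;  // view of _data, kept in sync by all members
};

static_assert(std::is_trivially_copyable<vector_ref<int>>::value,
              "the reference type should be cheap to transfer");
} // namespace sketch

inline void demo_owner_and_ref()
{
    sketch::vector<int> v(3);
    sketch::vector_ref<int> r = *v;  // cheap view of the storage of v
    r[1] = 4;
    assert(v->size() == 3);
    assert((*v)[1] == 4);

    sketch::vector<int> w = v;       // deep copy on the host
    r[1] = 5;
    assert((*w)[1] == 4);            // w owns separate storage
}
```

Only the reference type would need to cross the host-device boundary; the owning type could keep fully standard copy semantics.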

This would move the API much closer to the C++ standard while clearly communicating that the host-device memory boundary can only be crossed safely via the similarly named reference types.

Issue #87 in stotko/stdgpu