nlesc-recruit / cudawrappers

C++ wrapper for the Nvidia C libraries (e.g. CUDA driver, nvrtc, cuFFT etc.)
https://cudawrappers.readthedocs.io/en/latest/
Apache License 2.0
5 stars 5 forks source link

Idea: Make `DeviceMemory` templated over a type `T` #261

Open stijnh opened 7 months ago

stijnh commented 7 months ago

Is your feature request related to a problem? Please describe. Currently, DeviceMemory (and also HostMemory and Array) just works on bytes. This can be error prone, for example, when performing a memcpy between two DeviceMemorys thats store data in a different type or when accidentally allocating n bytes (instead of n items, so n * sizeof(T) bytes)

Describe the solution you'd like It could be useful to make these classes typesafe and let them store n items of type T. This would help to make the memcpy functions more type safe.

Describe alternatives you've considered An alternative would be to add a new TypedDeviceMemory<T> type which internally stores a DeviceMemory to represents the data.

Additional context N/A

loostrum commented 4 months ago

@csbnw what do you think about this? I would find it quite useful to have type information embedded in DeviceMemory objects (and the same for HostMemory)

csbnw commented 4 months ago

For me, having the size in bytes worked just fine for any of the existing codes that uses cudawrappers. I can see the advantages, though.

Having both XMemory and TypedXMemory<T> doesn't seem ideal. How about adding the template argument to the existing classes and have char as default value?

I would rather not break compatibility with existing code, so ideally the constructor should remain unchanged. Maybe we just rename size to n, to denote the number of elements of type T?

stijnh commented 4 months ago

Using a default template argument like class DeviceMemory<typename T=char> could work. However users would need to explicitly use DeviceMemory<> (with an empty template argument list) to use the default type for T, so this will still break backwards compatibility.

Another alternative that I considered is to add a DeviceArray<T, N> class which stores an N-dimensional array of elements of type T. So the constructor would like N dimensions (one for each axis) and then automatically calculates the require number of bytes to store the array.