mgopshtein / cudacpp

C++ convenience classes to be used with CUDA code, for both the host and the kernel parts.
MIT License

Consider drawing on existing efforts #2

Open eyalroz opened 6 years ago

eyalroz commented 6 years ago

There are more advanced efforts at enabling more elegant C++ code on the device side.

One example is nVIDIA's Thrust libraries (https://developer.nvidia.com/thrust), which have a device-side vector class (although I'm not much of a fan of it; have you read about the span class, e.g. https://stackoverflow.com/q/45723819/1593077, as a more lightweight abstraction?).

Another example is part of my CUDA API wrappers. Specifically, compare your Grid.h (https://github.com/mgopshtein/cudacpp/blob/master/include/cudacpp/Grid.h) with my launch_configuration_t class (doxygen: https://codedocs.xyz/eyalroz/cuda-api-wrappers/structcuda_1_1launch__configuration__t.html, source code: https://github.com/eyalroz/cuda-api-wrappers/blob/master/src/cuda/api/types.h#L151). The use cases are not identical, but their functionality mostly overlaps.

There's a lot more if you peek into my libgiddy's src/cuda folder (https://github.com/eyalroz/libgiddy/tree/master/src/cuda). Actually, I've made some improvements to that utility code since; if you're interested, write me. I'll eventually update Giddy once I finish some feature implementations.

mgopshtein commented 6 years ago

Hi Eyal,

I'm aware of Thrust and your library. The goal of my repository, for now, is to illustrate the use of C++ in kernel code. Actually, I started my first blog post with a reference to your library: https://migocpp.wordpress.com/2018/03/07/cuda/ :) BTW, I attended your talk at the recent GTC Europe conference in Munich, and it was a nice one indeed!

The Grid header file that you refer to is for the "indexing" post: https://migocpp.wordpress.com/2018/03/11/cuda-indexing/.

My impression of your "cuda-api-wrappers" library was that it wraps host APIs but does not go into the kernel code much; please correct me if I missed something here. I didn't know about "libgiddy" and will definitely take a look at it. At least the title suggests it is a functional library for decompression, rather than a utility library like the first one.

Anyway, if you find it interesting, maybe we can work on integrating these concepts into your "wrappers" library; I think it makes perfect sense for it to cover device code as well.

Michael


eyalroz commented 6 years ago

Hello Michael,

I'd rather continue this conversation via email than on this issue page...

So, the cuda-api-wrappers library was originally split off from my work on a DBMS-related kernel testing harness (and actual kernels). I've written quite a bit of "utility" code for both the host and the device side, and some of the host-side code made it into the library. However, all of the fundamental types are shared, and they're very usable in kernel code as well; my libgiddy link illustrates this.

On the other hand, I had little use for the dimensionality templates, since my work is almost exclusively on 1D kernels; also, on the host side I mostly stuck to nVIDIA's abstractions, so there's no fundamental difference between my launch_configuration_t and their {dim3, dim3, unsigned int} triple of information.