CUDA 10?

blueberry commented 6 years ago

Hi @jcuda

What are your plans regarding support for CUDA 10, and is there anything we can help with?

jcuda commented 6 years ago

Of course, the update for CUDA 10 is already on the radar, and I'll try to tackle it during this week. I'll drop a note here when it's ready for building the binaries; the timing partially depends on how many updates there are.

vimalaguti commented 6 years ago

what about other libraries like cuDNN 7.3?

jcuda commented 6 years ago

cuDNN was (and still is) a bit of an outsider, because it is only available for NVIDIA registered developers and thus not part of the CUDA SDK. But in the last few versions (I think since CUDA 9), the update for JCuda always included an update for the most recent cuDNN, and I'll try to do this again this time.

(But I have to mention that my actual "work" (i.e. the one that I'm paid for) is eating up a substantial amount of my time, so it's always hard to make promises here...)

jcuda commented 6 years ago

Just a short heads-up: I'm working on the update, but the graph API (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__GRAPH.html#group__CUDA__GRAPH) is posing some challenges. I doubt that the functionality can sensibly be mapped to Java in all depth (I mean, they are passing around and storing (callback) function pointers there - we only have objects in Java, and managing the state of a callback is already tricky enough). I even considered omitting this part of the API to get the update to 10.0 out earlier. But I'm doing my best, let's see whether this is enough.

blueberry commented 6 years ago

Yes, I can imagine how the graph API can get unwieldy really fast. Do you perhaps know how many people used nvGraph before? Have you looked into JavaCPP's way of dealing with this? They do have the CUDA 10 API for nvGraph, but I don't know whether it supports the part that is causing the trouble at all.

jcuda commented 6 years ago

The API that is offered there is not a "usual graph API" (that is already covered by nvGraph and JNvgraph). This API is for defining "execution graphs". So you can define a graph like this:

  A
 / \
B   C
 \ /
  D

where

A is a node that copies memory to the device
B and C are CUDA kernels that each operate on one half of the data
D is a kernel that combines the results from B and C
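To make the shape of that diamond concrete, here is a rough host-only analogy in plain Java. It does not use JCuda or the CUDA graph API at all: `CompletableFuture` merely stands in for the dependency edges, and the names `copyToDevice`, `sumRange`, and `run` are made up for this illustration.

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;

public class DiamondGraphSketch {

    // Node A: stands in for the host-to-device copy; here it only clones the array.
    public static float[] copyToDevice(float[] host) {
        return Arrays.copyOf(host, host.length);
    }

    // Nodes B and C: stand in for kernels that each process one half of the data.
    public static float sumRange(float[] data, int from, int to) {
        float sum = 0;
        for (int i = from; i < to; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Wires up the diamond: A first, then B and C independently, then D.
    public static float run(float[] host) {
        int mid = host.length / 2;
        CompletableFuture<float[]> nodeA =
            CompletableFuture.supplyAsync(() -> copyToDevice(host));
        CompletableFuture<Float> nodeB =
            nodeA.thenApplyAsync(dev -> sumRange(dev, 0, mid));
        CompletableFuture<Float> nodeC =
            nodeA.thenApplyAsync(dev -> sumRange(dev, mid, dev.length));
        // Node D: runs only after both B and C have finished.
        CompletableFuture<Float> nodeD = nodeB.thenCombine(nodeC, Float::sum);
        return nodeD.join();
    }

    public static void main(String[] args) {
        float[] input = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(run(input));
    }
}
```

The point is only the dependency structure: B and C may run in either order or concurrently, while D waits for both. The CUDA API expresses the same idea with graph nodes and explicit dependencies on the device side.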

They offered some functionality that raises some issues. For example, you can define a "kernel node", with pseudocode like this

Pointer kernelArgs = { deviceInput, deviceResult };
CUgraphNode node = createNode(kernelFunction, kernelArgs);

and later obtain a pointer to the kernel arguments that have been given while the node was created - bam! That breaks everything for JCuda, because that once was a Java object, and the only thing that can be passed to the internal function for creating the node is a plain pointer. It's not possible to "reconstruct" the Java object from this pointer. Of course, one could return a CUdeviceptr at this point, but that seems to be very brittle and may cause unexpected behavior (and in any case, that's not the only issue here...)
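To illustrate why a plain pointer is not enough: getting a Java object back would need some extra bookkeeping on the Java side, for example a side table that maps an opaque handle to the original object. The following is only a sketch of that general idea in plain Java. It is not how JCuda works internally, and `HandleRegistrySketch` with its methods is a hypothetical name invented here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class HandleRegistrySketch {

    // Side table: opaque handle -> the Java object it was created from.
    private final Map<Long, Object> objects = new ConcurrentHashMap<>();
    private final AtomicLong nextHandle = new AtomicLong(1);

    // Remembers the object and returns a handle that could travel through native code.
    public long register(Object object) {
        long handle = nextHandle.getAndIncrement();
        objects.put(handle, object);
        return handle;
    }

    // Returns the original object, or null if the handle is unknown or already released.
    public Object lookup(long handle) {
        return objects.get(handle);
    }

    // Drops the entry; someone has to call this, or the object is never collected.
    public Object release(long handle) {
        return objects.remove(handle);
    }
}
```

Even in this toy form, the lifetime question shows up immediately: the table keeps every registered object alive until `release` is called, and nothing in a raw pointer says when that should happen. That is part of what makes such a mapping brittle.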

JavaCPP might have covered this "implicitly". They have a far more "generic" mapping of Java objects to C data structures. JCuda internally works on structures that made sense for the specific API of CUDA. But a lot has happened in the last 10 years... (It would be time for a cleanup and refactoring, but when the update is done, I'd rather have a look at http://jdk.java.net/panama/ - maybe this comes to the rescue here ;-))

blueberry commented 6 years ago

I see. I mentioned JavaCPP because I’ve seen that they have the release, but I thought that this was about nvGraph. It is possible that they skipped this functionality.

jcuda commented 6 years ago

The updates for CUDA 10 have been committed to the individual projects.

The graph API is indeed difficult. And particularly: It does not make sense for the Runtime API in JCuda. So for the Runtime API, the graph functionality is omitted. (For the Driver API, it is basically available, and I created a very basic sample that I will add to the samples as soon as the update is finished. In fact, they don't even have graph API examples in the CUDA samples).

Another part that makes limited sense to port to Java is that for external memory. There are no "external (Java!?) semaphores" that could be mapped into JCuda, for that matter.

I'll add the corresponding release notes and some disclaimers regarding possible limitations of the graph API ASAP.

For now: If someone wants to give the Linux build with CUDA 10.0 and cuDNN 7.4.1 a try, I'd be happy to use the resulting binaries. Otherwise, I'll try to do the packaging for Windows and Linux this week, and upload the packages to Maven Central as soon as possible.

blueberry commented 6 years ago

Thanks! I probably won't be able to do the Linux build before you estimated you'd do it yourself, unfortunately.

jcuda commented 6 years ago

JCuda 10.0.0 is on its way into Maven Central.

(Yes, that's 10.0.0, not 0.10.0...)

It is intended for CUDA 10 and cuDNN 7.4.1.

As mentioned in the comments above, the execution graph API turns out to be difficult to map to Java, particularly for the Runtime API, where <<<kernel launches>>> are not supported. A very basic example showing what currently seems to work has been added as JCudaDriverBasicGraphExample.java.

For JCudnn, the Recurrent Neural Networks example from NVIDIA has been ported to Java and added as JCudnnRnnExample.java.

I'll leave this issue open until the website etc. have been updated and until the first feedback arrives about whether or not the release works as expected on Linux (I could not really test it in my Virtual Machine).

blueberry commented 6 years ago

All ClojureCUDA and Neanderthal tests pass with JCuda 10.0.0.

blueberry commented 6 years ago

Thank you very much for releasing the new version with such a smooth transition!

jcuda commented 6 years ago

Great, thanks @blueberry , this always gives me some confidence!

The coverage with automated tests in JCuda is rather low: They basically consist of the "Basic JNI binding tests" and very few regression tests. I usually start the samples, which already should give a reasonable coverage, but automated ones would be preferable.

I'll update the website and README ASAP.

jcuda commented 5 years ago

This certainly was not "as soon as possible", but finally, the website and README have been updated.

jcuda / jcuda-main

CUDA 10? #27