nlesc-sherlock / cluster-analysis


Create a simple GPU implementation for the NCC computation #15

Closed benvanwerkhoven closed 8 years ago

benvanwerkhoven commented 8 years ago

The NFI has already created a highly optimized GPU implementation, so this is not essential. But since we do not have their source code, it would be nice to have a working implementation of our own, in case we have to recompute some of the NCC scores or want to compute scores for bigger datasets. It is also a very nice exercise for anybody who wants to get more experienced with GPU computing.
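For anyone picking this up, a plain CPU reference is handy for checking GPU results against. Below is a minimal sketch in Java, assuming the NCC score is the standard zero-mean normalized cross-correlation between two equally sized patterns stored as flat float arrays. The thread does not spell out the exact formula (some variants skip the mean subtraction), and the class and method names are hypothetical, not taken from the application.

```java
// Hypothetical CPU reference, not the project's actual code.
// Assumes NCC = sum((x - mean(x)) * (y - mean(y)))
//             / sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2)).
final class NccReference {

    static double ncc(float[] x, float[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("patterns must be non-empty and of equal length");
        }
        int n = x.length;

        // First pass: the means of both patterns.
        double sumX = 0.0;
        double sumY = 0.0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;

        // Second pass: three sums, each of which is a reduction on a GPU.
        double dot = 0.0;
        double sqX = 0.0;
        double sqY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            dot += dx * dy;
            sqX += dx * dx;
            sqY += dy * dy;
        }

        double denom = Math.sqrt(sqX * sqY);
        // A constant pattern has zero variance; report 0 instead of dividing by zero.
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {2f, 4f, 6f, 8f};
        System.out.println("NCC(a, b) = " + ncc(a, b));
    }
}
```

Accumulating in double keeps the reference less sensitive to rounding than a float-only GPU kernel is likely to be, so comparisons against GPU output should allow a small tolerance.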

HannoSpreeuw commented 8 years ago

Would be great for my CUDA programming skills!

HannoSpreeuw commented 8 years ago

But I am not much of a Java person.

benvanwerkhoven commented 8 years ago

Fortunately, the Java part is only the boilerplate host code, which you can mostly copy/paste from the other parts of the application, so that won't be too much of a problem. I can also help out with any Java problems you run into.

HannoSpreeuw commented 8 years ago

Cool!

benvanwerkhoven commented 8 years ago

I just found out that apparently you cannot assign multiple people to an issue. That's a pity.

benvanwerkhoven commented 8 years ago

Great slide deck about optimizing reduce operations: https://docs.nvidia.com/cuda/samples/6_Advanced/reduction/doc/reduction.pdf
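To give an idea of the access pattern, here is a small CPU-side model in Java of a tree-based reduction with sequential addressing, one of the variants the slides walk through. It is only a sketch of the idea, not GPU code: the inner loop stands in for the threads of one block, the array stands in for shared memory, and the class and method names are hypothetical.

```java
// Hypothetical CPU model of a block-level tree reduction (sequential addressing).
// On a GPU, every iteration of the inner loop would be a separate thread.
final class TreeReductionModel {

    static float blockSum(float[] input) {
        // Round the size up to a power of two so the stride halves cleanly.
        int size = 1;
        while (size < input.length) {
            size <<= 1;
        }

        // Stand-in for the block's shared memory; the padding stays zero.
        float[] shared = new float[size];
        System.arraycopy(input, 0, shared, 0, input.length);

        // Halve the number of active "threads" in every step.
        for (int stride = size / 2; stride > 0; stride >>= 1) {
            for (int tid = 0; tid < stride; tid++) {
                shared[tid] += shared[tid + stride];
            }
            // A real kernel needs a barrier here (__syncthreads() in CUDA)
            // before the next step reads the partial sums.
        }
        return shared[0];
    }

    public static void main(String[] args) {
        float[] values = {1f, 2f, 3f, 4f, 5f};
        System.out.println("sum = " + blockSum(values));
    }
}
```

Each of the sums in an NCC score (the dot product and the two sums of squares) can be reduced this way, which is why the deck is relevant here.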

benvanwerkhoven commented 8 years ago

This has been implemented with great success. The GPU implementation is 14.5 times faster, including data transfers.