ctrueden opened 10 years ago
See also the NAR plugin for Maven, as well as the SciJava native library loader, for general solutions for integrating native libraries with Java.
@ctrueden thanks for reminding me... @bnorthan I actually wanted to introduce you to the NAR project a little for the purpose of integrating native code into ImageJ plugins. Have a look at https://github.com/imagej/minimal-ij1-plugin/tree/native for example... And feel free to bombard me with questions!
https://github.com/bobpepin/YacuDecu
@bobpepin wrote this and it is licensed under the LGPL. I ran some tests on it a while back and it seems to work pretty well. It would be a good starting point for a CUDA decon op. @bobpepin wrote wrappers for MATLAB and Imaris and said he'd be happy to see it in ImageJ eventually.
@StephanPreibisch As discussed at the hackathon, this issue may be of interest to you as well!
I have started to write a simple infrastructure for calling native and CUDA code here: https://github.com/fiji/SPIM_Registration/tree/master/src/main/java/spim/process/cuda
Two examples of CUDA implementations for separable and non-separable convolution are here; both are very useful for deconvolution: https://github.com/StephanPreibisch/FourierConvolutionCUDALib https://github.com/StephanPreibisch/SeparableConvolutionCUDALib
I think it would be great to have some common infrastructure for calling this kind of code.
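For illustration, the glue for calling such a native library from Java often boils down to a small binding interface. Here is a minimal sketch using JNA (an assumption on my part; SPIM_Registration's actual classes may differ, and the library name, function name, and signature below are hypothetical):

```java
import com.sun.jna.Library;
import com.sun.jna.Native;

// Hypothetical binding for a native convolution library. The function name and
// signature are illustrative only, not the actual SPIM_Registration API.
interface NativeConvolution extends Library {

    // Maps to an exported C function such as:
    // int convolve3d(float* image, float* kernel, int w, int h, int d, int device);
    int convolve3d(float[] image, float[] kernel, int w, int h, int d, int device);
}

public class NativeConvolutionExample {
    public static void main(String[] args) {
        // Loads e.g. libFourierConvolutionCUDALib.so / .dll from java.library.path
        // (the library name here is assumed from the repository name).
        NativeConvolution lib =
            Native.load("FourierConvolutionCUDALib", NativeConvolution.class);

        float[] image = new float[64 * 64 * 32];
        float[] kernel = new float[9 * 9 * 9];
        int status = lib.convolve3d(image, kernel, 64, 64, 32, 0);
        System.out.println("native call returned " + status);
    }
}
```

A shared infrastructure could then factor out the library discovery, loading, and error handling, leaving only the per-library interface declarations to the individual plugins.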
Hi, in case you were thinking of implementing the main deconvolution loop in Java, I wanted to note that in my implementation I was able to cut GPU memory usage by 40% by using CUDA streaming and transferring the data needed for the next step in parallel with the FFT. These kinds of tricks might be a bit harder to do in a Java inner loop, or at least would require a Java interface to a substantial part of the CUDA API.
Cheers, Bob
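As a rough illustration of the overlap Bob describes, seen from the Java side, here is a sketch using the JCuda bindings (an assumption; YacuDecu itself does this in plain CUDA C). One stream runs the FFT while a second stream uploads the data needed for the next step:

```java
import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.jcufft.JCufft;
import jcuda.jcufft.cufftHandle;
import jcuda.jcufft.cufftType;
import jcuda.runtime.JCuda;
import jcuda.runtime.cudaStream_t;
import static jcuda.runtime.cudaMemcpyKind.cudaMemcpyHostToDevice;

public class StreamOverlapSketch {
    public static void main(String[] args) {
        int n = 256 * 256 * 64;                     // illustrative volume size
        float[] currentBlock = new float[2 * n];    // interleaved complex data
        float[] nextBlock = new float[2 * n];

        Pointer dCurrent = new Pointer();
        Pointer dNext = new Pointer();
        JCuda.cudaMalloc(dCurrent, 2L * n * Sizeof.FLOAT);
        JCuda.cudaMalloc(dNext, 2L * n * Sizeof.FLOAT);

        // Two streams: one for compute, one for uploading the next block.
        cudaStream_t computeStream = new cudaStream_t();
        cudaStream_t copyStream = new cudaStream_t();
        JCuda.cudaStreamCreate(computeStream);
        JCuda.cudaStreamCreate(copyStream);

        // FFT plan bound to the compute stream.
        cufftHandle plan = new cufftHandle();
        JCufft.cufftPlan3d(plan, 256, 256, 64, cufftType.CUFFT_C2C);
        JCufft.cufftSetStream(plan, computeStream);

        // Upload the current block, run its FFT, and in parallel (on the copy
        // stream) upload the block needed for the next step. Note that for the
        // copies to truly overlap with compute, the host arrays would need to be
        // pinned (cudaHostAlloc); plain Java arrays are pageable.
        JCuda.cudaMemcpyAsync(dCurrent, Pointer.to(currentBlock),
            2L * n * Sizeof.FLOAT, cudaMemcpyHostToDevice, computeStream);
        JCufft.cufftExecC2C(plan, dCurrent, dCurrent, JCufft.CUFFT_FORWARD);
        JCuda.cudaMemcpyAsync(dNext, Pointer.to(nextBlock),
            2L * n * Sizeof.FLOAT, cudaMemcpyHostToDevice, copyStream);

        JCuda.cudaDeviceSynchronize();
        JCufft.cufftDestroy(plan);
        JCuda.cudaFree(dCurrent);
        JCuda.cudaFree(dNext);
    }
}
```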
in my implementation
@bobpepin is it publicly visible? Remember: unpublished work never happened, for all practical purposes.
https://github.com/bobpepin/YacuDecu
https://github.com/bobpepin/YacuDecu
@bobpepin wrote a GPU deconvolution and it is licensed under the LGPL. I ran some tests on it a while back and it seems to work pretty well. It would be a good starting point for a CUDA decon op. Bob wrote wrappers for MATLAB and Imaris and said he'd be happy to see it in ImageJ eventually.
Could the license be changed to BSD? Otherwise no problem, but then it will be eternally just an add-on to ImageJ...
The LGPL was meant to encourage improvements to the library to be incorporated back into the main codebase, which is also used by the C/MATLAB/Imaris interfaces. What about shipping the DLL/.so or source in a separate subdirectory, having the interface code be part of ImageJ under a BSD license, and contributing eventual changes to the CUDA code back to the main YacuDecu repository?
Also, you might want to consider supporting OpenCL instead of, or in addition to, CUDA, since it supports NVIDIA, ATI, and Intel cards. The biggest problem there, last time I looked, was that the publicly available FFT implementation had some limits on the input size, 2048 pixels in each dimension or something like that.
Just to let you all know, there are now quite a few OpenCL-based ops (proudly presented by @frauzufall - big thanks to Debo!): https://github.com/clij/clij-ops
Based on https://clij.github.io/
Documentation can be found here: https://clij.github.io/clij-docs/clij_imagej_ops_java
Code examples can be found here: https://github.com/clij/clij-ops/tree/master/src/test/java/net/haesleinhuepf/clij/ops/examples
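To give a flavor of how these ops are invoked from Java, here is a minimal sketch using the ops service. The CLIJ_push and CLIJ_pull names appear in the linked examples; the blur op name and its parameters are assumptions here, so please check the documentation above for the exact signatures:

```java
import net.imagej.ImageJ;
import net.imglib2.img.Img;
import net.imglib2.img.array.ArrayImgs;
import net.imglib2.type.numeric.real.FloatType;

// Rough sketch of calling CLIJ ops by name through the ops service.
// "CLIJ_blur" and its parameter list are assumptions, not a confirmed API.
public class ClijOpsSketch {
    public static void main(String[] args) {
        ImageJ ij = new ImageJ();
        Img<FloatType> img = ArrayImgs.floats(256, 256, 32);

        // Push to the GPU, run the op there, pull the result back.
        Object gpuImage = ij.op().run("CLIJ_push", img);
        Object blurred = ij.op().run("CLIJ_blur", gpuImage, 2.0, 2.0, 2.0);
        Object result = ij.op().run("CLIJ_pull", blurred);

        System.out.println("result = " + result);
    }
}
```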
Give them a try and let us know what you think!
Cheers, Robert
Awesome! Thanks @frauzufall for working on this. If we have time, I'd like to show you the next iteration of the SciJava Ops framework while I am visiting.
Would it be feasible to name the ops so that they overload existing ops, rather than giving them new names? The idea would be to help people benefit from automatic performance improvements without needing to edit their scripts.
Hey @ctrueden ,
that sounds like a great idea. However, before automatically overloading Ops of different implementations, we should dig a bit deeper and find out why some implementations deliver different results. I would also strongly vote for automatic tests ensuring that different implementations deliver similar results up to a given tolerance. Just to get an idea of what I'm talking about:
This program suggests differences between Ops, CLIJ and ImageJ-legacy of different orders of magnitude:
MSE (IJ ops vs legacy) = 0.001654734344482422
MSE (IJ legacy vs clij) = 1.72487557392742E-11
MSE (IJ ops vs clij) = 0.0016547824096679689
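To make "similar results up to a given tolerance" concrete, an automated test could be built on a helper roughly like the following (a minimal ImgLib2 sketch; the class name, helper names, and tolerance handling are illustrative):

```java
import net.imglib2.Cursor;
import net.imglib2.RandomAccessibleInterval;
import net.imglib2.type.numeric.RealType;
import net.imglib2.view.Views;

// Mean squared error between two images, plus a tolerance check that could back
// an automated test comparing e.g. the ops, legacy, and CLIJ results.
public class MseCheck {

    public static <T extends RealType<T>, U extends RealType<U>> double mse(
            RandomAccessibleInterval<T> a, RandomAccessibleInterval<U> b) {
        Cursor<T> ca = Views.flatIterable(a).cursor();
        Cursor<U> cb = Views.flatIterable(b).cursor();
        double sum = 0;
        long n = 0;
        while (ca.hasNext()) {
            double d = ca.next().getRealDouble() - cb.next().getRealDouble();
            sum += d * d;
            n++;
        }
        return sum / n;
    }

    public static <T extends RealType<T>, U extends RealType<U>> void assertSimilar(
            RandomAccessibleInterval<T> expected, RandomAccessibleInterval<U> actual,
            double tolerance) {
        double mse = mse(expected, actual);
        if (mse > tolerance) {
            throw new AssertionError("MSE " + mse + " exceeds tolerance " + tolerance);
        }
    }
}
```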
Let's have a chat about it in Dresden :-)
Cheers, Robert
Hi Curtis
It would be great if you could show us the next iteration of ops.
Correct me if I am wrong, but it looks like these new ops are typed on ClearCLBuffer and ClearCLImage. In fact, at least for the blur ops, there seem to be three ops using different combinations.
As an aside, what is the difference between ClearCLBuffer and ClearCLImage?
There are a few scenarios that I think we need to consider if we overload existing ops.
1. Do we use the same names for the ops but use CLIJ-specific types? In this case the user would have to convert types but could keep a lot of their code the same.
2. Do we use the same names and types? In this case you could write an op that does the conversion, calls the underlying CLIJ op, and converts back. Or, better yet, just have converters (a rough sketch of such a converter follows after this list).
3. If we automatically convert inputs and outputs, scenario 2 would be problematic for a series of operations. Would there be some way to transfer the data to the GPU but only retrieve it lazily, when the next Java operation is performed?
4. What about CUDA? I have converters to CUDA and would like to polish them at some point. It would be nice to be able to overload ops with both CLIJ and CUDA implementations.
5. What about data that is too large to fit on the GPU? I've spent some time playing with ImgLib2 cache as a means to retrieve data in chunks and send it to the GPU. Does CLIJ do any chunking?
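Regarding item 2, a converter-based approach might look roughly like the following. This is only a sketch: the SciJava converter skeleton is standard, but the CLIJ push call and its applicability to arbitrary RandomAccessibleIntervals are assumptions that would need to be checked against the current CLIJ API.

```java
import net.haesleinhuepf.clij.CLIJ;
import net.haesleinhuepf.clij.clearcl.ClearCLBuffer;
import net.imglib2.RandomAccessibleInterval;
import org.scijava.convert.AbstractConverter;
import org.scijava.convert.Converter;
import org.scijava.plugin.Plugin;

// Hedged sketch of scenario 2: a SciJava converter that pushes an ImgLib2 image
// to the GPU, so that ops typed on ClearCLBuffer can be matched against
// RandomAccessibleInterval inputs.
@Plugin(type = Converter.class)
public class RAIToClearCLBufferConverter
        extends AbstractConverter<RandomAccessibleInterval, ClearCLBuffer> {

    @Override
    @SuppressWarnings({ "unchecked", "rawtypes" })
    public <T> T convert(Object src, Class<T> dest) {
        CLIJ clij = CLIJ.getInstance();
        // Assumed: push() uploads the image and returns a ClearCLBuffer.
        return (T) clij.push((RandomAccessibleInterval) src);
    }

    @Override
    public Class<ClearCLBuffer> getOutputType() {
        return ClearCLBuffer.class;
    }

    @Override
    public Class<RandomAccessibleInterval> getInputType() {
        return RandomAccessibleInterval.class;
    }
}
```

A matching ClearCLBuffer-to-RAI converter (calling pull) would cover the way back, and the lazy-retrieval idea from item 3 could be approximated by only invoking the pull when a downstream Java op actually touches the pixels.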
Hey @bnorthan ,
1.-3. If possible, I would like to prevent automatic back-and-forth conversion, because conversion takes time. GPU acceleration is only beneficial if long workflows are run on the GPU. That's why we initially thought automatic conversion shouldn't be enabled at all...
Looking forward to discussing the details! :-)
At the beginning I started to match the CLIJ ops with existing imagej-ops (here is the code), but there were differences, and it is quite a bit of work to find the counterparts (at least for me), so we decided, as a first step, to write clearly marked CLIJ ops that return the same results as CLIJ does in other scenarios.
I also wrote converters. You can try removing the CLIJ_push and CLIJ_pull op calls in the examples (Jython, Java). It works in many cases, but sometimes fails to match the ClearCLBuffer to a RAI if the op has additional input parameters.
I stopped going too deep into detail / fixing things because I don't want to waste time debugging something that is being rewritten anyway. But the CLIJ ops are perfect for testing some core concepts of imagej-ops. Excited to hear about the next iteration!
We want to make implementing GPU-based ops as easy as possible. The glue code to execute GPU-based processing from Java is usually the same. The two main flavors to consider supporting are OpenCL and CUDA.
We can start by implementing a couple of GPU-based ops, and then factoring out common code into a shared type hierarchy. Due to the addition of dependencies for working with OpenCL and/or CUDA, we will likely need to create a new imagej-ops-gpu project (and/or imagej-ops-cuda and/or imagej-ops-opencl projects) which extends imagej-ops.
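As a sketch of what the shared type hierarchy might look like (the class names below are hypothetical, not existing imagej-ops classes):

```java
// Hypothetical sketch of a shared hierarchy for GPU-backed ops. It only
// illustrates where the common glue code (upload, execute, download, cleanup)
// could live, with the backend-specific parts (OpenCL vs CUDA) in subclasses.
public abstract class AbstractGPUOp<I, O> {

    /** Upload the input to the device (OpenCL buffer, CUDA device pointer, ...). */
    protected abstract Object upload(I input);

    /** Run the kernel on the uploaded data and return a handle to the result. */
    protected abstract Object execute(Object deviceInput);

    /** Download the result back into a Java-side image. */
    protected abstract O download(Object deviceOutput);

    /** Free device memory; backend-specific. */
    protected abstract void release(Object deviceData);

    /** Common glue: the same upload-execute-download skeleton for every backend. */
    public O run(I input) {
        Object deviceIn = upload(input);
        try {
            Object deviceOut = execute(deviceIn);
            return download(deviceOut);
        } finally {
            release(deviceIn);
        }
    }
}
```

Concrete OpenCL (e.g. CLIJ-backed) or CUDA (e.g. JCuda- or YacuDecu-backed) subclasses would then only implement the four abstract hooks, and the backend-specific dependencies would live in the corresponding sub-project.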