maweigert / gputools

GPU accelerated image/volume processing in Python
BSD 3-Clause "New" or "Revised" License
108 stars 20 forks source link

Feature request: integral image #15

Open jni opened 5 years ago

jni commented 5 years ago

We have some issues with our naive (non-parallel) integral image implementation in scikit-image. At the same time, we have many functions (template matching, thresholding) that depend on it, so it would be awesome to have a GPUtools implementation. (Ideally before March. ;) Here are some references for the implementation:

https://dspace.mit.edu/openaccess-disseminate/1721.1/71883

(note though that they use exclusive integral images while we use inclusive ones in skimage):

In [1]: from skimage.transform import integral_image
In [2]: import numpy as np
In [3]: integral_image(np.array([[4, 8, 0], [9, 8, 8]]))
Out[3]: 
array([[ 4, 12, 12],
       [13, 29, 37]])

It might also turn out to be way more complicated in nD...

Other relevant paper:

https://ieeexplore.ieee.org/document/8249556

maweigert commented 5 years ago

Thanks for the pointers! I remember that I already implemented something along these lines some time ago (i.e. trivially porting the NVIDIA samples to opencl). Reading the issue you linked, the tricky part might be to incorporate Kahan summation into the parallel scans. Without that, it should be rather straight-forward...

maweigert commented 5 years ago

Ok, integral images are now part of the develop branch :) Only works for 2D/3D images though (which might not change).

import numpy as np
from gputools.transforms import integral_image 
integral_image(np.array([[4, 8, 0], [9, 8, 8]], np.int32))

array([[ 4, 12, 12],
       [13, 29, 37]])

For large images/volumes, I see roughly a 7-10x speedup with device-host memory transfer included, and 30-40x without. So it was well worth it! :)

jni commented 5 years ago

Wow, speed! :rocket:

I had a look at the commits. Definitely gonna need a crash course on OpenCL kernels from you when you're here. =D I noticed a lot of isinstance to dispatch appropriately. Possibly these things are fixable using the __array_function__ protocol, but I'm not sure at the moment.

Anyway, this is awesome! Now we have three things to dispatch in skimage. Stretch goal: somehow thread the GPU arrays through skimage functions so that we only pay the transfer cost at the start and end of a pipeline.

jni commented 4 years ago

@maweigert how often do you merge develop back into master? We just have a master branch in scikit-image, which works pretty well for us.

maweigert commented 4 years ago

I just did! :)

jni commented 4 years ago

Cool! And :wave:! You owe me an email, ;) but in the meantime, just to let you know, we are going to have an undergrad work on the skimage-gputools interface, as well as maybe bringing over some kernels from CLIJ. Should we work on PRs here or should we work on my fork for a while? I get the sense that you don't have a lot of bandwidth for this lib these days...