scipy.ndimage.find_objects GPU implementation

kazemSafari commented 1 month ago

🚀 The feature

This function is a basic building block of any biomedical or even manufacturing image analysis application. Given an image containing multiple objects, say 1,000 to 10,000, it provides the bounding box of each object.

It returns a list of tuples of slices, one tuple per labelled object/cell, for a mask image of dtype uint16 or uint8, assuming the image background is 0 and the labelled objects are numbered 1, 2, ..., max_label.
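For reference, here is a small example of the existing CPU behaviour. The mask contents are made up for illustration; only `scipy.ndimage.find_objects` itself comes from SciPy.

```python
import numpy as np
from scipy import ndimage

# Toy label mask: background is 0, objects are labelled 1 and 2.
mask = np.zeros((6, 8), dtype=np.uint16)
mask[1:3, 1:4] = 1
mask[3:6, 5:8] = 2

# boxes[i] is the bounding box of label i + 1, as a tuple of slices
# (one slice per image dimension). Labels that are absent yield None.
boxes = ndimage.find_objects(mask)

for label, (rows, cols) in enumerate(boxes, start=1):
    print(label, (rows.start, rows.stop), (cols.start, cols.stop))

# Each tuple of slices can be used directly to crop the object.
first_object = mask[boxes[0]]
```

The slices are half-open, so `mask[boxes[0]]` is the 2 x 3 crop that contains object 1.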

I was wondering whether it is possible to implement it in torch C++ using a simple TensorIterator.

Basically, in the simplest case it would take a 2D tensor of size (H, W) as input and output a tensor of slices of size (N, 2), where N is the number of objects and each row is [slice(start,end,step), slice(start,end,step)].

The SciPy implementation uses the iterators defined here: https://github.com/scipy/scipy/blob/v1.10.1/scipy/ndimage/src/ni_support.h
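To make the request concrete, here is a rough pure-PyTorch sketch of the 2D case. It is not the proposed C++/TensorIterator implementation and has not been benchmarked. The function name `find_objects_boxes` is hypothetical, and since a tensor cannot hold Python slice objects, the sketch assumes an (N, 4) integer output of [row_start, row_stop, col_start, col_stop] instead of the (N, 2) tensor of slices described above.

```python
import torch


def find_objects_boxes(labels: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: bounding boxes of a 2D label mask.

    Returns an (N, 4) int64 tensor, N = labels.max(), where row i holds
    [row_start, row_stop, col_start, col_stop] for label i + 1 (stops are
    exclusive). Rows for labels that do not occur are filled with -1.
    """
    H, W = labels.shape
    device = labels.device
    flat = labels.reshape(-1).long()
    n = int(flat.max())  # note: forces a host sync on GPU

    # Row and column coordinate of every pixel, in flattened order.
    rows = torch.arange(H, device=device).repeat_interleave(W)
    cols = torch.arange(W, device=device).repeat(H)

    def reduce(init: int, src: torch.Tensor, mode: str) -> torch.Tensor:
        # One slot per label; slot 0 collects the background and is dropped.
        out = torch.full((n + 1,), init, dtype=torch.long, device=device)
        return out.scatter_reduce(0, flat, src, reduce=mode)

    rmin, rmax = reduce(H, rows, "amin"), reduce(-1, rows, "amax")
    cmin, cmax = reduce(W, cols, "amin"), reduce(-1, cols, "amax")

    boxes = torch.stack([rmin, rmax + 1, cmin, cmax + 1], dim=1)
    boxes[rmax < 0] = -1  # labels that never appeared
    return boxes[1:]


labels = torch.zeros(6, 8, dtype=torch.uint8)
labels[1:3, 1:4] = 1
labels[3:6, 5:8] = 2
print(find_objects_boxes(labels))
```

The same code runs on a CUDA tensor without changes, which is the main reason to express it as scatter reductions rather than a Python loop over labels.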

cc @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki

Motivation, pitch

This function is the key that makes it possible to embed traditional machine learning tools, such as thresholding techniques (Otsu, triangle, max entropy, etc.), watershed segmentation, edge filters, and dilation/erosion operations, into PyTorch. Traditional machine learning tools are very helpful and can still be used to achieve great image segmentation. They are backed by the NumPy and SciPy libraries, which analyze a single image at a time and can be parallelized across multiple CPUs.

However, they are a lot slower than a PyTorch dataloader fetching data for GPU computing. Platforms such as NVIDIA RAPIDS have failed to fill this gap. They are not properly maintained and have lots of issues. They work with Numba and are very tricky to install and use.

Alternatives

scipy.ndimage.find_objects is much faster and much more efficient than torchvision.ops.masks_to_boxes. The C implementation in SciPy can be found here: https://github.com/scipy/scipy/blob/v1.10.1/scipy/ndimage/src/ni_measure.c

Additional context

Can it also be extended to extract objects from a tensor of dimension (B, C, W, H), where B is the batch size, C is the number of channels, W is the width, and H is the height?
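One possible way to handle the batched case, again only as an untested-for-performance sketch in pure PyTorch: give every (batch, channel) plane its own range of slots and run a single scatter reduction over the whole tensor. The name `find_objects_batched` and the `num_labels` argument are hypothetical, and the sketch assumes the usual PyTorch (B, C, H, W) layout rather than (B, C, W, H).

```python
import torch


def find_objects_batched(labels: torch.Tensor, num_labels: int) -> torch.Tensor:
    """Hypothetical helper: per-plane bounding boxes for a (B, C, H, W) mask.

    Returns a (B, C, num_labels, 4) int64 tensor of
    [row_start, row_stop, col_start, col_stop] (stops exclusive).
    Labels missing from a plane get a row of -1.
    """
    B, C, H, W = labels.shape
    device = labels.device
    planes = B * C
    stride = num_labels + 1  # slot 0 of each plane is the background

    # Shift labels so that each plane writes to a disjoint range of slots.
    offsets = torch.arange(planes, device=device).unsqueeze(1) * stride
    slots = (labels.reshape(planes, H * W).long() + offsets).reshape(-1)

    # Row and column coordinate of every pixel, repeated for every plane.
    rows = torch.arange(H, device=device).repeat_interleave(W).repeat(planes)
    cols = torch.arange(W, device=device).repeat(H * planes)

    def reduce(init: int, src: torch.Tensor, mode: str) -> torch.Tensor:
        out = torch.full((planes * stride,), init, dtype=torch.long, device=device)
        return out.scatter_reduce(0, slots, src, reduce=mode)

    rmin, rmax = reduce(H, rows, "amin"), reduce(-1, rows, "amax")
    cmin, cmax = reduce(W, cols, "amin"), reduce(-1, cols, "amax")

    boxes = torch.stack([rmin, rmax + 1, cmin, cmax + 1], dim=1)
    boxes[rmax < 0] = -1  # labels absent from a given plane
    return boxes.reshape(B, C, stride, 4)[:, :, 1:]


batch = torch.zeros(2, 1, 4, 5, dtype=torch.uint8)
batch[0, 0, 0:2, 1:3] = 1
batch[1, 0, 1:4, 2:5] = 2
out = find_objects_batched(batch, num_labels=2)
print(out.shape)
```

Passing `num_labels` explicitly keeps the output shape fixed across the batch; every label value in the input must be at most `num_labels`.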

NicolasHug commented 1 month ago

Thanks for the feature request, @kazemSafari. Is there an example of a CUDA implementation of such a util already? For the moment, I'm a bit concerned that the implementation complexity and maintenance cost aren't proportionate to the existing demand for such a utility.

kazemSafari commented 1 month ago

@NicolasHug Thanks for getting back to me. I can try to build the CUDA code myself, step by step, with your help. I can also share my basic understanding of it if that helps. Is there also a way to invite others from the community to invest some time in it?

It may not seem that beneficial at first, but in many applications such as remote sensing, biomedical imaging, and manufacturing it is key. Right now, AI development is heavily biased towards RGB images. However, TIFF/uint16 images are also very important and play a huge role.

Usually there are thousands of objects that need to be detected and segmented in a TIFF image. Therefore, this function is a gateway that opens a lot of doors. It would help many applications start benefiting from a proper GPU implementation.

pytorch / vision