seung-lab / connected-components-3d

Connected components on discrete and continuous multilabel 3D & 2D images. Handles 26, 18, and 6 connected variants; periodic boundaries (4, 8, & 6)
GNU Lesser General Public License v3.0
353 stars 42 forks source link

Supporting V1 dtype #126

Open SergeyTsimfer opened 1 month ago

SergeyTsimfer commented 1 month ago

Hey,

Great work with the library!

Is it possible to add support for one-bit dtype (np.dtype('V1')) arrays as input for cc3d routines, mainly connected_components?

william-silversmith commented 1 month ago

Hi!

Thanks for the suggestion. I was thinking about it and I'm trying to decide if this is enough of a gain to be worth it. I presume you are trying to optimize memory consumption, but odds are the output will need to be uint16 or uint32 to contain the final labels, particularly if your input is so large as to justify bit-packing.

I also gave this a try:

>>> arr = np.zeros([32], dtype='V1')
>>> arr.nbytes
32

It doesn't seem like numpy actually represents things as bit-packed?

I would recommend doing arr.astype(bool) or arr.astype(np.uint8).

Can you tell me a little bit more about your use case? How big the input array is, how many components there are, what are the performance constraints etc.?

Will

SergeyTsimfer commented 1 month ago

My usecase is a ~5000x5000x2000 array with one very large component (a surface along the first two dimensions, so it has ~5000x5000x3 points) and a few thousands of smaller ones. This array comes as a prediction to a binary segmentation task, so I need only the largest component anyway.

I am already passing it as dtype=bool array and I know in advance that uint16 is enough for the final labels; this works well within my RAM requirements, so I'm mainly trying to optimize the speed of component extraction. Using a smaller dtype was one of the ideas; however, as you pointed out, for numpy the minimum size of a dtype is one byte:(

On the other hand, array can be manually packed into uint8, analogous to how np.packbits works. If connected_components would be aware of such possible packed dtype, would it work faster than on actual one-bit dtype?

william-silversmith commented 1 month ago

Interesting use case. Here are my thoughts:

1) Have you considered downsampling the image? You can achieve very fast downsamples with my https://github.com/seung-lab/tinybrain/ library. I expect you'd want to use "mode" downsampling.

2) Bit packing the image could make the algorithm slower or faster. If your image is so large as to be swapping or otherwise the caches are not managed well, the bit packed image could be a little faster for reading the input. However, if that's not the case, bit packed image access is slower as more work needs to be done to extract the proper bit per a voxel. The output image still needs to be written voxel by voxel and this won't help with that.

3) Binary images do have slightly faster algorithms available. The algorithm used here is for multivalued images. You can expect somewhere between a 10-20% performance improvement, maybe a bit more, by using these improved techniques.

4) For your use case in particular, where the number of shapes is limited and most of the voxels do not need to be analyzed, using an alternative algorithm such as contour tracing may show superior performance. However, this is not guaranteed.

5) I don't know what your time target is, you have a 50 Gigavoxel image and it will take some time to process. However, a parallel implementation of cc3d would potentially get you the kind of speedups you are looking for. However, it is not particularly easy to write.

william-silversmith commented 1 month ago

I did some experiments with parallel and found I got about a 2.3x improvement with 8 cores. I'm probably doing something wrong though.