seung-lab / connected-components-3d

Connected components on discrete and continuous multilabel 3D & 2D images. Handles 26, 18, and 6 connected variants; periodic boundaries (4, 8, & 6)
GNU Lesser General Public License v3.0

Applying Dust and largest_k dtype output option #100

Open doiko opened 2 years ago

doiko commented 2 years ago

Hi, I apply dust before largest_k; is this the right order? Or, performance-wise, should largest_k be applied first? My input is a boolean array.

Would you consider casting the largest_k output to a dtype chosen from the value of k? If k in largest_k is less than 65535, there is no need for label_out to be uint32; uint16 will be sufficient. Likewise, for k < 255, label_out can be uint8. Could this be considered to reduce memory requirements?

Dimitris

william-silversmith commented 2 years ago

Hi Dimitris,

Can you let me know how large your array is, how fast it is executing (and what you are expecting), and how much memory it is using (and what you are expecting)? Unfortunately, even if the final labels fit in 255, usually at least 10x that number of provisional labels are assigned during the calculation, so using uint8 is rarely possible except for fairly simple images.
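Because the provisional labels need a wide dtype during the calculation, any narrowing can only happen on the final output. A minimal sketch of such a post-hoc cast follows. It assumes the largest_k output is renumbered to 1..k with 0 as background, and the helper names `smallest_label_dtype` and `shrink_labels` are hypothetical, not part of cc3d. This shrinks only the array you keep afterwards, not the peak memory used while labeling.

```python
import numpy as np


def smallest_label_dtype(k):
    """Return the smallest unsigned dtype able to hold labels 0..k."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # np.min_scalar_type picks the narrowest dtype that can represent k.
    return np.min_scalar_type(k)


def shrink_labels(labels, k):
    """Cast an already computed label array to the narrowest safe dtype.

    Assumption: labels holds values in 0..k (0 = background), as one
    would expect from a renumbered largest_k result.
    """
    dtype = smallest_label_dtype(k)
    # Guard against silent wrap-around if the assumption does not hold.
    if labels.size and int(labels.max()) > np.iinfo(dtype).max:
        raise ValueError("labels exceed the range of the target dtype")
    return labels.astype(dtype, copy=False)


# Stand-in for a uint32 result of asking for the 3000 largest objects.
labels_out = np.array([0, 1, 2, 3000], dtype=np.uint32)
small = shrink_labels(labels_out, k=3000)
print(small.dtype, small.nbytes, "bytes instead of", labels_out.nbytes)
```

With k = 3000 the cast halves the size of the retained array (uint16 instead of uint32); the uint32 array still has to exist first.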

Will

doiko commented 2 years ago

Hi Will, Indeed, 256 labels are unlikely to be useful, but asking for the 3000 largest objects is a realistic case. My volumes, which are binary semantic segmentation results, vary from as small as 1k x 1k x 1k voxels to 10k x 10k x 4k voxels, with most falling between these sizes. More important for me, and possibly for others, is to understand the maximum expected memory requirement of the algorithm for a given volume size.
Currently I am using a workaround with overlapping chunks to process volumes that do not fit in memory.
I would like to be able to use something like:

    z_chunk_size = min(
        int(
            max_memory_footprint
            / (z_size * (dest_cube.dtype.itemsize + source_cube.dtype.itemsize))
        ),
        z_size,
    )

to estimate the chunk size along z that will fit within my available memory, given by the parameter max_memory_footprint. Understanding what one has to put in the denominator, (z_size * (dest_cube.dtype.itemsize + source_cube.dtype.itemsize)), would be a great help.

Best,
Dimitris
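One way to frame the estimate is per z-slice: the bytes needed for one slice are the slice area times the bytes held per voxel, and the chunk size is how many such slices fit in the budget. The sketch below is only an illustration under stated assumptions. The function name, the (x, y, z) shape order, and the `overhead_bytes_per_voxel` parameter are hypothetical; the actual working memory of the algorithm is not established in this thread, so that parameter is a placeholder to be measured, not a known constant.

```python
def estimate_z_chunk_size(shape, source_itemsize, dest_itemsize,
                          max_memory_footprint, overhead_bytes_per_voxel=0):
    """Estimate how many z-slices fit in a memory budget given in bytes.

    shape is assumed to be (x_size, y_size, z_size).
    overhead_bytes_per_voxel is a hypothetical allowance for the
    algorithm's working memory; measure it rather than trusting 0.
    """
    x_size, y_size, z_size = shape
    bytes_per_voxel = source_itemsize + dest_itemsize + overhead_bytes_per_voxel
    bytes_per_slice = x_size * y_size * bytes_per_voxel
    n_slices = max_memory_footprint // bytes_per_slice
    if n_slices < 1:
        raise ValueError("memory budget is smaller than a single z-slice")
    # Never ask for more slices than the volume has.
    return min(int(n_slices), z_size)


# 1k x 1k x 1k boolean input (1 byte) with uint32 output (4 bytes), 2 GiB budget.
chunk = estimate_z_chunk_size((1000, 1000, 1000), 1, 4, 2 * 1024**3)
print(chunk)  # 429
```

Here each slice costs 1000 * 1000 * 5 = 5,000,000 bytes, so 429 slices fit in 2 GiB. Overlapping chunks would need the overlap subtracted from the usable slices per chunk.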