fast multilabel dilation

MariusCausemann commented 11 months ago

Hi, I'm developing a pipeline to generate meshes suitable for finite element simulations from segmented electron microscopy images of brain cells. I found your software (cloudvolume, fastremap, ...) very useful for this, thank you! However, one missing piece for me is a fast method for morphological operations on multilabel 3D images. To keep the number of points in my meshes small, I need to smoothen and simplify the cellular geometries and use a sequence of morphological erosion, dilation, opening, closing, etc for this. So far, it's based on scipy.ndimage / dask.image, but this only supports binary images, requiring looping over all labels in the dataset.
Is there a way to add multilabel support for fast multilabel dilation to this package? Or do you have a pointer to some other package for me? Thanks!

Marius

Small illustration a the resulting FEM mesh, intended for e.g. electrophysiology, fluid flow, cell swelling simulations:

william-silversmith commented 11 months ago

Hi Marius,

Thanks for writing in, I'm glad you're finding these tools useful!

The method I'm currently using is not suitable for multi-label images. However, it does seem like it would be possible to devise one. In your opinion, would it be useful to define the multilabel operator as:

a 3x3x3 structuring element (all "on")
that fills in the background color (0) only
that uses the mode color in cases where there is a conflict

Will

MariusCausemann commented 11 months ago

Hi Will, thanks for your quick reply! Currently, I mainly work with option 1 (all "on", labels can dilate into other labels) and sometimes option 2 (labels only dilate in background voxels). It would be optimal if one could choose between all those options, but I would already be happy with either of them. Do they differ a lot in their complexity, both for implementation effort and runtime?

Marius

william-silversmith commented 11 months ago

Hi Marius,

I think I was a bit unclear. The list was not options, but a description of the whole system (so all three properties at once). It would be possible to make certain properties configurable.

What does dilation look like to you in the case where two labels are touching?

Will

MariusCausemann commented 11 months ago

Hi Will, I think two options would be reasonable for touching labels: a) don't dilate them at their interface and b) dilate only one (and consequently erode the other). Do you think this could be configurable?

william-silversmith commented 11 months ago

So here's the good news, I have an untested prototype of (a) here. https://github.com/seung-lab/fastmorph/pull/2

For (b), there would have to be some arbitrary criterion as to which shape got eroded by the other, such the max label id wins. Would that be acceptable?

MariusCausemann commented 11 months ago

Great! I'll have a look at the prototype and test it out soon. The max label id winning is totally acceptable -as you say, it's arbitrary anyway. Thanks a lot for your effort!

MariusCausemann commented 11 months ago

After thinking about it a bit, it might make sense to pass in a list of label ids? The order of the ids in the list could then determine which label gets eroded and which dilated. Ideally, that would also allow dilating only a subset of the labels.

william-silversmith commented 11 months ago

Interesting, but what happens when you pick two labels that are adjacent? Then you run into the same problem.

One issue with this approach is that the most obvious implementation requires a set or map with potentially frequent access. That would be a big performance hit (though not as big as having to process each individual shape as you have been doing).

william-silversmith commented 11 months ago

I just realized there's a way to make those concepts work together. Make the base algorithm work based on smaller (or larger) numbers dominating, but accept a list in python and remap the volume in list order.

william-silversmith commented 11 months ago

Nevermind, there's actually a much simpler solution that has a very simple implementation. I just pick the stencil mode at each point.

Speed trial on a 512^3 volume with >2000 labels.

Background only: 1.0 sec Allowing conflicts: 49.4 sec

Is that a reasonable way to go about things?

william-silversmith commented 11 months ago

Check out the latest code in #2 there are dilation, erosion, closing, and opening operators that are reasonably efficient.

MariusCausemann commented 11 months ago

Great, this looks pretty good! I did a quick comparison with the corresponding scikit-image functions on a 500x500x500 image (small chunk of the cortical mm3 dataset, around 500 labels):

in background_only=False mode:

import skimage.morphology as skim
In [50]: %timeit skim.dilation(img, skim.cube(3))
1.67 s ± 31.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [51]: %timeit fastmorph.dilate(img, background_only=False)
3.24 s ± 41.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

in background_only=True mode:

from skimage.segmentation import expand_labels
%timeit expand_labels(img)
23.1 s ± 2.24 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit fastmorph.dilate(img, background_only=True)
2.7 s ± 64.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So the background_only mode is really fast, the background_only=False is somewhat slower. My results look quite different from yours though, do you think this is due to image characteristics?

william-silversmith commented 11 months ago

So I haven't extensively tested the code yet, so there could be bugs. However, there are a couple things that stand out to me.

skim.dilation seems to take the maximum of the neighborhood, while I'm taking the mode. Maximum is a simpler operation to execute so it doesn't surprise me it's a bit faster though there are likely opportunities for further optimization in my code.
expand_labels has an interesting implementation. It's pretty similar to my implementation of spherical_dilate but scipy also can compute feature maps for its distance transform and it uses that to do the multi-label approach. Probably computing both the EDT and the feature map adds some time.

Can you share some results you find surprising? Also, what is your performance target?

MariusCausemann commented 11 months ago

Hi Will, I wasn't aware of the difference with mode vs max between skim.dilation and fastmorph.dilate, it makes sense then! I was surprised by the fact that in my test, fastmorph.dilate is roughly equally fast with background_only=True and False, while you mention a 50x difference in runtime. Any idea why?

Regarding performance: It would be great if one could process images of up to 5k ^3 voxels in a reasonable time. My previous dask-image-based implementation took a few hours for about 1k^3 voxel images (could probably have been optimized further, but I thought it would be worthwhile to look for alternatives). I haven't tested fastmorph large images yet, but if it scales with O(n), a single dilation on a 5k^3 image should take around 1h. Given that the processing consists of a sequence of quite a few erosion/dilation operations, this would still take quite some time.

william-silversmith commented 11 months ago

I've been experimenting with the erosion operator which is a bit simpler than dilation since the mode doesn't need to be computed.

The issue in getting additional single-core speedup has to do with the memory access pattern which cuts across the depth dimension regardless of which orientation (C or F) you use. There are some tricks like doing blocks of the image to ensure L1/L2 cache friendliness, but I've never personally had much success with them. I could give it a try.

Since you need to process very large images, I think the answer would unfortunately be multi-threading. Fortunately, these algorithms are pretty trivially parallel so that's something that could be considered.

I'll need to do more experiments to understand why you are seeing background_only=False getting such good performance. In that mode, the stencil needs to retrieve 10 voxels of data at every voxel. In True mode, only background voxels need to be considered, so a large amount of work can be skipped in densely labeled data.

william-silversmith commented 11 months ago

I gave the block approach a try and didn't see a big benefit (small penalty). I suspect the way forward is multithreading. I have a prototype that is mostly working for dilate.

william-silversmith commented 11 months ago

Ok, I uploaded threaded dilation.

william-silversmith commented 11 months ago

Just fixed a bug. I think with multi-threading, at least for a 5000^3 volume, you should be down to under an hour per run. Still could use some work, but hopefully that puts you in an acceptable place. I'm trying to fix the build systems for py312 but hopefully I can make a release soon.

william-silversmith commented 11 months ago

I also found some big speedups for dilate background_only=False though it's possible your compiler found some of them before I did.

william-silversmith commented 11 months ago

Version 1 has been released on PyPI! Let me know how it works out for you.

william-silversmith commented 11 months ago

On my standard test volume, I tried the following:

s = time.time()
res = fastmorph.dilate(labels, parallel=8, background_only=False)
print("dilate / background_only=False / 8 threads: ", time.time() - s)

import skimage.segmentation

s = time.time()
res = skimage.segmentation.expand_labels(labels)
print("skimage dilate / 1 thread: ", time.time() - s)

The results were:

dilate / background_only=False / 1 thread:  14.205087900161743
dilate / background_only=False / 2 threads:  7.286011219024658
dilate / background_only=False / 4 threads:  4.977150917053223
dilate / background_only=False / 8 threads:  4.070333003997803
skimage dilate / 1 thread:  68.5935742855072

Memory usage looked like so:

At about 5 seconds, you can see fastmorph.dilate running. Around 20 seconds you can see scikit-image. The memory usage is much higher as well in skimage.

william-silversmith commented 5 months ago

I'm gonna close this since it's been implemented for a while. I'm finding it useful for my own purposes!

seung-lab / fastmorph

fast multilabel dilation #1