spacetx / starfish

starfish: unified pipelines for image-based transcriptomics
https://spacetx-starfish.readthedocs.io/en/latest/
MIT License

MinDistanceLabel does not enforce minimum distance, leading to over-segmentation #2006

Closed iimog closed 5 months ago

iimog commented 8 months ago

Description

The minimum distance passed to starfish.morphology.Filter.MinDistanceLabel is not enforced. In particular, if multiple local maxima with identical values exist, all of them are retained, regardless of the distance between them. This leads to surprising cases of over-segmentation, where perfectly contiguous masks are split into multiple components (see example below).

Steps/Code to Reproduce

Using this example mask (saved as mask.png): [image: mask]

import numpy as np
import skimage as ski
import matplotlib.pyplot as plt
from starfish import ImageStack
from starfish.morphology import Binarize, Filter

# Load the image, transform it to a BinaryMaskCollection and apply the MinDistanceLabel
img = ImageStack.from_numpy(np.expand_dims(ski.util.img_as_float32(ski.io.imread("mask.png")), (0,1,2)))
binarized = Binarize.ThresholdBinarize(.5).run(img)
masks = Filter.MinDistanceLabel(120, 120).run(binarized)

print("Nuclei found:", len(list(masks.masks())))
plt.imshow(masks.to_label_image().xarray.squeeze(),interpolation="nearest")

Expected Results

Two clearly separated nuclei. [image: properseg]

Actual Results

Four nuclei are detected: the top one is split into three distinct regions (the thin line between its top and bottom parts gets its own label). [image: overseg]

Cause and possible solution

The cause of this problem is that the internally called skimage.feature.peak_local_max (https://github.com/spacetx/starfish/blob/853f56c7c02b15397adb921db5e3bde02fdadb63/starfish/core/morphology/Filter/min_distance_label.py#L61-L66) is not passed a value for min_distance, so the default of 1 is used. Setting the footprint according to the minimum distance ensures that only one maximum value can exist within the respective area, but if that value occurs multiple times, all occurrences are returned as local maxima and are then used as seeds for the watershed. This is undesirable because it leads to over-segmentation, as the example above shows. As I'm currently working with 2D data only, changing the starfish code above to pass min_distance=self._minimum_distance_xy fixes the problem for me. For 3D data, however, a more sophisticated solution might be needed.
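To make the mechanism concrete, here is a minimal pure-Python sketch of the tie problem. It is not the scikit-image or starfish implementation: the helper names `local_maxima` and `enforce_spacing` are hypothetical, the toy "distance map" is made up, and the greedy spacing pass is only an assumption about how a min_distance-style filter could behave. It shows that a footprint-based maximum test keeps every pixel that ties for the window maximum, while an explicit spacing pass keeps only one seed per neighbourhood.

```python
from itertools import product


def local_maxima(image, radius):
    """Return (row, col) of positive pixels that equal the maximum of their
    square window of half-width `radius` (a stand-in for a footprint)."""
    rows, cols = len(image), len(image[0])
    peaks = []
    for r, c in product(range(rows), range(cols)):
        if image[r][c] <= 0:
            continue  # ignore background
        window = [
            image[rr][cc]
            for rr in range(max(0, r - radius), min(rows, r + radius + 1))
            for cc in range(max(0, c - radius), min(cols, c + radius + 1))
        ]
        # Equality test: every pixel tied for the window maximum passes.
        if image[r][c] == max(window):
            peaks.append((r, c))
    return peaks


def enforce_spacing(image, peaks, min_distance):
    """Greedily keep the brightest peaks and drop any peak that lies within
    `min_distance` (Chebyshev distance) of a peak that was already kept."""
    kept = []
    for r, c in sorted(peaks, key=lambda p: -image[p[0]][p[1]]):
        if all(max(abs(r - kr), abs(c - kc)) > min_distance for kr, kc in kept):
            kept.append((r, c))
    return kept


# Toy distance map of ONE contiguous object: two tied maxima, 4 pixels apart.
image = [[0.0] * 13 for _ in range(5)]
image[2] = [0, 0, 1, 2, 3, 2, 2, 2, 3, 2, 1, 0, 0]

seeds = local_maxima(image, radius=5)                 # footprint only
spaced = enforce_spacing(image, seeds, min_distance=5)

print("footprint only:", seeds)
print("with spacing:  ", spaced)
```

With the footprint alone, both tied pixels survive and would each seed a watershed basin, splitting the single object. After the spacing pass only one seed remains, which is the effect the proposed min_distance change is meant to achieve for the 2D case.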

Versions

Linux-5.15.0-84-generic-x86_64-with-glibc2.35
Python 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:39:03) [GCC 11.3.0]
NumPy 1.26.0
SciPy 1.11.3
scikit-image 0.18.3
pandas 1.3.2
sklearn 0.24.2
xarray 0.19.0
sympy 1.5.1
starfish 0.2.2+41.g0e668d12

berl commented 8 months ago

@iimog this looks like a good candidate for a bugfix PR! Please make one with your change and reference this issue. I agree a 3D version may require more work, but it would be great to solve this problem first anyway.