spacetx / starfish

starfish: unified pipelines for image-based transcriptomics
https://spacetx-starfish.readthedocs.io/en/latest/
MIT License

local_max_peak_finder reports too many spots when pixels have identical max intensity #1826

Open mattcai opened 4 years ago

mattcai commented 4 years ago

Description

Running LocalMaxPeakFinder on an image where multiple pixels at the "peak" have the same intensity value (possibly due to saturation or clipping) returns SpotFindingResults in which each of those pixels is counted as a separate spot. This occurs even when min_distance is set to a large value.

This follows from the documented behavior of peak_local_max(), whose docstring says: "If there are multiple local maxima with identical pixel intensities inside the region defined with min_distance, the coordinates of all such pixels are returned."
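To make the failure mode concrete, here is a minimal sketch of why a flat-topped peak yields several candidates. It does not call peak_local_max() itself. It uses a plain maximum-filter comparison as a simplified stand-in, and the helper name candidate_peaks and the toy images are hypothetical:

```python
import numpy as np
from scipy import ndimage


def candidate_peaks(image, min_distance):
    """Return coordinates of pixels equal to the max of their neighborhood.

    Simplified stand-in for a local-max search, not the skimage code.
    """
    size = 2 * min_distance + 1
    # Each output pixel holds the maximum over a size x size window.
    local_max = ndimage.maximum_filter(image, size=size, mode="constant")
    # A pixel is a candidate if it equals its window maximum and is non-zero.
    return np.argwhere((image == local_max) & (image > 0))


# A sharp spot: a single brightest pixel with a dimmer neighbor.
sharp = np.zeros((9, 9), dtype=np.uint8)
sharp[4, 4] = 255
sharp[4, 5] = 200

# A saturated spot: a 2x2 plateau where every pixel is clipped to 255.
flat = np.zeros((9, 9), dtype=np.uint8)
flat[4:6, 4:6] = 255

print(len(candidate_peaks(sharp, min_distance=3)))  # 1
print(len(candidate_peaks(flat, min_distance=3)))   # 4
```

Every plateau pixel ties for the window maximum, so enlarging the window (a larger min_distance) does not remove any of them in this sketch.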

One possible fix is to turn the returned coordinates into an ndimage array and then use scipy.ndimage.label and skimage.measure.regionprops to merge neighboring peak pixels into a single spot.

Steps/Code to Reproduce

In the in situ sequencing notebook, after image registration, run the following code:

from starfish.spots import FindSpots

# imgs and dots come from the earlier cells of the notebook
lmp = FindSpots.LocalMaxPeakFinder(
    min_distance=6,
    stringency=0,
    min_obj_area=3,
    max_obj_area=600,
    is_volume=True
)
spots = lmp.run(image_stack=imgs, reference_image=dots)

%gui qt
from starfish import display
from starfish.core.spots.DecodeSpots.trace_builders import build_traces_sequential, build_spot_traces_exact_match
intensity_table = build_spot_traces_exact_match(spots)
viewer = display(stack=dots, spots=intensity_table)

Expected Results

One spot found for each actual spot in the image.

Actual Results

Multiple spots found for each actual spot. The found spots are often neighboring pixels, and when I examined their intensity values I found they were identical.

ttung commented 4 years ago

Do we not want to expose num_peaks_per_label any more?

mattcai commented 4 years ago

My concern with using num_peaks_per_label is that it makes thresholding and labeling the step that determines the number of spots, when I think the purpose of local_max_peak_finder is to use local max intensities to determine the number and location of spots.

I think a better solution would be to use peak_local_max to find all coordinates as starfish currently does, and then apply connected component labeling again to merge neighboring coordinates into one.
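A rough sketch of that merge step, assuming the peak coordinates arrive as an (N, ndim) integer array like the one peak_local_max returns. The function name merge_adjacent_peaks is hypothetical and this is not starfish code. It uses scipy.ndimage.center_of_mass to get one centroid per group, which is an assumption about how the merged location should be chosen:

```python
import numpy as np
from scipy import ndimage


def merge_adjacent_peaks(coords, shape):
    """Collapse touching peak pixels into one centroid per connected group.

    Hypothetical helper: coords is an (N, ndim) array of pixel indices and
    shape is the shape of the image the peaks were found in.
    """
    ndim = len(shape)
    coords = np.asarray(coords, dtype=int).reshape(-1, ndim)

    # Paint the peak pixels into an empty mask.
    mask = np.zeros(shape, dtype=bool)
    mask[tuple(coords.T)] = True

    # Full connectivity, so diagonal neighbors are merged as well.
    structure = ndimage.generate_binary_structure(ndim, ndim)
    labels, n_groups = ndimage.label(mask, structure=structure)

    # One centroid per labeled group (float coordinates).
    centroids = ndimage.center_of_mass(
        mask.astype(float), labels, index=np.arange(1, n_groups + 1)
    )
    return np.array(centroids, dtype=float).reshape(n_groups, ndim)


# A 2x2 plateau plus one isolated single-pixel peak.
peaks = [(4, 4), (4, 5), (5, 4), (5, 5), (1, 1)]
merged = merge_adjacent_peaks(peaks, shape=(9, 9))
print(len(merged))  # 2
```

The centroids are floats, so they would need rounding before being used as pixel indices. Two genuinely distinct peaks that happen to touch would also be merged, which may or may not be acceptable.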

ttung commented 4 years ago

num_peaks_per_label shouldn't affect the number of spots found. it will affect the number of pixels found.

mattcai commented 4 years ago

What I meant is that using num_peaks_per_label will limit the number of spots found by local_max_peak_finder in an undesired way. Setting num_peaks_per_label is useful if every label corresponds to exactly one spot. But the way local_max_peak_finder is written, I think it assumes a label can contain a variable number of spots, which are identified by the local max peaks within the label. This approach works if every peak is a single pixel but overcounts if the peak is "flat" on top.

Adding support for kwargs in local_max_peak_finder is good but doesn't solve this issue imo.