spacetx / starfish

starfish: unified pipelines for image-based transcriptomics
https://spacetx-starfish.readthedocs.io/en/latest/
MIT License
226 stars 68 forks source link

Refactoring spot finding #1450

Closed shanaxel42 closed 5 years ago

shanaxel42 commented 5 years ago

Design Proposal

The approach that seqFish takes to spot finding breaks some of our current processing modals. The existing spot finding ecosystem was built up pretty independently for each assay and makes reusability difficult. This proposal refactors our current spot finders into a few small packages that are hopefully modular enough to support seqFIsh and any other future spot finding approach.

starfish.spots would include the following 3 packages:

To support each step the following three data models are required

-Spot Attributes (a dataframe representing spots and their attributes x,y,z,radius, and optional info: passes threshold, target, intensity)

-IntensityTable (aka spot traces, a representation of spots, their attributes and their corresponding traces, for mulitplexed methods this trace is a vector. For non multiplexed methods the trace is a singe value. )

-DecodedIntensityTable(aka decdoded spot traces. a representation of spots, their attributes and their corresponding traces, for mulitplexed methods this trace is a vector. For non multiplexed methods the trace is a singe value AND their decoded target value)

The inputs and outputs of these packages would be:

This introduces the concept of a DecodedIntensityTable which would just be a subclass of IntensityTable that includes a 'targets' dimension. This is a minor change that I think will reduce a lot of confusion around the outputs of each step.

In order to incorporate methods that use local spot finding like SeqFish. We also need to add the concept of a "spot id" to the SpotAttributes Dataframe. This will allow us to use the same spot in multiple possible groupings and track it.

So what does this mean for current spotFinders:

ex. osmFISH osmFish currently uses a LocalMaxPeakFinder with no reference image. It's new workflow would be:

lmp = LocateSpots.LocalMaxPeakFinder(
    min_distance=6,
    stringency=0,
    min_obj_area=6,
    max_obj_area=600,
)
spot_locations = lmp.run(imagestack)

ms = MeasureSpots.MeasureSpotsAlgorithm()
intensityTable = ms.run(spot_locations, imagestack)

ex. ISS ISS currently uses a blob detector on a blobs images. It's new workflow would be:

p = LocateSpots.BlobDetector(
    min_sigma=1,
    max_sigma=10,
    num_sigma=30,
    threshold=0.01,
    measurement_type='mean',
)
spot_locations = p.run(image=dots)
m = MeasureSpots.MeasureSpotsAlgorithm(measurement_function=np.max, radius_is_gyration=False)
intensities = m.run(data_image=imagestack, spot_attributes=spot_locations)

ex. BaristaSeq baristaSeq currently uses PixelSpotDecoding. Pixel spot decoding will remain it's own module that does both locating and decoding.

ex. StarMap Starmap currently uses LocalSearchBlobDetector. It's new workflow would be:

spot_locater = LocateSpots.LocalSearchBlob()
spot_locations = spot_locater.run(imagestack)

m = MeasureSpots.MeasureSpotsAlgorithm(measurement_function=np.max, radius_is_gyration=False)
intensities = m.run(data_image=imagestack, spot_attributes=reduced_spot_locations)

smFish smFish currently used TrackpyLocalMaxPeakFinder with no reference image, it's new workflow would be:

spot_locater = LocateSpots.LocalSearchBlob()
spot_locations = spot_locater.run(imagestack)

m = MeasureSpots.MeasureSpotsAlgorithm(measurement_function=np.max, radius_is_gyration=False)
intensities = m.run(data_image=imagestack, spot_attributes=reduced_spot_locations)
ambrosejcarr commented 5 years ago

I think you're on to something here. :-)

I have some clarifying questions.


LocateSpots (find the locations of spots in an image)

is "an image" here an arbitrary series of 2d or 3d planes?


For the ISS example, should

p = DetectSpots.BlobDetector(

be p = LocateSpots.BlobDetector(?


A hiccup in GroupSpots is that in cases where we engage in fuzzy matching (e.g. LocalBlobDetector, the position of the matched spots won't be the same across rounds and channels, and therefore our existing IntensityTable approach won't be flexible enough to hold this output, and won't be a good substrate for MeasureSpots.


MeasureSpots (measure the intensity of spots at a list of locations)

Could you flesh out the contract more explicitly? What indices would be passed in? (x, y, z)? (x, y, z, r)? How would these be matched to the images that would be measured.

Also, MeasureSpots is interesting. There are lots of options here such as taking the max/average intensity of a gaussian blob, ellipse, or ellipsoid. Some spot detectors (like the trackpy one) estimate the area in which a fraction of the mass (brightness) of the spot lies. These methods could be tricky to disentangle from their implementations and rewrite in starfish.

It's also worth noting that the skimage.feature.peak_local_max is used in skimage.feature.blob_log; we could potentially simplify our code a bit if we're breaking measurement out of spot finding.


decoded_intensities = starfish.spots.Decode.Lookup()

could you explain why this is needed, and perhaps how decoding would work in the remainder of the spot finding examples?

ttung commented 5 years ago

My opinions (and therefore @shanaxel42 should chime in with her opinion).

is "an image" here an arbitrary series of 2d or 3d planes?

It should be a 5D tensor, with axes that we group by.

A hiccup in GroupSpots is that in cases where we engage in fuzzy matching (e.g. LocalBlobDetector, the position of the matched spots won't be the same across rounds and channels, and therefore our existing IntensityTable approach won't be flexible enough to hold this output, and won't be a good substrate for MeasureSpots.

If we group a bunch of spots with pixel coordinates that are not identical, would we need the coordinates of each spot in each r·c? Would the average location not suffice? MeasureSpots could have an implementation that tolerates some fuzz around where it's doing the measurement.

MeasureSpots (measure the intensity of spots at a list of locations)

I think this needs an "axes to measure across". Sequential assays should be doing this across C or Z (not mutually exclusive).

Some spot detectors (like the trackpy one) estimate the area in which a fraction of the mass (brightness) of the spot lies. These methods could be tricky to disentangle from their implementations and rewrite in starfish.

are you saying these do more than one thing?

ambrosejcarr commented 5 years ago

is "an image" here an arbitrary series of 2d or 3d planes?

It should be a 5D tensor, with axes that we group by.

Spot finders really only work on 2d and 3d images, though -- grouping over ([r|c], z, y, x) would likely produce weird discontinuities in the (r, c) dimensions since we don't expect coherent spot locations across rounds or channels. So, refining this, it should be a 5d tensor, and accept grouping over (y, x) or (z, y, x), similar to the is_volume flag we have now?

If we group a bunch of spots with pixel coordinates that are not identical, would we need the coordinates of each spot in each r·c?

I think we would, because...

Would the average location not suffice? MeasureSpots could have an implementation that tolerates some fuzz around where it's doing the measurement.

This would decrease precision of the outcome by averaging over regions with signal and no signal, or in cases with crowded signal, produce the wrong result.

MeasureSpots (measure the intensity of spots at a list of locations)

I think this needs an "axes to measure across". Sequential assays should be doing this across C or Z (not mutually exclusive).

👍

Some spot detectors (like the trackpy one) estimate the area in which a fraction of the mass (brightness) of the spot lies. These methods could be tricky to disentangle from their implementations and rewrite in starfish.

are you saying these do more than one thing?

I was trying to articulate that I'm not confident that simply reporting (z, y, x, r, c, radius-or-equivalent) is always adequate to determine the spot intensity. I suspect we can do this, but as I understand the proposal, we'll need to dig into blob_log and trackpy.locate to understand what they're doing to measure intensity and factor that out.

shanaxel42 commented 5 years ago

Spot finders really only work on 2d and 3d images, though -- grouping over ([r|c], z, y, x) would likely produce weird discontinuities in the (r, c) dimensions since we don't expect coherent spot locations across rounds or channels. So, refining this, it should be a 5d tensor, and accept grouping over (y, x) or (z, y, x), similar to the is_volume flag we have now?

Yes SpotLocate will always accept a 5d tensor and will work the same as the is_volume does now. So it will either apply the function to 2d or 3d tiles.

Some spot detectors (like the trackpy one) estimate the area in which a fraction of the mass (brightness) of the spot lies. These methods could be tricky to disentangle from their implementations and rewrite in starfish.

trackpy currently creates SpotAttribute with 'y', 'x', 'total_intensity', 'radius', 'eccentricity', 'intensity', 'raw_mass', 'ep' then uses the regular 'measure_spots` method the rest of the algorithms do so I see this fitting pretty nicely into the rewrite. I'm fine with adding more attributes to SpotAttribtues as long as we maintain that it's the output of LocatSpots and that MeasureSpots is a separate step.

Could you flesh out the contract more explicitly? What indices would be passed in? (x, y, z)? (x, y, z, r)? How would these be matched to the images that would be measured.

Depends on the SpotLocate method used, as above. But always a SpotAttributes dataframe with at least x,y positions.

decoded_intensities = starfish.spots.Decode.Lookup()

could you explain why this is needed, and perhaps how decoding would work in the remainder of the spot finding examples?

Since the output of LocateSpots will always be a SpotAttibutes dataframe, a simple step is left in the Pixel scenario to turn that into a DecodedIntensityTable, Decode.Lookup() would just do that last bit of mapping the key values of each spot to it's corresponding gene value. For the rest of the spot finding examples the last output is a regular IntensityTable, so the last step is just normal decoding to go from IntensityTable -> DecodedIntensityTable. The same as things work now.

For the ISS example, should

p = DetectSpots.BlobDetector( be p = LocateSpots.BlobDetector(?

yuuup my bad.

ttung commented 5 years ago

I strongly suspect that a table with a feature_id column and a spot_id column might work well as a data structure that accommodates all our decoding methodologies. It may complicate how users configure the spot detection methods, but might make it possible to have one data structure for everything.

shanaxel42 commented 5 years ago

Closing in favor of new direction detailed here: #1500