spacetx / starfish

starfish: unified pipelines for image-based transcriptomics
https://spacetx-starfish.readthedocs.io/en/latest/
MIT License

ENH: Update LocalMaxPeakFinder to generate IntensityTable #305

Closed ambrosejcarr closed 6 years ago

ambrosejcarr commented 6 years ago

Currently it generates SpotAttributes and EncoderTable. Update this method to generate an IntensityTable instead.

ambrosejcarr commented 6 years ago

@berl I've been thinking about what it will take to update LocalMaxPeakFinder to use the Codebook and IntensityTable objects and generalize across other assays in the starfish codebase.

You had previously mentioned that it was important to do per-channel detection, thus ignoring the intensities of other channels at the same coordinate and protecting against crowding. I wanted to confirm that the above request is still important for you. If that's the case, and I were designing this module for smFISH, the best way to support this would be to concatenate the features found in each channel into one long IntensityTable.

However, this does not generalize well to experiments with multiple hybridization rounds, or for experiments that want to compare across channels. If I were designing this for generality, I would max project over channels and hybridization rounds to get all the spots, then determine their intensities across all the 2d image tiles.

In the "generalized" approach above, we could still easily do per-channel decoding by masking out other channels. However, there is a risk of creating hybrid spots across channels (e.g. where two spots overlap imperfectly, presumably an example of local crowding). In these cases the centroid, eccentricity, and radius of the spot would all be measured incorrectly, which could lead to improper filtering.
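For concreteness, the max-projection approach described above could be sketched roughly as follows. This is a NumPy-only illustration, not starfish code: the function names, the `(rounds, channels, y, x)` array layout, and the simple 3x3 local-maximum detector are all assumptions made for the example.

```python
import numpy as np


def local_maxima_2d(image, threshold):
    """Return (y, x) coordinates of pixels above `threshold` that are
    >= all 8 neighbors. Hypothetical stand-in for a real spot detector."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    # Pad with -inf so border pixels can still be maxima.
    padded = np.pad(image, 1, mode="constant", constant_values=-np.inf)
    is_peak = image > threshold
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbor = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            is_peak &= image >= neighbor
    return np.argwhere(is_peak)


def intensities_from_max_projection(stack, threshold):
    """stack has shape (rounds, channels, y, x) (assumed layout).

    Spots are detected once on the max projection over rounds and
    channels, then each spot's intensity is read from every 2d tile.
    Returns coords (n_spots, 2) and intensities (n_spots, rounds, channels).
    """
    projection = stack.max(axis=(0, 1))
    coords = local_maxima_2d(projection, threshold)
    ys, xs = coords[:, 0], coords[:, 1]
    per_tile = stack[:, :, ys, xs]  # shape (rounds, channels, n_spots)
    return coords, np.moveaxis(per_tile, -1, 0)


# Two spots, each present in a single (round, channel) tile.
stack = np.zeros((2, 3, 8, 8))
stack[0, 1, 2, 2] = 5.0
stack[1, 2, 5, 6] = 7.0
coords, intensities = intensities_from_max_projection(stack, threshold=1.0)
```

Note that two partially overlapping spots from different channels would merge into a single maximum in `projection`, which is the hybrid-spot risk mentioned above.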

Perhaps worth mentioning: skimage upgraded blob_log earlier this year to support 3d spot detection, so that could also be used here. It might be worth investigating the extent to which these two approaches are interchangeable.

Tagging @dganguli for his perspective.

berl commented 6 years ago

> You had previously mentioned that it was important to do per-channel detection, thus ignoring the intensities of other channels at the same coordinate and protecting against crowding. I wanted to confirm that the above request is still important for you.

definitely still important for me and probably also important for Simone and any other non-barcoded assay. also, even the barcoded folks (MERFISH, SeqFISH) as published have used non-barcoded and barcoded assays in combination to span a larger dynamic range of expression.

> If I were designing this for generality, I would max project over channels and hybridization rounds to get all the spots, then determine their intensities across all the 2d image tiles.

this will work for extremely sparse data, like what's shown in the recent CODEX preprint. But it wouldn't work for multiround smFISH, at least not with many of the markers we use routinely. So what's the solution?

> If that's the case, and I were designing this module for smFISH, the best way to support this would be to concatenate the features found in each channel into one long IntensityTable. However, this does not generalize well to experiments with multiple hybridization rounds, or for experiments that want to compare across channels.

I think it works fine for both of those things if you concatenate features from each (r,c) tuple. For non-barcoded methods, the (r,c) tuple (which is decoded to a single target) is the natural unit for analysis at this stage and IntensityTable data should get built out of those. If you want to compare across channels, you do your analysis on the IntensityTable, aggregating or comparing over r or c as needed. I can use this type of approach for identifying lipofuscin as spots that show up within 1 pixel in all channels in 1 round, for example.
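A rough sketch of what I mean, in plain NumPy rather than starfish objects. Everything here is hypothetical: the function names, the long-table column order `(r, c, y, x, intensity)`, and reading "within 1 pixel" as at most 1 pixel apart in both y and x.

```python
import numpy as np


def concatenate_features(features_by_rc):
    """features_by_rc maps (r, c) -> rows of (y, x, intensity), detected
    independently in each tile. Returns one long (N, 5) array with
    columns r, c, y, x, intensity."""
    blocks = []
    for (r, c), feats in sorted(features_by_rc.items()):
        feats = np.asarray(feats, dtype=float).reshape(-1, 3)
        rc = np.tile([r, c], (len(feats), 1))
        blocks.append(np.hstack([rc, feats]))
    return np.vstack(blocks)


def lipofuscin_candidates(table, round_index, n_channels, max_dist=1.0):
    """Boolean mask over table rows: spots in `round_index` that have a
    spot within `max_dist` pixels in every channel of that round."""
    mask = np.zeros(len(table), dtype=bool)
    in_round = table[:, 0] == round_index
    for i in np.flatnonzero(in_round):
        y, x = table[i, 2], table[i, 3]
        near = (
            in_round
            & (np.abs(table[:, 2] - y) <= max_dist)
            & (np.abs(table[:, 3] - x) <= max_dist)
        )
        # A candidate must be matched in all channels of the round.
        mask[i] = len(np.unique(table[near, 1])) == n_channels
    return mask


# One round, three channels; the cluster near (10, 10) shows up in all three.
features = {
    (0, 0): [[10, 10, 5.0], [20, 20, 4.0]],
    (0, 1): [[10, 11, 6.0]],
    (0, 2): [[11, 10, 3.0]],
}
table = concatenate_features(features)
mask = lipofuscin_candidates(table, round_index=0, n_channels=3)
```

The comparison across channels happens entirely on the concatenated table, so detection itself never has to look at more than one (r,c) tile at a time.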

@ambrosejcarr does this make sense? I feel like you may have made this more complicated than it needs to be, probably because of all the complexity required for the barcoded methods to work. Also let's get on a call with @dganguli if there is some underlying logic that makes it difficult for the IntensityTable to be built this way.

berl commented 6 years ago

one more thing: at least some of the barcoded methods (e.g. SeqFISH) also create their IntensityTable analogs as I've described and then decode, so that's another reason to make sure we have this functionality

ambrosejcarr commented 6 years ago

Got it. Thanks very much for the detailed explanation. I'll think about how to execute this and make sure that the other approaches can build the IntensityTable object in the same way. I don't think there is any barrier, I just wanted to make sure it's important before I put the time into creating the logic.

Chances are it will be as simple as adding logic to the IntensityTable constructors to build from a max projection or from independent spots. I'll be able to port the existing max projection method to the PeakLocalMax method, and should be able to port the (r, c) tuple method the other way.

Really helpful comments. :+1:

berl commented 6 years ago

I really like the idea of having the IntensityTable constructor able to build from independent spots. This capability will also be very useful for generating synthetic data for benchmarking decoding algorithms and probably even for unit testing.
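As a rough illustration of the synthetic-data idea (plain NumPy, not the IntensityTable API; the function names, the `(targets, rounds, channels)` codebook layout, the Gaussian noise model, and the toy nearest-code decoder are all assumptions for the example):

```python
import numpy as np


def synthetic_intensities(codebook, n_spots, noise_sd=0.05, seed=0):
    """codebook: (n_targets, rounds, channels) expected intensities.

    Draws a random true target per spot and adds Gaussian noise.
    Returns (true_targets, intensities), where intensities has shape
    (n_spots, rounds, channels)."""
    rng = np.random.default_rng(seed)
    codebook = np.asarray(codebook, dtype=float)
    targets = rng.integers(0, len(codebook), size=n_spots)
    noise = rng.normal(0.0, noise_sd, size=(n_spots,) + codebook.shape[1:])
    intensities = np.clip(codebook[targets] + noise, 0.0, None)
    return targets, intensities


def decode_nearest(codebook, intensities):
    """Toy decoder: assign each spot to the closest code (Euclidean)."""
    codes = np.asarray(codebook, dtype=float).reshape(len(codebook), -1)
    spots = intensities.reshape(len(intensities), -1)
    dists = np.linalg.norm(spots[:, None, :] - codes[None, :, :], axis=2)
    return dists.argmin(axis=1)


# Two targets over 2 rounds x 2 channels, one "on" channel per round.
codebook = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
])
targets, intensities = synthetic_intensities(codebook, n_spots=50)
decoded = decode_nearest(codebook, intensities)
```

Because the true targets are known, comparing `decoded` against `targets` gives a direct accuracy measure for a decoder, which is the kind of check that could also serve as a unit test.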

dganguli commented 6 years ago

@ambrosejcarr I completely agree with @berl -- the ability to generate an intensity table that detects/quantifies spots independently across color channels is critical. Furthermore, as he points out, even the barcoded methods can require this functionality. Indeed, SEQ-FISH provides a very illuminating example of this. Regardless, I think it should be straightforward to build an IntensityTable that supports both barcoded and non-barcoded methods in an intuitive and simple way. Let's find some time to go through the SEQ-FISH use case, and sketch out a unifying IntensityTable API that can satisfy multiplexed, semi-multiplexed, and non-multiplexed use cases.

ambrosejcarr commented 6 years ago

> Let's find some time to go through the SEQ-FISH use case, and sketch out a unifying IntensityTable API that can satisfy multiplexed, semi-multiplexed, and non-multiplexed use cases.

Excellent, let's schedule some time while I'm in town. I don't think it will be that complicated, I just wanted to make sure that the complexity was necessary. It'll be good for me to better understand the seq-fish use case.

... as you can maybe tell, I'm used to lossy data where you really can't trust an observation in a single channel, but I'm slowly ingraining that observations in FISH are more trustworthy 👍

ambrosejcarr commented 6 years ago

Closed by #357