spacetx / starfish

starfish: unified pipelines for image-based transcriptomics
https://spacetx-starfish.readthedocs.io/en/latest/
MIT License
226 stars 68 forks source link

`IntensityTable` is truncating x,y,z coordinates for spot detection methods that return subpixel results #1310

Closed njmei closed 5 years ago

njmei commented 5 years ago

We've been in the process of comparing the results of our current pipeline vs. a starfish version and it looks like we've stumbled on a really unfortunate 'feature' of the current IntensityTable data structure for spot based detection methods.

Below is a visual example (diamonds are our pipeline, circles are starfish), with the takeaway being that the starfish circles are not centered on detected spots: Screenshot from 2019-05-02 14-44-25

Both our spot finder and the Starfish equivalent are based on the Crocker-Grier centroid finding algorithm and can determine the coordinates of a spot with sub-pixel resolution.

So I ended up chasing a red herring before discovering that IntensityTable x, y, z properties are int64, which would floor() any result returned by the Crocker-Grier algorithm. In fact, this flooring appears to happen immediately after results are produced.

It looks like (xc, yc, zc) are then real world coordinates that map to each set of pixel indices (x, y, z). I think that this design is probably (?) fine for pixel based decoding methods, but throwing out sub-pixel information for spot methods is a big deal breaker.

CC: @berl @ambrosejcarr

dganguli commented 5 years ago

@njmei pardon my ignorance here -- can this be fixed simply by removing the flooring you pointed out immediately after the results are produced?

if that's true, does that break anything else? if not, then i think the fix is simple?

but i'm sure i'm missing something here

ambrosejcarr commented 5 years ago

Seems like we'll also need to make the IntensityTable pixel coordinates Floats instead of Ints. This feels like a sensible and relatively simple change to me.

njmei commented 5 years ago

@dganguli I'm hoping resolving this is as simple as removing the flooring :)

But there was probably (?) a reason the spot values were converted to int which is why I opened this issue. I'm wondering if there are any specific indexing operations that rely on the spot locations being ints?

dganguli commented 5 years ago

@njmei got it. some basic sleuthing can elucidate why this decision was made by following PR by @shanaxel42 https://github.com/spacetx/starfish/pull/898

which resolves to an issue made by @berl https://github.com/spacetx/starfish/issues/825

my sense is that, for spot detection algorithms that return sub-pixel results, x, y, and z should be stored as floats.

however, if the user tries to index into an intensity table in a manner that assumes these x, y, and z values are ints -- that will cause things to blow up. it's not clear to me what that indexing use-case is though? does anyone remember?

dganguli commented 5 years ago

the blob detector has the same problem: https://github.com/spacetx/starfish/blob/eb7c43842c0ffda2f24136d7a3c625dbdd2fcacb/starfish/core/spots/_detect_spots/blob.py#L117

dganguli commented 5 years ago

uh oh, re-visting @berl 's stack trace in detail ... i think the indexing use case is internal to starfish, something to do with calculating physical coordinates assumes pixel coordinates are ints. the plot thickens!

njmei commented 5 years ago

Do physical coordinate calculations not use interpolation?

EDIT: Looks like no

And if interpolation were to be used to get the correct xc, yc, zc for subpixel x, y, z values, would further downstream steps like combining multiple fov spot IntensityTables be able to deal with this?

dganguli commented 5 years ago

If physical coordinate calculations were to use interpolation, I think that would fix this particular problem.

I'm not sure I understand your question about further downstream steps though?

njmei commented 5 years ago

@dganguli so let's assume physical coordinates (xc, yc, zc) are now calculated with interpolation for IntensityTable

My question about downstream steps is: Will starfish run into troubles when trying to merge spot IntensityTables across different fovs? Specifically, is it the case that spot table merging across fovs is not dependent on x, y, z being int?

njmei commented 5 years ago

I've put in a pull request that kicks the can down the road but unblocks and allows me to continue pipeline spot comparisons. There should probably be a separate issue opened for dealing with downstream effects of having float x,y,z spot attributes in the IntensityTable?

Alternately, I can make the changes on a personal branch and wait for a more holistic solution?