spacetx / starfish

starfish: unified pipelines for image-based transcriptomics
https://spacetx-starfish.readthedocs.io/en/latest/

A model for registration across assays #1051

Open ambrosejcarr opened 5 years ago

ambrosejcarr commented 5 years ago

Image Alignment

Image alignment in starfish is driven by the need to carry out two tasks:

  1. Identify the cell of origin for each spot
  2. Identify mRNA molecules whose identity is encoded across a set of images by matching the same spot across the images.

These problems are trivial in cases where there is no movement in the tissue or microscope stage. However, when the images shift between captures, a computational solution must be applied to match the positions of cells and spots across images. In cases where fluorescent spots are very close together, the latter problem can be very challenging to solve.

Without a solution to these problems, starfish cannot be used by groups whose pipelines do not immediately register and apply the learned transformation to their images. Over half of the SpaceTx groups take different approaches, so enhancing our support for registration will improve starfish's value.

Dependencies

Definitions

Registration: the process of learning a transformation that maps different sets of data into one coordinate system. In starfish, it is the process of accounting for subtle shifts in the imaging apparatus between image captures. Registration does not refer to applying the transformation.

Stitching: the process of learning a transformation to combine multiple images with overlapping fields of view to produce a segmented panorama or high-resolution image. Stitching does not refer to applying the transformation.

Transformation: the process of applying a learned transformation to an image, often to register or stitch it.

Code: A code is a series of fluorescent colors detected over multiple rounds of imaging that results from hybridizing a set of fluorophores to an mRNA molecule in a pattern designed to specifically identify it.

Identify the cell of origin for each spot

This problem is relatively simple, as it involves placing a series of <1um spots inside a cell of 10-30um diameter. In addition, in most cases currently being studied, cells are not directly adjacent, so a small dilation of the cell's area can offset small registration errors. As a result, this problem is typically adequately solved by matching cells across images. The main reasons cell assignment is worth mentioning are:

  1. Some assays detect each molecule in a single image, and for those assays cell alignment is the most complex registration problem that must be solved.
  2. Assignment of spots to cells needs to be carried out in coordinate space, and it currently is not (a minimal lookup is sketched below).
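As a point of reference for item 2, the coordinate-space assignment can be as simple as a lookup into a labeled segmentation image. A minimal sketch follows (function and variable names are hypothetical; it assumes spots and segmentation already share a coordinate system and a scikit-image version that provides `expand_labels`):

```python
import numpy as np
from skimage.segmentation import expand_labels  # requires a recent scikit-image

def assign_spots_to_cells(label_image: np.ndarray, spot_yx: np.ndarray,
                          dilation_px: int = 2) -> np.ndarray:
    """Return the cell id under each spot; 0 means unassigned (background)."""
    # a small dilation of each cell's area offsets small registration errors
    dilated = expand_labels(label_image, distance=dilation_px)
    ys = np.clip(np.round(spot_yx[:, 0]).astype(int), 0, dilated.shape[0] - 1)
    xs = np.clip(np.round(spot_yx[:, 1]).astype(int), 0, dilated.shape[1] - 1)
    return dilated[ys, xs]
```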

Match spots across images

Image registration

Spots must be matched across images to build the codes that identify mRNA. All coded assays we have examined while building starfish show some movement of the images between rounds. Different types of movement require different analyses. Translation, scaling, and rotation are linear transformations that can be solved in the Fourier domain of an image; applying the learned transformation resamples the image to generate new pixel intensities ("sub-pixel registration").
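A minimal sketch of the Fourier-domain case, using `skimage.registration.phase_cross_correlation` (the successor to the `register_translation` function referenced later in this issue) on a synthetic image pair:

```python
import numpy as np
from scipy import ndimage
from skimage.registration import phase_cross_correlation

# synthetic stand-ins for the same field of view imaged in two rounds
rng = np.random.default_rng(0)
reference = rng.random((256, 256))
moving = ndimage.shift(reference, (3.5, -2.25), order=1)

# learn the sub-pixel shift in the Fourier domain, then resample the moving image
shift, error, phase_diff = phase_cross_correlation(reference, moving, upsample_factor=10)
registered = ndimage.shift(moving, shift, order=1, cval=0.0)  # zero-fill newly exposed pixels
```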

Spot registration

Shearing, however, requires extracting a set of coordinates from the source and destination images; matching those coordinates provides the solution. Typical approaches use corner detection, which in our data can be replaced by spot detection using fiducial beads, anchor probes (ISS), or nuclei (rough registration). The spots must be present in all images, but the approach can be made robust to drop-out of a small fraction of spots using a RANSAC algorithm.

The resulting affine transformation is a coefficient matrix that can be applied either to the image or to the coordinate space. The latter is cheaper to compute, but it means the images themselves are never aligned, so vectorized or volumetric approaches cannot be applied across unregistered areas; there is an optimization trade-off.
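scikit-image already provides the pieces for the point-based case: estimate an `AffineTransform` from matched coordinates inside RANSAC to tolerate drop-out. A sketch on synthetic matched points (the thresholds are placeholders, not tuned values):

```python
import numpy as np
from skimage.measure import ransac
from skimage.transform import AffineTransform

# synthetic matched spot/bead positions in source and destination images
rng = np.random.default_rng(0)
src = rng.uniform(0, 512, size=(50, 2))
true = AffineTransform(scale=(1.01, 0.99), rotation=0.02, shear=0.005, translation=(3.0, -1.5))
dst = true(src)
dst[:5] += rng.uniform(-30, 30, size=(5, 2))  # simulate a few mismatched / dropped-out spots

model, inliers = ransac((src, dst), AffineTransform, min_samples=3,
                        residual_threshold=1.0, max_trials=500)
# model.params is the 3x3 coefficient matrix; it can be applied to the image (warp)
# or directly to coordinates (model(coords)), per the trade-off described above.
```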

Matching spots

Even in registered cases, small aberrations in spot position mean that it is often not feasible to find spots in one image and simply measure intensity at the same location in each of the others. Instead, many groups find spots in each round and match them using a local search. In crowded data this local search can be complex, and in the SeqFISH case the search must be seeded from each round to identify consensus codes and reduce false positives. Other approaches instead decode pixels and, to ensure alignment, blur the images before decoding to spread signal over a local area. However, this has the drawback of reducing the signal-to-noise ratio.
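A minimal sketch of the local-search idea using a KD-tree; the search radius and the single anchor round are placeholders rather than tuned choices:

```python
import numpy as np
from scipy.spatial import cKDTree

def match_spots(anchor_yx: np.ndarray, other_yx: np.ndarray, search_radius: float = 2.0):
    """For each spot in the anchor round, find the nearest spot in another round within a radius."""
    tree = cKDTree(other_yx)
    distances, indices = tree.query(anchor_yx, distance_upper_bound=search_radius)
    matched = np.isfinite(distances)  # unmatched queries come back with distance == inf
    # indices for unmatched spots equal len(other_yx) and should be ignored
    return matched, indices
```

A real implementation would repeat this per round and channel (or seed the search from every round, as SeqFISH does) and assemble codes from the matched traces.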

Applying registration transforms

A learned affine transformation can be applied to either an image or to points of a coordinate grid. Registering spot locations is simple, as it requires only transforming a single coordinate. Registering regions of interest stored as polygons is similar, and involves only transforming the vertices. Registering regions of interest stored as masks is likely more complex and additional research is required to solve this problem.
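To make the distinction concrete, here is a minimal sketch of applying a learned transform to spots, polygon vertices, and a mask, assuming a scikit-image `AffineTransform` (the transform parameters are made up):

```python
import numpy as np
from skimage.transform import AffineTransform, warp

tform = AffineTransform(rotation=0.01, translation=(4.0, -2.0))  # stand-in for a learned transform

spot_xy = np.array([[10.0, 20.0], [105.5, 33.2]])
registered_spots = tform(spot_xy)             # a spot is a single (x, y) coordinate

polygon = np.array([[0, 0], [50, 0], [50, 40], [0, 40]], dtype=float)
registered_polygon = tform(polygon)           # a polygon is just its transformed vertices

mask = np.zeros((64, 64))
mask[10:30, 10:30] = 1
registered_mask = warp(mask, tform.inverse, order=0)  # a mask must be resampled (nearest-neighbour)
```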

Spot alignment approaches used in SpaceTx

In-situ sequencing v0

  1. Translational registration and re-sampling of images across imaging rounds. No channel correction; images do not have z-depth.
  2. Detect spots in a single channel and measure across channels.

In-situ sequencing v1

  1. Select an anchor round, then solve a least-squares equation over an array of overlapping tiles to obtain a global coordinate grid. Store the transformation but do not apply it to the images.
  2. Register the set of color channels and rounds of imaging within each field of view. Save the transformation but do not apply it
  3. Find spots in every tile/volume
  4. Apply the global and local transformations to the spots
  5. Match spots across channels

MERFISH

  1. Stitching across fields of view: unknown
  2. Translational registration and re-sampling of images across imaging rounds. Channel corrections are unknown. The images we have do not have z-depth, but current iterations do; we expect they do 3-D registration to accomplish this.
  3. Blur images using a 1-sigma Gaussian to spread information locally.
  4. Decode each pixel and connect adjacent pixels that decode to the same entity into ROIs (a rough sketch follows).
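A rough, hypothetical sketch of steps 3-4 (the shapes, random codebook, and nearest-code decoder are stand-ins; the actual MERFISH decoder is more involved):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, label

# synthetic (n_bits, y, x) stack (rounds x channels flattened into bits) and binary codebook
rng = np.random.default_rng(0)
stack = rng.random((8, 64, 64)).astype(np.float32)
codebook = (rng.random((16, 8)) > 0.5).astype(np.float32)

blurred = gaussian_filter(stack, sigma=(0, 1, 1))         # 1-sigma blur in (y, x) only
traces = blurred.reshape(blurred.shape[0], -1).T          # one intensity trace per pixel
traces /= np.linalg.norm(traces, axis=1, keepdims=True) + 1e-9
codes = codebook / (np.linalg.norm(codebook, axis=1, keepdims=True) + 1e-9)

gene_map = np.argmax(traces @ codes.T, axis=1).reshape(stack.shape[1:])  # nearest code per pixel
rois_for_gene_0, n_rois = label(gene_map == 0)            # connect adjacent same-gene pixels into ROIs
```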

MExFISH

  1. Stitching across fields of view: unknown
  2. Pre-alignment using phase correlation to carry out rough translation (applied to data)
  3. Thin-plate spline registration method to account for expansion deformation (applied to data)

BaristaSeq

  1. Stitching across fields of view: unknown, but probably resliced (verify?)
  2. Select anchor channel and carry out translational registration across rounds. Apply transform to images.
  3. Local iterative point-cloud matching (earlier) or similarity transforms (later) in 300x300-pixel windows to match spots. The transformation is not applied back to the spots.

osmFISH

  1. Stitching: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2682522/pdf/btp184.pdf In brief, phase correlation via FFT is calculated between each overlapping set of nuclei, and translations for each image are identified.
  2. Data are then registered via the nuclei channel of the stitched image (rough registration only).
  3. Not clear if transformations are applied or delayed (should not matter)

3d smFISH (Allen)

  1. Select an anchor round, then solve a least-squares equation over an array of overlapping tiles to obtain a global coordinate grid. Store the transformation but do not apply it to the images.
  2. In each field of view, select an anchor channel and solve a translational registration. Do not apply the translation to the images
  3. Find spots and apply the transformation to the spots

DartFISH

  1. Stitching: SimpleElastix in Python (https://simpleelastix.github.io).
  2. We use the max-intensity projection of the DIC image stack acquired during every round of confocal imaging to calculate the affine transformation of each round to the middle round (e.g., use round 3 of 6 as the reference image).
  3. Then apply the transformation to all fluorescence channels.

BioHub

  1. Exploring smFISH and ISS, with similar registration needs to the parent methods.

SeqFISH

  1. Stitching across fields of view: unknown
  2. Solve translational registration within fields of view and reslice
  3. Find spots in each volume
  4. Apply local matching approaches starting from each round, identify consensus codes and extract them

Summary of strategies

Unified approach for Starfish:

The following model would support most of the above variations:

  1. Solve stitching across tiles in an anchor round using least squares (this could be improved by solving the equation over multiple rounds).
  2. Learn similarity sub-pixel transforms in image space and store the transforms.
  3. [Bonus] Learn sub-pixel Affine transformations from the positions of fiducial beads
  4. Apply Affine transformations to images
  5. Apply Affine transformations to spots
  6. [Visualization] apply Affine transformations over field of view borders
  7. Methods to reconcile spots in overlaps. The solution may share properties with the RANSAC approach for finding Affine transformations in (3).

Implications for starfish object model

  1. IntensityTable was built for aligned spots that were densely measured (ISS v0, MERFISH). It is not an efficient model for the storage of spots that are sparsely identified across channels, or that are identified on tiles that are not aligned.
    1. If spots are not aligned, the IntensityTable does not make sense because each spot in the table carries the same (x, y, z) coordinate
    2. If spots are sparsely measured, the IntensityTable does not make sense because there are many empty nodes and it does not currently support sparse arrays
    3. The IntensityTable does make sense as a post-registration object that carries spots with (x, y, z) coordinates translated into physical space if a sparse array backing for xarray is identified. Dask sparse could potentially work for this.
  2. There is a need for an unregistered spots object. This could be a 2-dimensional xarray, but it would need to be a structured array that supports mixed dtypes (one possible schema is sketched after this list).
    1. registration would operate on this unregistered spots object to produce an IntensityTable.
    2. local matching would operate on this unregistered spots object to produce an IntensityTable
  3. If no sparse array support is identified, it may make more sense to enable users to be more flexible about constructing IntensityTables. For example, for one-hot encodings with sparse measurement, users could construct linear traces with one entry per round.
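One possible shape for the unregistered spots object in point 2, sketched here as a pandas table because mixed dtypes are straightforward there; the schema is an assumption, not a decision:

```python
import numpy as np
import pandas as pd

# one row per detected spot per (round, channel), in raw pixel coordinates,
# before any transform has been applied
unregistered_spots = pd.DataFrame({
    "spot_id":   np.arange(4),
    "round":     np.array([0, 0, 1, 1], dtype=np.uint8),
    "channel":   np.array([0, 1, 0, 2], dtype=np.uint8),
    "z":         np.zeros(4, dtype=np.float32),
    "y":         np.array([10.2, 11.0, 10.5, 10.9], dtype=np.float32),
    "x":         np.array([33.7, 35.1, 34.0, 34.4], dtype=np.float32),
    "intensity": np.array([812.0, 655.0, 790.0, 701.0], dtype=np.float32),
})
# registration / local matching would consume a table like this and emit an IntensityTable
```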

Investigations:

  1. Sparse backing (pydata/sparse) is not currently supported in xarray, though the required operations exist. We might be able to patch things to make this work.
    1. Collection of sparse discussion:
      1. https://github.com/scipy/scipy/issues/8162
      2. https://github.com/dask/dask-ml/issues/123
      3. https://github.com/pydata/xarray/issues/1375
      4. https://sparse.pydata.org/en/latest/roadmap.html
      5. Maybe a work-around by wrapping sparse in dask? https://examples.dask.org/xarray.html [No -- xarray doesn't fail, but nothing works after you create the xarray]
    2. Structured arrays aren't well supported either:
      1. https://github.com/pydata/xarray/issues/1626

Assumptions

  1. Registration schemes for coded and non-coded assays must both be supported
  2. Registration across or within tiles will not be solved technologically on an adequately fast time scale to support our work.
  3. The need for non-rigid transformations is limited enough that supporting them does not make sense at this time; however, methods should be built so that they could support such learned transformations if they are determined to be important at a later time.

Risks

  1. Expansion Microscopy may become more prevalent as the field attempts to increase the number of targets that can be co-detected in the same cells, and these methods may require non-linear transformations which this proposal does not support.
  2. Deep learning approaches may incorporate these types of transformations in their models.
  3. Possible that BaristaSeq is not supported by these approaches.

Implementation Requirements

  1. [Easy] Representation of learned transformations in memory and on disk [All] (a possible representation is sketched after this list)
    1. See: http://scikit-image.org/docs/dev/api/skimage.transform.html#warp
  2. Pipeline Component to learn Similarity transformations [All]
    1. For translation only, see: http://scikit-image.org/docs/dev/api/skimage.feature.html#skimage.feature.register_translation
    2. For Similarity transforms, see: https://github.com/matejak/imreg_dft (though it lacks the ability to separate registration from transformation)
    3. Worth investigating: https://simpleelastix.github.io/ may provide all the transforms we need. I got an OSX compile error that I wasn't able to debug.
  3. [Easy] A pipeline component to apply Affine transformations to images (MERFISH, DartFISH, BaristaSeq, osmFISH, SeqFISH)
    1. See http://scikit-image.org/docs/dev/api/skimage.transform.html#warp
      1. N.B. this will provide a solution to #409
  4. A pipeline component to apply Affine transformations to point locations (ISS, 3d smFISH)
  5. [Blocked, segmentation format] A pipeline component to apply Affine transformations to ROIs (polygons) (ISS, 3d smFISH)
  6. [Blocked, segmentation format] A pipeline component to apply Affine transformations to ROIs (masks) (ISS, 3d smFISH) (solution for #410 and #683 )
  7. [Bonus, Spike] Is there an existing implementation that will allow learning of full affine transformations?
    1. See e.g. http://scikit-image.org/docs/dev/auto_examples/transform/plot_matching.html
  8. [Bonus] implement a method to learn Affine transformations (BaristaSeq, MExFISH)
    1. Requires matched points in source and destination images. See http://scikit-image.org/docs/dev/api/skimage.transform.html#estimate-transform
  9. [Bonus] Stitching: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2682522/pdf/btp184.pdf
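A minimal sketch of requirements 1, 3, and 4, assuming transforms are kept as scikit-image transform objects keyed by (round, channel) and serialized as 3x3 matrices; the keying scheme and file layout are assumptions, not an agreed format:

```python
import json
import numpy as np
from skimage.transform import SimilarityTransform, warp

# in memory: one transform per (round, channel); parameters here are made up
transforms = {(0, 0): SimilarityTransform(translation=(2.0, -1.5)),
              (1, 0): SimilarityTransform(translation=(0.5, 3.0), rotation=0.002)}

# on disk: JSON mapping "round,channel" -> 3x3 matrix (requirement 1)
serialized = json.dumps({f"{r},{c}": t.params.tolist() for (r, c), t in transforms.items()})
restored = {tuple(map(int, key.split(","))): SimilarityTransform(matrix=np.array(mat))
            for key, mat in json.loads(serialized).items()}

t = restored[(0, 0)]
image = np.zeros((64, 64))
image[20:25, 30:35] = 1.0
registered_image = warp(image, t.inverse, order=1, cval=0.0)  # requirement 3: apply to images
registered_spots = t(np.array([[32.0, 22.0]]))                # requirement 4: apply to point locations
```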

Current implementation: starfish.image.registration

Notes from previous issues:

  1. Any registration method should zero out any pixels that are "created" as a result of shifting an image. Our existing approach instead wraps in pixels that are shifted out-of-frame from the other side (see #408); the difference is illustrated below.
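A small illustration of the difference, using numpy/scipy stand-ins rather than the actual starfish registration code:

```python
import numpy as np
from scipy.ndimage import shift as ndi_shift

image = np.arange(16, dtype=float).reshape(4, 4)
wrapped = np.roll(image, 1, axis=1)                        # wraps pixels in from the other side (see #408)
zero_filled = ndi_shift(image, (0, 1), order=0, cval=0.0)  # desired behaviour: "created" pixels are zero
```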
ttung commented 5 years ago

It's not clear our object model works well for stitching. We would need to pull pixels from one ImageStack to another.

Is there a reason we need to support the application of transforms to both images and spots?

ambrosejcarr commented 5 years ago

We would need to pull pixels from one ImageStack to another.

I think stitching would always require loading in two imagestacks at a time.

Is there a reason we need to support the application of transforms to both images and spots?

Am I interpreting properly that you'd prefer to only support transforms to spots?

ttung commented 5 years ago

Am I interpreting properly that you'd prefer to only support transforms to spots?

I don't have a preference. I am just stating that the two seem rather redundant. Transformations to pixels are more expensive to do, but honestly, I'm skeptical of the claim that it's a whole lot more expensive since you have to read the data to begin with.

I also think it might be worthwhile coming up with a standard model or file format for transformations.

Finally, would the transforms always be learned off of the images? Would it ever make sense to do it in spot-space?

shanaxel42 commented 5 years ago

currently waiting on https://github.com/scikit-image/scikit-image/pull/4023