reillytilbury / coppafish

Python version of iss software
MIT License

Register Documentation #356

Closed reillytilbury closed 2 months ago

reillytilbury commented 3 months ago

Registration

The register section is the part of the pipeline concerned with aligning different rounds and channels. This is crucial for decoding spot colours into genes in the Call Spots and OMP sections. The aim of this section is to find a function $g_{trc}(\mathbf{x})$ for each tile, round and channel that takes in a location $\mathbf{x}$ on the anchor image for tile $t$ and returns the corresponding location in round $r$, channel $c$ of the same tile. Since registration is done independently for each tile and we are often only working on one tile, we sometimes omit the subscript $t$ in this documentation.

Once we have these transformations $g$, we can get an $n_{rounds} \times n_{channels}$ spot colours matrix $\boldsymbol{\zeta}(\mathbf{x})$ for each location $\mathbf{x}$ in the anchor image of a given tile via

\boldsymbol{\zeta}(\mathbf{x}) =
\begin{pmatrix}
f_{0, 0}(g_{0, 0}(\mathbf{x})) & \cdots & f_{0, n_c}(g_{0, n_c}(\mathbf{x})) \\
\vdots & \ddots & \vdots \\
f_{n_r, 0}(g_{n_r, 0}(\mathbf{x})) & \cdots & f_{n_r, n_c}(g_{n_r, n_c}(\mathbf{x}))
\end{pmatrix},

where $f_{rc}(\mathbf{x})$ is the intensity of round $r$, channel $c$ at location $\mathbf{x}$. Note that even a single poorly aligned round or channel makes this matrix difficult to decode, which highlights the importance of this section.
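
As a toy illustration (not the coppafish implementation), the matrix above can be assembled by sampling each round/channel image at the transformed coordinate, with `scipy.ndimage.map_coordinates` providing sub-pixel interpolation. The function and argument names here are hypothetical.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def spot_colours(x, transforms, images):
    """Assemble the n_rounds x n_channels matrix zeta(x) for one anchor location.

    x: (3,) coordinate in the anchor image.
    transforms: transforms[r][c] is a callable g_rc mapping an anchor
        coordinate to round r, channel c coordinates (hypothetical interface).
    images: images[r][c] is the 3D intensity image f_rc.
    """
    n_rounds, n_channels = len(images), len(images[0])
    zeta = np.zeros((n_rounds, n_channels))
    for r in range(n_rounds):
        for c in range(n_channels):
            y = transforms[r][c](x)  # g_rc(x): location in round r, channel c
            # f_rc(g_rc(x)), linearly interpolated at the sub-pixel location
            zeta[r, c] = map_coordinates(images[r][c], y[:, None], order=1)[0]
    return zeta
```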

Algorithm

We need to consider a few questions when building the register pipeline. Some of the most important are:

  1. How large should our search space of functions for $g$ be? Are these functions independent between rounds and channels?
  2. How will we actually search this space?

1. Choosing the Function Space

To choose the set of functions we will use to fit to our data, we need to look for all sources of misalignment. Channel misalignments are caused by:

So the channel-to-channel differences are composed of shifts, rotations and scalings. We model each channel transform by an affine transform $A_c$.


An example of chromatic aberration.

For round to round differences, we see much less predictable variability. Misalignments arise due to:

The conclusion is that affine transformations do not sufficiently capture the richness of round-to-round transformations. We therefore allow for completely arbitrary transformations $\mathcal{F}_r$ for each round $r$.


A rip in the tissue and two attempts at affine registration.

This also answers the second part of question 1 - we don't need to find $n_{rounds} \times n_{channels}$ transforms per tile, we only need $n_{rounds} + n_{channels}$. We model every transform on a given tile as a composition of a very general round transform $\mathcal{F}_r$ and an affine channel transform $A_c$. Explicitly, we have

 g_{rc}(\mathbf{x}) = A_{c}(\mathcal{F}_{r}(\mathbf{x})).

It turns out that the $n_{channels}$ channel transforms, $A_c$, are rather straightforward to compute and the main difficulty lies in finding the $n_{rounds}$ round transforms, $\mathcal{F}_r$.

2. Computing the Transforms

The round transforms are computed with Optical Flow, while the channel transforms are computed with Iterative Closest Point. For further details see the sections below.

Note: When we compute the round transforms $\mathcal{F}_r$ these often include some systematic error, like a small shift of 1 pixel and slight underestimation of the z-expansion. This is due to

To get around this, we find an affine correction $B_r$ for each round, meaning our functions $g_{rc}$ are actually of the form $g_{rc}(\mathbf{x}) = A_{c}(B_{r}(\mathcal{F}_{r}(\mathbf{x})))$. We combine the two affine transforms $A_c$ and $B_r$ into a single affine map $A_{rc} = A_c \circ B_r$, giving the simple formula $g_{rc}(\mathbf{x}) = A_{rc}(\mathcal{F}_{r}(\mathbf{x}))$, which means that in practice our affine maps actually depend on round as well as channel.
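
Composed, this is just an arbitrary warp followed by an affine map. A minimal sketch (hypothetical names; the affine is written as a 3x4 matrix $[M \mid \mathbf{t}]$):

```python
import numpy as np

def apply_g(x, flow_r, affine_rc):
    """Hypothetical sketch of g_rc(x) = A_rc(F_r(x)).

    flow_r: callable returning the warped coordinate F_r(x).
    affine_rc: 3x4 matrix [M | t] combining the channel transform A_c
        with the per-round affine correction B_r.
    """
    y = flow_r(x)                                   # arbitrary round transform F_r
    return affine_rc[:, :3] @ y + affine_rc[:, 3]   # affine part A_rc
```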

reillytilbury commented 3 months ago

Optical Flow

We use optical flow to align the anchor DAPI $D_{r_{ref}}(\mathbf{x})$ to the round $r$ DAPI $D_r(\mathbf{x})$. The output of this algorithm is a function $\mathcal{F}_r$ which satisfies the relation

D_{r_{ref}}(\mathcal{F}_r(\mathbf{x})) \approx D_r(\mathbf{x}).

How does it work

Suppose we have 2 images $I$ and $J$ which we'd like to register. This means for each position $\mathbf{x}$ in image $J$ we would like to find the shift $\mathbf{s}$ satisfying

I(\mathbf{x} + \mathbf{s}) = J(\mathbf{x}).

Assuming that this $\mathbf{s}$ is small and that the function $I$ is sufficiently smooth, we can approximate it by a linear function in this neighbourhood. Taylor expanding and rearranging yields:

\mathbf{s} \cdot \boldsymbol{\nabla} I(\mathbf{x}) \approx J(\mathbf{x}) -  I(\mathbf{x}),

which is called the flow equation. It is underdetermined, since there are 3 unknowns (the components of $\mathbf{s}$) in a single equation. Many methods exist to tackle this; the one we use is the Lucas-Kanade method, which assumes that all pixels in a small window of radius $r$ (adjustable via the window_radius parameter, default value 8) around the point $\mathbf{x}$ have the same shift $\mathbf{s}$. Since this method assumes that all pixels have the same shift within this window, the condition that $I$ is smooth is very important. Specifically, we need $I$ to be well approximated by its derivative, which means that its Hessian $\frac{\partial ^2 I}{\partial \mathbf{x}^2}$ cannot be too large.

Lucas-Kanade works as follows. Let $\mathbf{x}_1, \cdots, \mathbf{x}_n$ be all the points in this window. Then assuming these all have the same shift $\mathbf{s}$, we can gather the $n$ flow equations

\begin{pmatrix} \boldsymbol{\nabla} I(\mathbf{x}_1)^T \\ \vdots \\ \boldsymbol{\nabla} I(\mathbf{x}_n)^T  \end{pmatrix} \mathbf{s} = \begin{pmatrix} J(\mathbf{x}_1) - I(\mathbf{x}_1) \\ \vdots \\ J(\mathbf{x}_n) - I(\mathbf{x}_n)  \end{pmatrix}, 

which is now overdetermined! This is a better problem to have though, as the solution can be approximated by least squares.
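
For a single window, the least-squares solve can be sketched in a few lines of numpy. This is a 2D toy, not the pipeline's 3D implementation, and `lk_window_shift` is a hypothetical name:

```python
import numpy as np

def lk_window_shift(I, J, centre, radius=8):
    """Solve the stacked flow equations for one window by least squares.

    Assumes every pixel in the (2*radius+1)^2 window around `centre`
    shares one shift s, and solves grad(I) . s = J - I over the window.
    """
    y0, x0 = centre
    ys = slice(y0 - radius, y0 + radius + 1)
    xs = slice(x0 - radius, x0 + radius + 1)
    gy, gx = np.gradient(I.astype(float))
    # One row of the design matrix per pixel: [dI/dy, dI/dx]
    A = np.stack([gy[ys, xs].ravel(), gx[ys, xs].ravel()], axis=1)
    b = (J[ys, xs] - I[ys, xs]).astype(float).ravel()
    s, *_ = np.linalg.lstsq(A, b, rcond=None)
    return s  # estimated shift (s_y, s_x)
```

With a smooth image and a sub-pixel shift, the recovered shift is close to the true one, consistent with assumptions 1 and 2 above.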

The above derivation makes the following assumptions

  1. The shift $\mathbf{s}$ is small,
  2. The images are smooth, i.e. the Hessian $\frac{\partial ^2 I}{\partial \mathbf{x}^2}$ is not too large,
  3. The images $I$ and $J$ have the same intensities, so that $I(\mathbf{x} + \mathbf{s}) = J(\mathbf{x})$.

To make sure we meet the assumptions we carry out the following steps:

  1. Shift size:

    • We apply an initial Phase Correlation to find any global shift between $I$ and $J$, and shift $I$ by this amount before starting optical flow. This way, optical flow only captures the deviations from the global shift.
    • We downsample the images $I$ and $J$ in $y$ and $x$ before registration, which reduces the relative size of the shift.
    • The implementation we use takes an iterative approach, meaning that after the initial flow field is found the algorithm runs again until some stopping criterion is met.
  2. Smoothness:

    • For practical reasons (speed and shift size), we need to downsample the images by a factor of 4 in $y$ and $x$ before registering. We perform the downsampling by taking the mean within each 4 by 4 sub-block as opposed to just extracting every 4th pixel in $y$ and $x$.


      Nearest Neighbour Downsampling (Left) vs Mean Block Downsampling (Right)

    • We smooth the images with a small Gaussian blur before registering. This needs to be done carefully because too much blurring decreases the resolution of the images and therefore the quality of the registration.

  3. To ensure we have similar intensity profiles we match the means of $I$ and $J$ in similar spatial locations.
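
The mean-block downsampling in step 2 amounts to averaging each 4 by 4 $yx$ sub-block. A minimal numpy sketch (hypothetical function name; z is left untouched):

```python
import numpy as np

def downsample_mean_yx(im, factor=4):
    """Downsample a (z, y, x) image in y and x by averaging factor x factor blocks."""
    nz, ny, nx = im.shape
    ny, nx = ny - ny % factor, nx - nx % factor      # crop to a multiple of factor
    im = im[:, :ny, :nx]
    # Split y and x into (block, within-block) axes, then average within blocks
    return im.reshape(nz, ny // factor, factor, nx // factor, factor).mean(axis=(2, 4))
```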

Practical Considerations

Speed

Speed is an issue with this algorithm, because it needs to be run independently on so many pixels. We take the following steps to optimise it:

  1. As mentioned above, we downsample the images in $y$ and $x$. The amount of downsampling is controlled by the config parameter sample_factor_yx which has default value 4 in both directions, meaning that the algorithm runs 16 times faster than it would without downsampling.
  2. We split the downsampled images into 16 subvolumes (4 in $y$ and 4 in $x$), and run optical flow in parallel on all of these independent subvolumes. The number of cores used can be adjusted by changing the flow_cores parameter though if left blank this will be computed automatically.

Interpolation

As mentioned previously, the algorithm assumes that the images have the same intensities. This condition is certainly satisfied near cell nuclei, where similar features exist in both images. Far from nuclei though, where all we have is noise, the 2 images have completely independent intensities. The result of this is that our flow fields tend to only give reliable results near nuclei, as shown below.


The shifts found by optical flow are only reliable in a small window around cell nuclei.

This is problematic, as a lot of our reads are found in between cell nuclei! We need to interpolate the values in these regions.

Suppose we have a flow field $\mathcal{F}$ that we would like to interpolate. We might go about it in the following way:

  1. Choose some locations $\mathbf{x}_1, \cdots, \mathbf{x}_n$ where we know the shifts computed, $\mathbf{s}_1, \cdots, \mathbf{s}_n$, are reliable
  2. Define the interpolated flow to be of the form
    \mathcal{F}_{interp}(\mathbf{x}) = \sum_i w(\mathbf{x}, \mathbf{x}_i) \mathbf{s}_i ,

    where the sum is over all sample points $\mathbf{x}_1, \cdots, \mathbf{x}_n$, and the weights $w(\mathbf{x}, \mathbf{x}_i)$ have the following properties:

      a. $\sum_i w(\mathbf{x}, \mathbf{x}_i) = 1$ for all $\mathbf{x}$,
      b. $w(\mathbf{x}, \mathbf{x}_i)$ is a decreasing function of $|\mathbf{x} - \mathbf{x}_i|$ with $w(\mathbf{x}_i, \mathbf{x}_i) \approx 1$.

If these 2 properties are met then $\mathcal{F}_{interp}$ will be a weighted average of all the shifts $\mathbf{s}_i$, and since the weights are decreasing, the value of $\mathcal{F}_{interp}$ at each interpolation point $\mathbf{x}_i$ will be strongly weighted toward $\mathbf{s}_i$.

Do such weight functions exist? Can we construct them? Yes and yes! Define the function

K(\mathbf{x}, \mathbf{y}) = \exp \Bigg( -\frac{1}{2 \sigma^2} | \mathbf{x} - \mathbf{y}|^2 \Bigg),

then we can define the weights by

w(\mathbf{x}, \mathbf{x}_i) = \dfrac{K(\mathbf{x}, \mathbf{x}_i)}{\sum_j K(\mathbf{x}, \mathbf{x}_j)}.

It is easy to see that this satisfies both the desired properties for the weights.
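
A minimal sketch of these normalised Gaussian weights (hypothetical function name; `sigma` is an illustrative value, not a config parameter):

```python
import numpy as np

def gaussian_weights(x, sample_points, sigma=10.0):
    """Normalised Gaussian weights w(x, x_i) built from the kernel K above.

    x: (d,) query location, sample_points: (n, d) known locations x_i.
    Returns one weight per sample point; they sum to 1 by construction.
    """
    d2 = np.sum((sample_points - x) ** 2, axis=1)  # squared distances |x - x_i|^2
    k = np.exp(-0.5 * d2 / sigma ** 2)             # kernel K(x, x_i)
    return k / k.sum()                             # normalise so weights sum to 1
```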

Extension to Soft Threshold Interpolation

The above method works well, but having a hard threshold means that some points $\mathbf{x}_1, \cdots, \mathbf{x}_n$ are used while others are completely ignored. This can lead to undersampling. A better approach is to employ a soft threshold approach, where we use all points $\mathbf{x}_i$ in the flow image but weight their contributions by the quality of the match at $\mathbf{x}_i$, which we will call $\lambda(\mathbf{x}_i)$. This reduces to the previous form if $\lambda(\mathbf{x}_i)$ is 1 for the points in our sampling set and 0 otherwise. We use a score of $\lambda(\mathbf{x}) = D_{r_{ref}}(\mathcal{F}_r(\mathbf{x})) D_r(\mathbf{x})$.

This results in an interpolation of the form

\mathcal{F}_{interp}(\mathbf{x}) = \sum_i \lambda(\mathbf{x}_i) w(\mathbf{x}, \mathbf{x}_i) \mathbf{s}_i ,

where the sum now ranges over all points in the image, and the weight functions are given by

w(\mathbf{x}, \mathbf{x}_i) = \dfrac{K(\mathbf{x}, \mathbf{x}_i)}{\sum_j \lambda(\mathbf{x}_j) K(\mathbf{x}, \mathbf{x}_j)}.
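
The soft-threshold interpolation can be sketched directly from these two formulas (hypothetical names; note the $\lambda_j$ factors cancel into a single weighted average):

```python
import numpy as np

def interp_flow_soft(x, points, shifts, quality, sigma=10.0):
    """Soft-threshold interpolation of a flow field at location x.

    points: (n, d) sample locations x_i, shifts: (n, d) shifts s_i,
    quality: (n,) match scores lambda(x_i).
    Implements F_interp(x) = sum_i lambda_i K_i s_i / sum_j lambda_j K_j.
    """
    d2 = np.sum((points - x) ** 2, axis=1)
    k = np.exp(-0.5 * d2 / sigma ** 2)       # kernel K(x, x_i)
    w = quality * k                          # lambda(x_i) K(x, x_i)
    return (w[:, None] * shifts).sum(axis=0) / w.sum()
```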

ADD IMAGE OF TILES BEFORE AND AFTER INTERP.

Note:

reillytilbury commented 3 months ago

Iterative Closest Point

We now attempt to find the affine corrections to the flows found earlier. As mentioned previously, this will be an affine transform for each tile $t$, round $r$ and channel $c$. Furthermore, it will be separated into 2 transforms:

We will often omit the tile subscript, but keep in the back of your mind that these transforms vary between tiles.

How does it work

Optical flow took in 2 images ($I$ and $J$) as inputs and returned 3 images of the same size as outputs (the flow in each direction $y$, $x$ and $z$). ICP differs in that it takes in 2 point clouds as input and returns an affine transform $A$ as output.

Let $X = \begin{pmatrix} \mathbf{x}_1, \cdots, \mathbf{x}_m \end{pmatrix}$ be the base point cloud and $Y = \begin{pmatrix} \mathbf{y}_1, \cdots, \mathbf{y}_n \end{pmatrix}$ be the point cloud we are trying to match this to.

In our case $X$ is the set of anchor points and $Y$ is the set of points in a given round and channel. So for every point $\mathbf{y}_i$ in $Y$ we expect there to be a corresponding point $\mathbf{x}_{\beta(i)}$ in $X$ (the converse is not true). Estimating the matching $\beta$ between base and target points is the main difficulty in ICP. Some common methods to estimate this matching include:

  1. We let $\mathbf{x}_{\beta(i)}$ be the closest point from $X$ to the point $\mathbf{y}_i$,
  2. The same as 1. but if there is no point in $X$ within a certain radius $r$ of $\mathbf{y}_i$ we don't bother to find a match,
  3. The same as 2. but we allow different radii in $y$, $x$ and $z$. This is useful if we have some prior that the misalignment affects $y$ and $x$ more than $z$, as we typically do.
  4. The same as 1. but we remove outliers where the shift $\mathbf{y}_i - \mathbf{x}_{\beta(i)}$ seems to be very different from others in its vicinity.

We use approach 3. The config parameters neighb_dist_thresh_yx and neighb_dist_thresh_z refer to $r_{yx}$ and $r_z$ respectively. Currently we think that ICP is correcting $xy$ more than $z$, so we have $r_{z} < r_{yx}$, but if this changes in the future then increasing this parameter will allow ICP to have greater impact on the $z$ transforms.

Once we have a matching $\beta$, we find an affine map $A$ minimising the loss function $L(A) = \sum_{i} | A \mathbf{x}_{\beta(i)} - \mathbf{y}_i |^2$, (where the sum is over all those elements in $Y$ that have been assigned a match) and then iterate this process of matching then minimising until some stopping criteria are met. We have the 2 following stopping criteria:

  1. If 2 consecutive matchings are identical $\beta_{t+1} = \beta_t$ then ICP gets stuck in an infinite loop, so we stop the iterations.
  2. The maximum number of iterations is set by icp_max_iter which is set to 50 by default.

In pseudocode this is as follows:

# Take in a set of m base points and n target points (m >> n) and align them via ICP.

# Args:
# X = m x 4 matrix of padded base positions
# Y = n x 3 matrix of target positions
# transform_initial refers to the 4 x 3 affine transform which we use as our starting guess
# epsilon = neighb_dist_thresh
# ||a|| refers to the norm of a vector a: ||a|| = sqrt(sum_i a_i ** 2)

# initialise neighb_prev and transform
transform, neighb_prev = transform_initial, None

# Update X according to our initial guess
X = X @ transform

# begin loop
for i in range(n_iter):

    # Find closest point in X to each point in Y
    neighb = [ argmin_k || X[k] - Y[j] || for j in range(n) ]

    # Remove these matches if they are above the neighb_dist_thresh.
    # In the real code there are different thresholds for xy and for z
    neighb = [ neighb[j] if || X[neighb[j]] - Y[j] || < epsilon else None for j in range(n) ]

    # Terminate if neighb = neighb_prev for all points
    if neighb == neighb_prev:
        QUIT

    # Find transform_update minimising squared error loss
    transform_update = argmin_B sum_j || X[neighb[j]] @ B - Y[j] || ** 2

    # Update: 
    # - X by applying the correction transform,
    # - Our working estimate of transform,
    # - neighb_prev to the current neighb
    X = X @ transform_update 
    transform = transform_update  @ transform
    neighb_prev = neighb
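
The pseudocode above can be made runnable. Here is a minimal numpy/scipy sketch (not the coppafish implementation): it starts from the identity transform and uses a single distance threshold `epsilon` rather than separate $yx$ and $z$ radii.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(X, Y, transform_initial=None, epsilon=5.0, n_iter=50):
    """Align base points X (m, 3) onto target points Y (n, 3) by ICP.

    Returns a 4x3 affine transform mapping padded base points [x, 1] onto Y.
    """
    m = X.shape[0]
    X_pad = np.hstack([X, np.ones((m, 1))])            # (m, 4) padded positions
    transform = transform_initial if transform_initial is not None \
        else np.vstack([np.eye(3), np.zeros(3)])       # identity starting guess
    neighb_prev = None
    for _ in range(n_iter):
        Xt = X_pad @ transform                         # base points under current guess
        dist, neighb = cKDTree(Xt).query(Y)            # nearest base point to each y_j
        neighb = np.where(dist < epsilon, neighb, -1)  # drop matches beyond epsilon
        if neighb_prev is not None and np.array_equal(neighb, neighb_prev):
            break                                      # matching unchanged: converged
        use = neighb >= 0
        A = np.hstack([Xt[neighb[use]], np.ones((use.sum(), 1))])
        update, *_ = np.linalg.lstsq(A, Y[use], rcond=None)  # 4x3 least-squares correction
        # Compose: pad transform to 4x4 so the correction can be absorbed
        T = np.hstack([transform, np.array([[0.0], [0.0], [0.0], [1.0]])])
        transform = T @ update
        neighb_prev = neighb
    return transform
```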

Implementation

Let $X_{r, c}$ be the $n_{spots}(r,c) \times 3$ matrix of all the spots found on round $r$ channel $c$.

We begin by computing the round $r$ affine correction $B_r$. We first adjust $X_{r_{ref}, c_{ref}}$ by the flow to yield $\mathcal{F}_r(X_{r_{ref}, c_{ref}})$, which should approximately put the anchor spots in round $r$ coordinates. We align these to the points $X_{r, c_{ref}}$. As a formula this reads as

B_r = \textrm{ICP} (\textrm{base} = \mathcal{F}_r(X_{r_{ref}, c_{ref}}), \quad \textrm{target} = X_{r, c_{ref}}).

This should capture any systematic affine errors in the flow field $\mathcal{F}_r$.

To compute the channel transforms $A_c$ we align the anchor points $X_{r_{ref}, c_{ref}}$ with all points in channel $c$, regardless of round. We adjust these points to be in the anchor coordinate system. In a formula, this reads as

A_c = \textrm{ICP} (\textrm{base} = X_{r_{ref}, c_{ref}}, \quad  \textrm{target} = \bigcup _r B_r ^{-1} (\mathcal{F}_r ^{-1} (X_{r, c}))).

The inverse transforms are used above because we are going from round $r$ coordinates to round $r_{ref}$ coordinates, which is opposite to the way we computed the transforms.

The chain of transforms is captured in the figure below:


Chain of transformations learnt for each round and channel.

Note:

  1. ICP will not run on a tile, round, channel with too few spots. This threshold is set by icp_min_spots which has default value 100.