reillytilbury / coppafish

Python version of iss software
MIT License
4 stars 2 forks source link

Stitch Documentation #349

Closed reillytilbury closed 2 weeks ago

reillytilbury commented 1 month ago

The stitch section is the part of the pipeline responsible for creating a global coordinate system. This is done by looking at the overlapping regions between tiles and seeing how they have deviated from their expected positions.

Algorithm

The origin of each tile $t_i$, which we call $\mathbf{X}_i$ is the position of its top left corner in the global coordinate system. Each tile $t_i$ is given a nominal origin $\mathbf{\tilde{X}}_i$ given by

\mathbf{\tilde{X}}_i = T(1-r) \bigg( Y_i, X_i, 0 \bigg),

where

These origins are corrected by shifts $\mathbf{S}_i$ which capture small deviations from the nominal origins, ie: $\mathbf{X}_i = \mathbf{\tilde{X}}_i + \mathbf{S}_i$. These shifts are found as follows:

  1. Compute shift $\mathbf{v_{ij}}$ between the overlapping region of all adjacent tiles $t_i$ and $t_j$ using phase correlation. This would be $\mathbf{0}$ if there were no deviations from the expected origins.

  2. Assign each shift $\mathbf{v}_{ij}$ a score $\lambda_{ij}$. We use

    \lambda_{ij} = \mathrm{corr}_ {\mathbf{x}}(t_i(\mathbf{x - v_{ij}}), t_j(\mathbf{x}))^2.
  3. We typically have about twice as many independent shifts as tiles (one south and one east for each non-boundary tile). This means our problem is over-constrained and doesn't have an exact solution. We can get an approximate solution for each shifts $\mathbf{S_i}$ by minimizing the loss function

    L(\mathbf{S}) = \sum_{i, j \ \mathrm{neighb}} \lambda_{ij} |\mathbf{S}_i - \mathbf{S}_j - \mathbf{v_{ij}}|^2,

    which is just saying that the deviations from the nominal origins for tiles $i$ and $j$ should be close to the observed deviations $\mathbf{v{ij}}$, and moreover that we should care more about making these similar when the deviations $\mathbf{v{ij}}$ are high quality, ie: $\lambda_{ij}$ is large.

  4. Differentiating the quadratic equation above gives a linear equation $\mathbf{AS} = \mathbf{B}$. However, $\mathbf{A}$ is not invertible, so this does not have a unique solution (any common translation of all $\mathbf{S}_i$ is another solution). We therefore solve for $\mathbf{S}$ by minimizing a second loss function

    J(\mathbf{S}) = |\mathbf{AS} - \mathbf{B}|^2,

    which just says that we take the smallest of all the shifted solutions for $\mathbf{S}$. This could have also been achieved by adding a regularisation term $\beta ||\mathbf{S}||^2$ to the original loss function $L$.

Global Coordinates and Duplicate Spots

Each pixel $p$ has local coordinate $\mathbf{q}_p$ which we can convert to global coordinates $\mathbf{Q}_p$ by adding the origin of pixel $p$'s parent tile, ie: $\mathbf{Q}_p = \mathbf{q}_p + \mathbf{X}_{t(p)}$. This is useful for global viewing of the spots and is indeed what we see when looking at the main results viewer.

Another thing this is useful for is ensuring we don't double count spots at the boundary between tiles. This would be a problem for computation time and more importantly for ensuring that these double counted genes don't cause issues for later analysis (for example PCISeq).

The way we get around double counting is by only using the pixels $p$ on tile $t$ whose closest tile centre is tile $t$. This is illustrated in the figure below.

ADD FIGURE HERE

Diagnostics