SPIN is a simple, Scanpy-based implementation of the subsampling and smoothing approach described in the manuscript "Mitigating autocorrelation during spatially resolved transcriptomics data analysis". It enables the alignment and analysis of transcriptionally defined tissue regions across multiple SRT datasets, regardless of morphology or experimental technology, using conventional single-cell tools.
For examples of downstream analysis (e.g. differentially expressed gene analysis and trajectory inference), see the tutorial notebook. For further details on SPIN parameters, import SPIN into Python as shown below and run help(spin).
Conventional single-cell analysis can identify molecular cell types by considering each cell individually.
However, it does not incorporate spatial information.
Arguably the simplest way to incorporate spatial information and identify molecular tissue regions is to spatially smooth gene expression features across neighboring cells in the tissue.
This can be done by setting the features of each cell to the average of its spatial neighborhood.
However, a problem arises when smoothed representations of each cell are compared to one another.
Physically adjacent cells will have almost identical neighborhoods and thus almost identical smoothed representations.
Thus, we end up with nearest neighbors in feature space that are just nearest neighbors in physical space.
Because conventional methods for downstream analysis rely on the nearest-neighbor graph in feature space, this leads to a reconstruction of physical space in latent space rather than a representation of the true underlying large-scale molecular patterns.
Here, we implement an approach in which each cell's spatial neighborhood is randomly subsampled before averaging, allowing the exact neighborhood composition to vary while still maintaining the general molecular composition.
Ultimately, this approach enables the application of conventional single-cell tools to spatial molecular features in SRT data, yielding regional analogies for each tool. For more details and examples, please refer to the manuscript and tutorial.
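The idea described above can be illustrated with a minimal NumPy sketch. This is not SPIN's implementation: the function names, the brute-force neighbor search, the toy data, and the parameter values (n_nbrs, n_samples) are all assumptions chosen for illustration. It smooths a toy dataset with and without neighborhood subsampling, then measures how many of each cell's nearest neighbors in feature space are also its nearest neighbors in physical space.

```python
import numpy as np


def smooth(X, coords, n_nbrs=30, n_samples=None, seed=0):
    """Average each cell's features over its spatial neighborhood.

    If n_samples is given, each neighborhood is randomly subsampled
    (without replacement) down to n_samples cells before averaging.
    """
    rng = np.random.default_rng(seed)
    # Brute-force pairwise squared distances; fine for a small toy example.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the n_nbrs spatially closest cells (the cell itself included).
    nbrs = np.argsort(d2, axis=1)[:, :n_nbrs]
    if n_samples is not None:
        # Shuffle each row independently, then keep the first n_samples entries.
        nbrs = rng.permuted(nbrs, axis=1)[:, :n_samples]
    return X[nbrs].mean(axis=1)


def knn_overlap(feats, coords, k=10):
    """Mean fraction of each cell's k nearest feature-space neighbors
    that are also among its k nearest spatial neighbors."""
    def knn(A):
        d2 = ((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(d2, np.inf)  # exclude the cell itself
        return np.argsort(d2, axis=1)[:, :k]

    f, s = knn(feats), knn(coords)
    return np.mean([len(set(a) & set(b)) / k for a, b in zip(f, s)])


# Toy data: two "regions" (left and right halves) with different mean expression.
rng = np.random.default_rng(0)
n_cells, n_genes = 300, 20
coords = rng.uniform(size=(n_cells, 2))
region = (coords[:, 0] > 0.5).astype(float)
X = region[:, None] * 2.0 + rng.normal(size=(n_cells, n_genes))

X_plain = smooth(X, coords, n_nbrs=30)               # plain neighborhood average
X_sub = smooth(X, coords, n_nbrs=30, n_samples=10)   # subsample, then average

overlap_plain = knn_overlap(X_plain, coords)
overlap_sub = knn_overlap(X_sub, coords)
print(f"plain smoothing overlap:      {overlap_plain:.2f}")
print(f"subsampled smoothing overlap: {overlap_sub:.2f}")
```

In this toy setting, the overlap between feature-space and physical-space neighbors should be noticeably lower with subsampling, which is the effect the approach relies on. The exact values depend on the neighborhood size and the number of subsampled cells.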
- Git is required for pip installing this package from GitHub. While it comes standard on most machines, those without it may encounter an xcrun: error when following the installation instructions below. See here for simple instructions on how to install it.
- Package dependencies are listed in pyproject.toml.
- Input data in .h5ad format.
- Expression matrix in .X (both sparse and dense representations supported).
- Spatial coordinates in .obsm (key can be specified with argument spatial_key).
- Batch labels in .obs with column name batch_key.

pip install git+https://github.com/wanglab-broad/spin@main

Installation takes ~5 minutes.
Consider the marmoset and mouse data from the manuscript which we provide as a demo:
import scanpy as sc

adata_marmoset = sc.read(
    'data/marmoset.h5ad',
    backup_url='https://zenodo.org/record/8092024/files/marmoset.h5ad?download=1'
)
adata_mouse = sc.read(
    'data/mouse.h5ad',
    backup_url='https://zenodo.org/record/8092024/files/mouse.h5ad?download=1'
)
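Before running SPIN, it can be useful to confirm that each loaded dataset meets the data requirements (an expression matrix in .X and spatial coordinates in .obsm, plus a batch column in .obs when a single AnnData holds several batches). The helper below is a hypothetical pre-flight check and is not part of SPIN. The default key 'spatial' is an assumption, and a simple stand-in object is used so the sketch runs without AnnData installed.

```python
from types import SimpleNamespace


def check_srt_input(adata, spatial_key="spatial", batch_key=None):
    """Return a list of human-readable problems (empty if none were found).

    Hypothetical helper: only checks that the expected slots exist.
    """
    problems = []
    if getattr(adata, "X", None) is None:
        problems.append("missing expression matrix in .X")
    if spatial_key not in adata.obsm:
        problems.append(f"missing spatial coordinates in .obsm[{spatial_key!r}]")
    # Only relevant when one AnnData already contains several labeled batches.
    if batch_key is not None and batch_key not in adata.obs:
        problems.append(f"missing batch labels in .obs[{batch_key!r}]")
    return problems


# Stand-in for an AnnData object, used only to keep this sketch self-contained.
toy = SimpleNamespace(X=[[1.0, 0.0]], obsm={"spatial": [[0.0, 0.0]]}, obs={})

ok_report = check_srt_input(toy)
batch_report = check_srt_input(toy, batch_key="species")
print(ok_report)
print(batch_report)
```

With real data, the same function could be called as check_srt_input(adata_marmoset), since AnnData's .obsm and .obs both support membership tests with `in`.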
These datasets can be spatially integrated and clustered using spin. The batch_key argument corresponds to the name of a new column in adata.obs that stores the batch labels for each dataset. The batch_labels argument is a list of these batch labels in the same order as the input AnnDatas:
from spin import spin

adata = spin(
    adatas=[adata_marmoset, adata_mouse],
    batch_key='species',
    batch_labels=['marmoset', 'mouse'],
    resolution=0.7
)
This performs the following steps:
- integrate:
  - Smoothed expression (adata.layers['smooth'])
  - PCA representation (adata.obsm['X_pca_spin'])
- cluster:
  - Region clusters (adata.obs['region'])
  - UMAP embedding (adata.obsm['X_umap_spin'])

Note that spin can equivalently take as input a single AnnData containing multiple labeled batches. It can also take a single AnnData containing one batch for finding regions in a single dataset. For examples, see the tutorial.
The resulting region clusters can then be visualized using standard Scanpy functions:
# In physical space
sc.set_figure_params(figsize=(7,5))
sc.pl.embedding(adata, basis='spatial', color='region')
# In UMAP space
sc.set_figure_params(figsize=(4,4))
sc.pl.embedding(adata, basis='X_umap_spin', color='region')
Downstream analysis (e.g. DEG analysis, trajectory inference) can then be performed using standard Scanpy functions as well.
For examples of downstream analysis, see the tutorial.
For further details on the parameters of spin, import SPIN into Python as shown above and run help(spin).
SPIN can be executed from the shell using the spin command as shown below (the path is identified automatically; see spin_cli and pyproject.toml).
Shell submission requires a read path to the relevant dataset(s) as well as a write path for the output dataset. Otherwise, provide the same parameters you would when running in Python as above:
spin \
    --adata_paths data/marmoset.h5ad data/mouse.h5ad \
    --write_path data/marmoset_mouse_spin.h5ad \
    --batch_key species \
    --batch_labels marmoset mouse \
    --resolution "0.7"
Just as when running in Python, you can instead pass a single AnnData containing multiple batches, or a single dataset containing a single batch.
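To make the mapping between shell flags and Python arguments concrete, here is a hypothetical sketch of how flags like those above could be parsed with argparse. It is not the actual spin_cli code: the parser layout and the defaults are assumptions made for illustration. It shows that space-separated values become Python lists and that a quoted "0.7" on the command line arrives as the float 0.7.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the shell example."""
    parser = argparse.ArgumentParser(prog="spin")
    # One or more input paths, e.g. --adata_paths a.h5ad b.h5ad
    parser.add_argument("--adata_paths", nargs="+", required=True)
    parser.add_argument("--write_path", required=True)
    parser.add_argument("--batch_key", default=None)
    # One label per input dataset, in the same order as --adata_paths
    parser.add_argument("--batch_labels", nargs="+", default=None)
    parser.add_argument("--resolution", type=float, default=None)
    return parser


# Parse the same arguments as the shell example (the shell strips the quotes).
args = build_parser().parse_args([
    "--adata_paths", "data/marmoset.h5ad", "data/mouse.h5ad",
    "--write_path", "data/marmoset_mouse_spin.h5ad",
    "--batch_key", "species",
    "--batch_labels", "marmoset", "mouse",
    "--resolution", "0.7",
])
print(args.adata_paths, args.batch_labels, args.resolution)
```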