wanglab-broad / spin

SPatial INtegration of spatially resolved transcriptomics datasets
GNU General Public License v3.0

SPIN: spatial integration of spatially resolved transcriptomics (SRT) data

Manuscript: bioRxiv
Data: Zenodo

SPIN is a simple, Scanpy-based implementation of the subsampling and smoothing approach described in the manuscript "Mitigating autocorrelation during spatially resolved transcriptomics data analysis". It enables the alignment and analysis of transcriptionally defined tissue regions across multiple SRT datasets, regardless of morphology or experimental technology, using conventional single-cell tools. This README covers:

  1. A conceptual overview of the approach
  2. Package requirements
  3. Installation instructions
  4. Basic usage principles

For examples of downstream analysis (e.g. differentially expressed gene analysis and trajectory inference), see the tutorial notebook. For further details on SPIN parameters, import SPIN into Python as shown below and run help(spin).

1. Conceptual overview

Ultimately, this approach enables the application of conventional single-cell tools to spatial molecular features in SRT data, yielding regional analogies for each tool. For more details and examples, please refer to the manuscript and tutorial.

2. Requirements

Software:

Data:

3. Installation

From GitHub:

pip install git+https://github.com/wanglab-broad/spin@main

Installation takes about 5 minutes.
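A quick way to confirm that the installation succeeded is to check whether Python can locate the packages. The following is a minimal sketch using only the standard library. The helper name `is_installed` is hypothetical and not part of SPIN; the module names checked are the ones imported elsewhere in this README.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Module names taken from the import statements used in this README.
    for name in ("scanpy", "spin"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If either package is reported as missing, rerun the pip command in the same environment that your Python interpreter uses.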

4. Usage

In Python:

Consider the marmoset and mouse data from the manuscript, which we provide as a demo:

import scanpy as sc

adata_marmoset = sc.read(
    'data/marmoset.h5ad',
    backup_url='https://zenodo.org/record/8092024/files/marmoset.h5ad?download=1'
)
adata_mouse = sc.read(
    'data/mouse.h5ad',
    backup_url='https://zenodo.org/record/8092024/files/mouse.h5ad?download=1'
)

These datasets can be spatially integrated and clustered using spin. The batch_key argument is the name of a new column in adata.obs that stores the batch label of each dataset. The batch_labels argument is a list of these batch labels, in the same order as the input AnnData objects:

from spin import spin

adata = spin(
    adatas=[adata_marmoset, adata_mouse],
    batch_key='species',
    batch_labels=['marmoset', 'mouse'],
    resolution=0.7
)

This performs the following steps:

Note that spin can equivalently take as input a single AnnData containing multiple labeled batches. It can also take a single AnnData containing one batch for finding regions in a single dataset. For examples, see the tutorial.

The resulting region clusters can then be visualized using standard Scanpy functions:

# In physical space
sc.set_figure_params(figsize=(7,5))
sc.pl.embedding(adata, basis='spatial', color='region')

# In UMAP space
sc.set_figure_params(figsize=(4,4))
sc.pl.embedding(adata, basis='X_umap_spin', color='region')

Downstream analysis (e.g. differentially expressed gene (DEG) analysis, trajectory inference) can then be performed using standard Scanpy functions as well; see the tutorial for examples. For further details on the parameters of spin, import SPIN into Python as shown above and run help(spin).

From the shell:

SPIN can be executed from the shell using the spin command (the path is identified automatically; see spin_cli and pyproject.toml).

Shell submission requires a read path to the relevant dataset(s) as well as a write path for the output dataset. Otherwise, provide the same parameters you would when running in Python as above:

spin \
--adata_paths data/marmoset.h5ad data/mouse.h5ad \
--write_path data/marmoset_mouse_spin.h5ad \
--batch_key species \
--batch_labels marmoset mouse \
--resolution "0.7"

As in Python, you can instead pass a single AnnData containing multiple batches, or a single dataset containing a single batch.
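When launching several runs from a script, it can be convenient to assemble the shell command programmatically. The sketch below builds the argument list for the example command above using only the flags shown in this README. The helper `build_spin_command` is hypothetical and not part of SPIN, and the command is printed rather than executed.

```python
import shlex
from typing import List, Sequence


def build_spin_command(
    adata_paths: Sequence[str],
    write_path: str,
    batch_key: str,
    batch_labels: Sequence[str],
    resolution: float,
) -> List[str]:
    """Assemble the spin CLI call as an argument list (flags as shown in this README)."""
    return [
        "spin",
        "--adata_paths", *adata_paths,
        "--write_path", write_path,
        "--batch_key", batch_key,
        "--batch_labels", *batch_labels,
        "--resolution", str(resolution),
    ]


cmd = build_spin_command(
    adata_paths=["data/marmoset.h5ad", "data/mouse.h5ad"],
    write_path="data/marmoset_mouse_spin.h5ad",
    batch_key="species",
    batch_labels=["marmoset", "mouse"],
    resolution=0.7,
)

# Print a copy-pasteable command line; to execute it instead, pass `cmd`
# to subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting issues when paths contain spaces.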