hongleir / SpaceFlow

MIT License


SpaceFlow: Identifying Multicellular Spatiotemporal Organization of Cells using Spatial Transcriptome Data

SpaceFlow is a Python package for identifying spatiotemporal patterns and spatial domains from Spatial Transcriptomic (ST) data. Based on a deep graph network, SpaceFlow provides the following functions:

  1. Encodes the ST data into low-dimensional embeddings that reflect both the expression similarity and the spatial proximity of cells in ST data.
  2. Incorporates spatiotemporal relationships of cells or spots in ST data through a pseudo-Spatiotemporal Map (pSM) derived from the embeddings.
  3. Identifies spatial domains with spatially-coherent expression patterns.

Check out our paper (Ren et al., Nature Communications, 2022) for the detailed methods and applications.

SpaceFlow was developed in Python 3.7 with PyTorch 1.9.0. Specific package versions are listed in requirements.txt. The marker gene identification analysis is performed with the Scanpy 1.8.1 package. The cell-cell communication inference is performed with CellChat v1.1.3 in an R v4.1.2 environment.

Installation

1. Prepare environment

To install SpaceFlow, we recommend using the Anaconda Python Distribution and creating an isolated environment, so that SpaceFlow and its dependencies don't conflict or interfere with other packages or applications. To create the environment, run the following command in a terminal:

conda create -n spaceflow_env python=3.7

After creating the environment, activate spaceflow_env with:

conda activate spaceflow_env

2. Install Pytorch

First, install the PyTorch build that matches your machine and environment by following the instructions at: https://pytorch.org/get-started/locally/

Note that to install PyTorch on a GPU machine, you need to install CUDA first; see this guide for installing CUDA: https://developer.nvidia.com/cuda-downloads.

3. Install SpaceFlow

After successfully installing PyTorch (version >= 1.9.0), install the SpaceFlow package using pip:

pip install SpaceFlow

If the installation still fails, try installing the required packages listed in requirements.txt:

pip install -r requirements.txt
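To confirm that the environment satisfies the PyTorch >= 1.9.0 requirement before running SpaceFlow, a quick check like the one below can help. This is a minimal sketch and not part of SpaceFlow: the helper names parse_version and meets_minimum are hypothetical. It compares numeric version components rather than strings, because a plain string comparison would wrongly rank "1.10.0" below "1.9.0".

```python
import re


def parse_version(version):
    """Turn a version string such as '1.9.0+cu111' into a tuple like (1, 9, 0)."""
    core = version.split("+")[0]  # drop local build tags such as '+cu111'
    parts = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep leading digits, e.g. '0rc1' -> 0
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad short versions such as '2.0'
        parts.append(0)
    return tuple(parts)


def meets_minimum(version, minimum=(1, 9, 0)):
    """True if `version` is at least `minimum` (numeric, not string, comparison)."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        status = "OK" if meets_minimum(torch.__version__) else "older than 1.9.0"
        print("PyTorch", torch.__version__, status)
        print("CUDA available:", torch.cuda.is_available())
```

For more thorough version handling, the packaging library's packaging.version.Version class is a common alternative to a hand-written parser.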

Usage

Quick Start by Example (Jupyter Notebook)

We use the mouse organogenesis ST data generated by seqFISH (Lohoff, T. et al. 2022) to demonstrate the usage of SpaceFlow.

The data is available in the squidpy package, so we first import squidpy and load the data. If squidpy is not installed, run pip install squidpy to install it.

1. Import the SpaceFlow, squidpy, and scanpy packages

import squidpy as sq
import scanpy as sc
from SpaceFlow import SpaceFlow

2. Load the ST data from the squidpy package

adata = sq.datasets.seqfish()
sc.pp.filter_genes(adata, min_cells=3)
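The call sc.pp.filter_genes(adata, min_cells=3) keeps only genes detected (count > 0) in at least three cells. The toy function below illustrates that rule on a small dense matrix stored as nested lists, with cells as rows and genes as columns. It is an illustrative re-implementation, not Scanpy's code, and filter_genes_min_cells is a hypothetical name.

```python
def filter_genes_min_cells(counts, gene_names, min_cells=3):
    """Keep genes detected (count > 0) in at least `min_cells` cells.

    counts: cells x genes nested list. Returns the filtered matrix and the
    names of the genes that survive.
    """
    keep = [
        g
        for g in range(len(gene_names))
        # count the cells in which gene g has a non-zero count
        if sum(1 for row in counts if row[g] > 0) >= min_cells
    ]
    filtered = [[row[g] for g in keep] for row in counts]
    return filtered, [gene_names[g] for g in keep]
```

For example, with four cells, a gene seen in three of them passes, while genes seen in only one cell are dropped.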

3. Create SpaceFlow Object

We can create a SpaceFlow object from either an anndata.AnnData object or a raw count matrix:

To construct a SpaceFlow object from an anndata.AnnData object:

sf = SpaceFlow.SpaceFlow(adata=adata)

To construct a SpaceFlow object from a raw count matrix:

sf = SpaceFlow.SpaceFlow(count_matrix=adata.X, spatial_locs=adata.obsm['spatial'], sample_names=adata.obs_names, gene_names=adata.var_names)
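The count-matrix constructor takes inputs that must line up with each other. Following the AnnData convention used above, adata.X has one row per cell (or spot) and one column per gene, adata.obsm['spatial'] has one coordinate row per cell, and adata.obs_names and adata.var_names label the rows and columns. The hypothetical helper below checks those shapes before the object is built. It assumes 2-D (x, y) coordinates, as in this seqFISH example, and is not part of SpaceFlow.

```python
def check_count_matrix_inputs(count_matrix, spatial_locs, sample_names, gene_names):
    """Return a list of shape problems (empty if the inputs are consistent).

    Hypothetical helper: rows of count_matrix are cells/spots, columns are genes.
    Uses .shape when available (NumPy arrays, SciPy sparse matrices), otherwise
    falls back to nested-list lengths.
    """
    shape = getattr(count_matrix, "shape", None)
    if shape is not None:
        n_cells, n_genes = shape[0], shape[1]
    else:
        n_cells = len(count_matrix)
        n_genes = len(count_matrix[0]) if n_cells else 0

    problems = []
    if len(spatial_locs) != n_cells:
        problems.append(f"spatial_locs has {len(spatial_locs)} rows, expected {n_cells}")
    if any(len(loc) != 2 for loc in spatial_locs):
        problems.append("each spatial location should be an (x, y) pair")
    if len(sample_names) != n_cells:
        problems.append(f"sample_names has {len(sample_names)} entries, expected {n_cells}")
    if len(gene_names) != n_genes:
        problems.append(f"gene_names has {len(gene_names)} entries, expected {n_genes}")
    return problems
```

Calling it with adata.X, adata.obsm['spatial'], adata.obs_names and adata.var_names should return an empty list for the seqFISH data loaded above.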

4. Preprocessing the ST Data

Next, we preprocess the ST data by running:

sf.preprocessing_data(n_top_genes=3000)

The preprocessing includes normalization and log-transformation of the expression count matrix, selection of highly variable genes, and construction of a spatial proximity graph from the spatial coordinates. (For details, see the preprocessing_data function in SpaceFlow/SpaceFlow.py.)
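The three preprocessing steps can be illustrated on a toy dataset with the plain-Python sketch below. It is not SpaceFlow's implementation, and several choices are simplifying assumptions: the fixed target_sum of 1e4, the ranking of genes by raw variance (Scanpy's highly_variable_genes uses more elaborate, mean-aware criteria), and the k-nearest-neighbour spatial graph. See preprocessing_data for what SpaceFlow actually does.

```python
import math


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x)."""
    result = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total > 0 else 0.0  # empty cells stay at 0
        result.append([math.log1p(x * scale) for x in row])
    return result


def top_variable_genes(matrix, n_top_genes):
    """Column indices of the n_top_genes genes with the largest variance.

    Simplification: raw variance stands in for Scanpy's HVG criteria.
    """
    n_cells = len(matrix)
    variances = []
    for g in range(len(matrix[0])):
        column = [row[g] for row in matrix]
        mean = sum(column) / n_cells
        variances.append(sum((x - mean) ** 2 for x in column) / n_cells)
    ranked = sorted(range(len(variances)), key=lambda g: variances[g], reverse=True)
    return sorted(ranked[:n_top_genes])


def spatial_knn_edges(coords, k):
    """Undirected edges joining each cell to its k nearest cells in (x, y) space."""
    edges = set()
    for i, (xi, yi) in enumerate(coords):
        neighbours = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        for _, j in neighbours[:k]:
            edges.add((min(i, j), max(i, j)))  # store each pair once
    return sorted(edges)
```

In this sketch the steps chain as top_variable_genes(normalize_log1p(counts), n_top_genes), mirroring the n_top_genes=3000 argument above, while spatial_knn_edges depends only on the coordinates.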

5. Train the deep graph network model

We then train a spatially regularized deep graph network model to learn a low-dimensional embedding that reflects both the expression similarity and the spatial proximity of cells in ST data.

sf.train(spatial_regularization_strength=0.1, z_dim=50, lr=1e-3, epochs=1000, max_patience=50, min_stop=100, random_seed=42, gpu=0, regularization_acceleration=True, edge_subset_sz=1000000)
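The spatial_regularization_strength argument weights a spatial term against the main representation-learning objective. The sketch below illustrates one way such a term can work: pairs of cells that are close in space but far apart in embedding space are penalized most. This README does not give the exact form of SpaceFlow's spatial loss, so several details here are assumptions: the (1 - normalized spatial distance) weighting, the helper names, and the random pair subsampling, which is only loosely modelled on the name edge_subset_sz.

```python
import itertools
import math
import random


def _euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def spatial_penalty(embeddings, coords, edge_subset_sz=None, seed=42):
    """Toy spatial-consistency penalty (illustrative; not SpaceFlow's exact loss).

    Each pair of cells contributes its embedding distance weighted by
    (1 - spatial distance / largest spatial distance), so spatially close
    pairs with dissimilar embeddings are penalized most.
    """
    pairs = list(itertools.combinations(range(len(coords)), 2))
    if not pairs:
        return 0.0
    # Assumption: score only a random subset of pairs to bound the cost.
    if edge_subset_sz is not None and edge_subset_sz < len(pairs):
        pairs = random.Random(seed).sample(pairs, edge_subset_sz)
    spatial = [_euclidean(coords[i], coords[j]) for i, j in pairs]
    max_spatial = max(spatial) or 1.0  # all cells co-located -> uniform weights
    total = 0.0
    for (i, j), d_s in zip(pairs, spatial):
        weight = 1.0 - d_s / max_spatial
        total += weight * _euclidean(embeddings[i], embeddings[j])
    return total / len(pairs)


def regularized_loss(base_loss, embeddings, coords,
                     spatial_regularization_strength=0.1, **kwargs):
    """Base objective plus the weighted spatial term."""
    return base_loss + spatial_regularization_strength * spatial_penalty(
        embeddings, coords, **kwargs)
```

Enumerating every pair is quadratic in the number of cells, which is fine for toy inputs; at the scale of real ST data, sampling pairs directly rather than listing them first would be needed.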

6. Domain segmentation of the ST data

After the model training, the learned low-dimensional embedding can be accessed through sf.embedding.

SpaceFlow then uses this learned embedding to identify spatial domains with the Leiden algorithm.

sf.segmentation(domain_label_save_filepath="./domains.tsv", n_neighbors=50, resolution=1.0)
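The resolution argument controls how fine the segmentation is. Assume the step relies on a standard Leiden implementation, such as Scanpy's sc.tl.leiden applied to a neighbour graph built from the embedding with n_neighbors. In that case, Leiden searches for a partition that scores highly on a modularity-type quality function with a resolution parameter γ: Q = (1/2m) Σ_ij [A_ij − γ k_i k_j / (2m)] δ(c_i, c_j). The function below evaluates that score for a given partition of a small unweighted graph. It scores partitions but does not search for one, so it is not an implementation of Leiden itself.

```python
def modularity(edges, communities, resolution=1.0):
    """Resolution-scaled modularity of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs without self-loops or duplicates.
    communities: dict mapping node -> community label.
    Computed as sum over communities c of [L_c / m - resolution * (d_c / (2m)) ** 2],
    where m is the edge count, L_c the edges inside c, d_c the total degree of c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}  # community -> number of edges with both ends inside it
    deg_sum = {}   # community -> sum of node degrees in it
    for u, v in edges:
        cu, cv = communities[u], communities[v]
        deg_sum[cu] = deg_sum.get(cu, 0) + 1  # the edge adds 1 to u's degree
        deg_sum[cv] = deg_sum.get(cv, 0) + 1  # and 1 to v's degree
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - resolution * (d / (2 * m)) ** 2
        for c, d in deg_sum.items()
    )
```

On two triangles joined by a single edge, splitting along the bridge scores 6/7 − 0.5 ≈ 0.357 at γ = 1, while keeping everything in one community scores 0. Raising γ enlarges the penalty on large communities, which is why higher resolution values tend to yield more, smaller domains.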

7. Visualization of the annotation and the identified spatial domains

We next plot the spatial domains using the identified domain labels and spatial coordinates of cells.

sf.plot_segmentation(segmentation_figure_save_filepath="./domain_segmentation.pdf", colormap="tab20", scatter_sz=1., rsz=4., csz=4., wspace=.4, hspace=.5, left=0.125, right=0.9, bottom=0.1, top=0.9)

The expected output is:

Domain Segmentation

We can also visualize the expert annotation for comparison:

import scanpy as sc
sc.pl.spatial(adata, color="celltype_mapped_refined", spot_size=0.03)

The expected output is:

Expert Annotation
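A common way to put a number on how well the identified domains agree with the expert annotation is the Adjusted Rand Index (ARI). It is 1.0 for identical partitions, regardless of how the labels are named, and close to 0 for random labelings. sklearn.metrics.adjusted_rand_score computes it; the dependency-free sketch below shows the underlying pair-counting formula and is an illustration, not part of SpaceFlow.

```python
from collections import Counter


def _n_pairs(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same cells")
    total = _n_pairs(len(labels_a))
    if total == 0:
        return 1.0  # fewer than two cells: nothing to compare
    # Pairs of cells grouped together in both labelings / in each labeling.
    together_both = sum(_n_pairs(c) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(_n_pairs(c) for c in Counter(labels_a).values())
    together_b = sum(_n_pairs(c) for c in Counter(labels_b).values())
    expected = together_a * together_b / total
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings put every cell in one group
    return (together_both - expected) / (maximum - expected)
```

For example, if domain_labels (a hypothetical name) holds the labels from ./domains.tsv in the same cell order as adata, the comparison would be adjusted_rand_index(domain_labels, list(adata.obs["celltype_mapped_refined"])). The layout of domains.tsv is not described here, so check it before loading.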

8. Identify the spatiotemporal patterns of the ST data through a pseudo-Spatiotemporal Map (pSM)

Next, we apply the diffusion pseudotime (DPT) algorithm to the learned spatially-consistent embedding to generate a pseudo-Spatiotemporal Map (pSM). This pSM represents a spatially-coherent pseudotime ordering of cells that encodes biological relationships between cells, such as developmental trajectories and cancer progression.

sf.pseudo_Spatiotemporal_Map(pSM_values_save_filepath="./pSM_values.tsv", n_neighbors=20, resolution=1.0)
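Diffusion pseudotime, as implemented for example in Scanpy's sc.tl.dpt, orders cells by a diffusion-based distance from a root cell, scaled so that values fall between 0 and 1. The sketch below conveys only this "distance from a root, rescaled to [0, 1]" idea, using plain shortest-path (Dijkstra) distances on a neighbour graph. This simpler technique is swapped in for illustration and is neither DPT nor SpaceFlow's code; the graph and the choice of root are assumptions.

```python
import heapq


def geodesic_pseudotime(adjacency, root):
    """Shortest-path distance from `root`, rescaled to [0, 1].

    adjacency maps node -> list of (neighbour, edge_length); it should be
    symmetric. A simplified stand-in for diffusion pseudotime, not DPT itself.
    """
    dist = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry; a shorter path was already found
        for nbr, length in adjacency.get(node, []):
            candidate = d + length
            if candidate < dist.get(nbr, float("inf")):
                dist[nbr] = candidate
                heapq.heappush(heap, (candidate, nbr))
    max_d = max(dist.values()) or 1.0  # avoid dividing by zero for a lone root
    # Nodes unreachable from the root get None instead of a pseudotime value.
    return {node: (dist[node] / max_d if node in dist else None) for node in adjacency}
```

On a simple chain 0-1-2-3 rooted at node 0, the values rise evenly from 0 to 1 along the chain, and a disconnected node gets None.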

9. Visualization of the identified pseudo-Spatiotemporal Map (pSM)

We next visualize the identified pseudo-Spatiotemporal Map (pSM).

sf.plot_pSM(pSM_figure_save_filepath="./pseudo-Spatiotemporal-Map.pdf", colormap="roma", scatter_sz=1., rsz=4., csz=4., wspace=.4, hspace=.5, left=0.125, right=0.9, bottom=0.1, top=0.9)

The expected output is:

pSM

Please cite

Ren, Honglei, et al. "Identifying multicellular spatiotemporal organization of cells with SpaceFlow." Nature Communications 13.1 (2022): 1-14. https://www.nature.com/articles/s41467-022-31739-w

Contact

If you have any questions or find any issues, please contact: hongleir@uci.edu.