PyTorch implementation of STELLAR, a geometric deep learning tool for cell-type discovery and identification in spatially resolved single-cell datasets. STELLAR takes two inputs. The first is an annotated reference spatial single-cell dataset, in which cells are assigned to their cell types. The second is an unannotated spatial dataset, in which cell types are unknown. STELLAR then generates annotations for the unannotated dataset. For a detailed description of the algorithm, please see our manuscript "Annotation of Spatially Resolved Single-cell Data with STELLAR".
1. Python environment (Optional): We recommend using the Conda package manager:
conda create -n stellar python=3.8
conda activate stellar
2. PyTorch: Install PyTorch. We have verified the code with PyTorch 1.9.1. For example:
conda install pytorch cudatoolkit=11.3 -c pytorch
3. PyTorch Geometric: Install PyTorch Geometric by following their installation instructions. We have verified the code with PyG 2.0. For example:
conda install pyg -c pyg
4. Other dependencies:
Install the additional packages listed in requirements.txt by running:
pip install -r requirements.txt
Note: We tested STELLAR on Linux with an NVIDIA GPU and Python 3. Specifically, we used Ubuntu 16.04 with an NVIDIA GeForce 2080 Ti GPU and 1 TB of CPU memory. We additionally tested the code on macOS (Intel chip).
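Before training, it can help to confirm that the PyTorch and PyTorch Geometric installs from steps 2 and 3 import cleanly. You can also check whether a GPU is visible. The following is a small standalone sketch, not part of the STELLAR repository. It relies only on importlib and on the `__version__` attribute that `torch` and `torch_geometric` expose. The function name `check_environment` is hypothetical.

```python
import importlib


def check_environment(modules=("torch", "torch_geometric")):
    """Report the installed version of each module (None if it fails to import).

    Also reports whether PyTorch can see a CUDA device (False if PyTorch
    itself is unavailable). check_environment is a hypothetical helper,
    not part of STELLAR.
    """
    report = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except (ImportError, OSError):
            # Missing package, or a compiled extension that failed to load.
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    try:
        import torch
        report["cuda_available"] = bool(torch.cuda.is_available())
    except (ImportError, OSError):
        report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A `None` entry means the corresponding installation step needs to be revisited. `cuda_available: False` on a GPU machine usually points to a mismatch between the installed CUDA toolkit and the PyTorch build.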
We implemented the STELLAR model as a self-contained class. To create an instance and train it:
stellar = STELLAR(args, dataset)
stellar.train()
_, results = stellar.pred()
The CODEX multiplexed imaging datasets used in STELLAR are available on Dryad. Our demo code assumes the data are placed in a folder ./data/ that you create.
We provide several training examples with this repo:
python STELLAR_run.py --dataset Hubmap --num-heads 22
python STELLAR_run.py --dataset TonsilBE --num-heads 13 --num-seed-class 3
Memory usage and time:
We also provide a Jupyter notebook, demo.ipynb, that shows an example of running STELLAR on a downsampled dataset. If you run into memory issues, consider downsampling further. Note that model performance degrades as the amount of training data decreases. If you have limited memory and possibly limited GPU access, set use-processed-graph to True to load pre-processed data. The run can then finish on a CPU in about 30 minutes.
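One way to downsample spatial data is to keep only the cells inside a rectangular window of the tissue. The sketch below assumes per-cell x, y coordinates in an `(n_cells, 2)` array. The helper `crop_window` is hypothetical, and demo.ipynb may downsample differently.

```python
import numpy as np


def crop_window(coords, x_range, y_range):
    """Return a boolean mask selecting cells inside a rectangular window.

    coords: array of shape (n_cells, 2) holding x, y positions.
    x_range, y_range: (min, max) bounds, both inclusive.
    crop_window is a hypothetical helper, not part of STELLAR.
    """
    coords = np.asarray(coords, dtype=float)
    x, y = coords[:, 0], coords[:, 1]
    return (
        (x >= x_range[0]) & (x <= x_range[1])
        & (y >= y_range[0]) & (y <= y_range[1])
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.uniform(0, 1000, size=(5000, 2))
    mask = crop_window(coords, (0, 500), (0, 500))
    print(f"kept {mask.sum()} of {len(coords)} cells")
```

Apply the same mask to the feature matrix and to the labels so that all three stay aligned. Cropping is used here instead of random subsampling for a specific reason. Random thinning lowers cell density, so a fixed distance threshold would give each cell fewer neighbors. A window keeps local neighborhoods intact, except at its edges.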
STELLAR expects a graph as input. In our code, we construct the graph using a predefined distance threshold, but STELLAR can work with any meaningfully constructed graph. To use your own dataset, initialize a GraphDataset and pass it as input to the STELLAR class:
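Threshold-based graph construction can be sketched as follows: connect every pair of distinct cells whose Euclidean distance falls below a cutoff. This is an illustrative sketch, not the repository's code. The helper name `build_threshold_graph` is hypothetical. The sketch also assumes that edges use the `(2, n_edges)` layout of PyTorch Geometric's `edge_index`. Check load_hubmap_data for the exact edge format that GraphDataset expects.

```python
import numpy as np


def build_threshold_graph(coords, distance_threshold):
    """Connect every pair of distinct cells closer than distance_threshold.

    coords: array of shape (n_cells, 2) with spatial positions.
    Returns an int64 array of shape (2, n_edges) that lists each edge in
    both directions (assumed PyG-style edge_index layout).
    build_threshold_graph is a hypothetical helper, not part of STELLAR.
    """
    coords = np.asarray(coords, dtype=float)
    # Dense pairwise distances: O(n^2) memory, suitable for small datasets.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adjacency = dist < distance_threshold
    np.fill_diagonal(adjacency, False)  # drop self-loops
    src, dst = np.nonzero(adjacency)
    return np.stack([src, dst]).astype(np.int64)


if __name__ == "__main__":
    coords = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
    print(build_threshold_graph(coords, 2.0))
```

The dense distance matrix keeps the sketch short, but it does not scale to large tissues. For large datasets, a spatial index such as `scipy.spatial.cKDTree.query_pairs(r)` returns the same neighbor pairs without materializing every pairwise distance.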
dataset = GraphDataset(labeled_X, labeled_y, unlabeled_X, labeled_edges, unlabeled_edges)
stellar = STELLAR(args, dataset)
An example for the HuBMAP dataset is shown in the load_hubmap_data function, and an example for the Tonsil/BE dataset is shown in load_tonsilbe_data. These examples demonstrate how to initialize these variables from a CSV file.
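As a rough illustration of preparing these inputs, the sketch below splits a per-cell table into a feature matrix, integer labels, and coordinates. It assumes one row per cell, with marker-expression columns, x/y coordinate columns, and (for the reference data) a cell-type column. The helper `table_to_arrays` and all column names are hypothetical and may not match those used by load_hubmap_data or load_tonsilbe_data.

```python
import numpy as np
import pandas as pd


def table_to_arrays(df, feature_cols, label_col=None, coord_cols=("x", "y")):
    """Split a per-cell table into features, integer labels, and coordinates.

    Returns (X, y, coords, class_names). y and class_names are None when
    label_col is None, e.g. for the unannotated dataset.
    table_to_arrays and the column names are hypothetical.
    """
    X = df[list(feature_cols)].to_numpy(dtype=np.float32)
    coords = df[list(coord_cols)].to_numpy(dtype=np.float32)
    if label_col is None:
        return X, None, coords, None
    # Map cell-type names to integers 0..K-1 (sorted alphabetically);
    # class_names[k] is the name of integer label k.
    codes, class_names = pd.factorize(df[label_col], sort=True)
    return X, codes.astype(np.int64), coords, list(class_names)


if __name__ == "__main__":
    df = pd.DataFrame({
        "CD3": [1.0, 0.5, 0.2], "CD20": [0.1, 0.9, 0.3],
        "x": [0, 1, 2], "y": [0, 0, 0],
        "cell_type": ["T", "B", "T"],
    })
    X, y, coords, names = table_to_arrays(df, ["CD3", "CD20"], "cell_type")
    print(names, y)
```

Use the same `feature_cols`, in the same order, for the reference and the unannotated tables so that the two feature matrices describe the same markers. Keep `class_names` so that predicted integer labels for known types can be mapped back to cell-type names.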
If you find our code and research useful, please consider citing:
@article{stellar2022,
title={Annotation of spatially resolved single-cell data with STELLAR},
author={Brbi{\'c}, Maria and Cao, Kaidi and Hickey, John W and Tan, Yuqi and Snyder, Michael P and Nolan, Garry P and Leskovec, Jure},
journal={Nature Methods},
volume={19},
number={11},
pages={1411--1418},
year={2022},
publisher={Nature Publishing Group}
}