jsxlei / SCALEX

Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space
BSD 3-Clause "New" or "Revised" License

News

[2022-10-17] SCALEX is published online in Nature Communications

Documentation

Tutorial

Installation

Install from PyPI

pip install scalex

Install from GitHub

Install the latest development version

pip install git+https://github.com/jsxlei/scalex.git

Or clone the repository and install from source

git clone https://github.com/jsxlei/scalex.git
cd scalex
python setup.py install

SCALEX is implemented in PyTorch.
SCALEX runs on CPU, but running it on a GPU is recommended when one is available.
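
Whether PyTorch can see a GPU can be checked before starting a run. The following is a minimal sketch and not part of SCALEX: `pick_device` is a hypothetical helper, and it reports the CPU when PyTorch is not installed or no CUDA device is found.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Device PyTorch would use: {pick_device()}")
```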

Getting started

SCALEX can be used both from the command line and as an API function, for example in a Jupyter notebook.
Please refer to the Documentation and Tutorial for details.

1. API function

from scalex import SCALEX
adata = SCALEX(data_list, batch_categories)

The function parameters are similar to the command line options.
The output is an AnnData object that can be analyzed further with scanpy.
data_list can be

batch_categories is optional and gives the name of each batch; if it is not provided, batches are labeled 0 to N-1
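
A minimal sketch of the API call above, assuming SCALEX is installed and accepts the two positional arguments as shown. The file names are placeholders, and `default_batch_names` and `integrate` are hypothetical helpers: the first only illustrates the 0 to N-1 fallback described here (string labels are an assumption) and is not SCALEX's internal code.

```python
def default_batch_names(data_list, batch_categories=None):
    """Illustrate the documented fallback: label batches 0..N-1 when no names are given."""
    if batch_categories is None:
        # String labels are an assumption made for this illustration.
        return [str(i) for i in range(len(data_list))]
    if len(batch_categories) != len(data_list):
        raise ValueError("expected one batch name per dataset")
    return list(batch_categories)


def integrate(data_list, batch_categories=None):
    """Call the SCALEX API function and return the resulting AnnData object."""
    from scalex import SCALEX  # imported here so the helper above works without SCALEX

    if batch_categories is None:
        return SCALEX(data_list)
    return SCALEX(data_list, batch_categories)


# Placeholder file names; replace them with real paths before calling integrate().
paths = ["batch_a.h5ad", "batch_b.h5ad"]
print(default_batch_names(paths))
print(default_batch_names(paths, ["donor1", "donor2"]))
```

Calling `integrate(paths, ["donor1", "donor2"])` would then run the integration itself, which needs real input files.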

2. Command line

Standard usage

SCALEX --data_list data1 data2 dataN --batch_categories batch_name1 batch_name2 batch_nameN 

--data_list: path to the data of each batch of the single-cell dataset; use -d for short

--batch_categories: name of each batch; if not specified, batches are labeled 0 to N-1
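
When runs are scripted, the same command can be assembled from Python. This is a sketch that uses only the two options documented above; `build_scalex_command` is a hypothetical helper, and launching the command assumes the `SCALEX` executable is on the PATH.

```python
import shlex


def build_scalex_command(data_paths, batch_names=None):
    """Assemble the argument list for the SCALEX command-line tool."""
    if not data_paths:
        raise ValueError("at least one data path is required")
    cmd = ["SCALEX", "--data_list", *data_paths]
    if batch_names is not None:
        if len(batch_names) != len(data_paths):
            raise ValueError("expected one batch name per data path")
        cmd += ["--batch_categories", *batch_names]
    return cmd


cmd = build_scalex_command(["data1", "data2"], ["batch_name1", "batch_name2"])
print(shlex.join(cmd))
# To launch the run (requires SCALEX to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when paths contain spaces.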

Output

Output will be saved in the output folder including:

Other Common Usage

Use h5ad files storing AnnData objects as input, either one file or multiple separate files

SCALEX --data_list <filename.h5ad>

Specify the batch column in anndata.obs with --batch_name when only one concatenated h5ad file is provided. batch_name can be, for example, conditions, samples, assays, or patients; the default is batch

SCALEX --data_list <filename.h5ad> --batch_name <specific_batch_name>

Integrate heterogeneous scATAC-seq datasets by adding the option --profile ATAC

SCALEX --data_list <filename.h5ad> --profile ATAC

Impute simultaneously with integration by adding the option --impute; results are stored in anndata.layers['impute']

SCALEX --data_list <atac_filename.h5ad> --profile ATAC --impute True

Use custom features by passing --n_top_features the name of a file that lists the features in a single column

SCALEX --data_list <filename.h5ad> --n_top_features features.txt
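
A features file of this kind can be written with a few lines of Python. This is a sketch under the assumption that the file holds one feature name per line with no header; `write_feature_file` is a hypothetical helper and the gene names are placeholders.

```python
from pathlib import Path


def write_feature_file(features, path):
    """Write one feature name per line, dropping blanks and duplicates but keeping order."""
    seen = set()
    lines = []
    for name in features:
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            lines.append(name)
    Path(path).write_text("\n".join(lines) + "\n")
    return len(lines)


# Placeholder gene names; the duplicate and the padded entry are cleaned up on write.
n = write_feature_file(["CD3E", "MS4A1", "CD3E", " NKG7 "], "features.txt")
print(f"wrote {n} features to features.txt")
```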

Use preprocessed data by adding the option --processed

SCALEX --data_list <filename.h5ad> --processed

Option

Help

For more usage options, see the help message

SCALEX.py --help 

Release notes

See the changelog.

Citation

Xiong, L., Tian, K., Li, Y., Ning, W., Gao, X., & Zhang, Q. C. (2022). Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space. Nature Communications, 13(1), 6118. https://doi.org/10.1038/s41467-022-33758-z