jaydu1 / scVAEIT

Variational autoencoder for single-cell integration and transfer learning.
MIT License


Variational autoencoder for multimodal mosaic integration and transfer learning

This repository contains the implementation of scVAEIT (Variational autoencoder for multimodal single-cell mosaic integration and transfer learning), a method for integrating and imputing multi-modal datasets that was originally proposed in [Du22] for single-cell genomics data. scVAEIT is a deep generative model based on a variational autoencoder (VAE) with masking strategies. It can integrate and impute multi-modal single-cell data, such as DOGMA-seq, CITE-seq, and ASAP-seq data. It was later extended to impute single-cell proteomic data in [Moon24], and it is applicable to other types of data as well. scVAEIT is implemented in Python, and an R wrapper is also available.
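To make the phrase "masking strategies" more concrete, here is a minimal sketch of the general idea: entries that were never measured are excluded from the reconstruction error, and some measured entries are hidden at random so that a model has to learn to impute them. This is an illustration only and is not taken from the scVAEIT code base. The function names `random_feature_mask` and `masked_mse` are hypothetical, and plain Python lists are used to keep the example dependency-free.

```python
import random


def random_feature_mask(observed, p_mask, rng):
    """Hide each measured entry with probability p_mask (hypothetical helper).

    observed: list of rows of booleans, True where a value was measured.
    Returns a mask of the same shape, True where the value stays visible.
    """
    return [[obs and rng.random() >= p_mask for obs in row] for row in observed]


def masked_mse(x, x_hat, observed):
    """Mean squared error computed over measured entries only."""
    total, count = 0.0, 0
    for x_row, xh_row, o_row in zip(x, x_hat, observed):
        for value, estimate, is_observed in zip(x_row, xh_row, o_row):
            if is_observed:
                total += (value - estimate) ** 2
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    x = [[1.0, 2.0], [3.0, 4.0]]            # toy data: 2 cells, 2 features
    observed = [[True, False], [True, True]]  # one entry was never measured
    visible = random_feature_mask(observed, p_mask=0.5, rng=rng)
    x_hat = [[0.0, 0.0], [0.0, 0.0]]        # stand-in for a model's reconstruction
    print("visible to the model:", visible)
    print("error on measured entries:", masked_mse(x, x_hat, observed))
```

In a real model the visible entries would be fed to the encoder, and the error on the hidden but measured entries would drive the imputation quality.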

Check out the example folder for illustrations of how to use scVAEIT:

| Example | Language | Notebook |
| --- | --- | --- |
| Imputation of ADT | Python | imputation_1modality.ipynb |
| Imputation of RNA and ADT | Python | imputation_2modalities.ipynb |
| Integration of RNA, ADT, and peaks | Python | integration_3modalities.ipynb |
| Imputation of RNA | R | imputation_scRNAseq.ipynb |
| Imputation of peptides | R | imputation_peptide.ipynb |

To prepare your own data for scVAEIT, see:

| Example | Language | Notebook |
| --- | --- | --- |
| Prepare input data | Python | prepare_data_input.ipynb |

Reproducibility Materials

The code for reproducing results in the paper [Du22] can be found in the folder Reproducibility materials. The large preprocessed dataset that contains DOGMA-seq, CITE-seq, and ASAP-seq data from GSE156478 can be accessed through Google Drive.

Dependencies

The package can be installed via PyPI:

pip install scVAEIT

Alternatively, the dependencies can be installed via the following commands:

mamba create --name tf python=3.9 -y
conda activate tf
mamba install -c conda-forge "tensorflow>=2.12, <2.16" "tensorflow-probability>=0.12, <0.24" pandas jupyter -y
mamba install -c conda-forge "scanpy>=1.9.2" matplotlib scikit-learn -y

If you are using conda, simply replace mamba above with conda.

The code has only been tested on Linux and macOS. If you are using Windows, installing the dependencies with pip instead of conda is more convenient.
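After installing, it can be useful to confirm that the environment matches the version ranges used in the commands above. The following is a minimal sketch using only the standard library. The names `REQUIREMENTS`, `parse_version`, `in_range`, and `check_environment` are hypothetical helpers and not part of scVAEIT, and the version parser is deliberately simplified (it reads leading numeric components only and treats a pre-release such as `2.16.0rc1` like `2.16.0`).

```python
from importlib import metadata

# Ranges copied from the install commands above:
# (inclusive minimum, exclusive maximum or None for no upper bound).
REQUIREMENTS = {
    "tensorflow": ((2, 12), (2, 16)),
    "tensorflow-probability": ((0, 12), (0, 24)),
    "scanpy": ((1, 9, 2), None),
}


def parse_version(text):
    """Turn '2.15.0' into (2, 15, 0); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # e.g. '0rc1': keep the 0, ignore the rest
            break
    return tuple(parts)


def in_range(version, minimum, maximum):
    """True if minimum <= version < maximum (maximum may be None)."""
    if version < minimum:
        return False
    return maximum is None or version < maximum


def check_environment(requirements=REQUIREMENTS):
    """Return a dict mapping each package name to a short status string."""
    report = {}
    for name, (minimum, maximum) in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = in_range(parse_version(installed), minimum, maximum)
        report[name] = f"{installed} ({'ok' if ok else 'outside tested range'})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

A package reported as "outside tested range" may still work; the ranges only reflect what the install commands request.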

Parameters

Network parameters

In the example, the network operates on two levels of blocks:

The parameters are explained below:

Hyperparameters

Some of the important hyperparameters are:

In our experiments, the results were not sensitive to the parameters above, so reasonable values such as those in the example should work. The following parameter, however, requires some care depending on your data:

References