scDRS (single-cell disease-relevance score) is a method for associating individual cells in single-cell RNA-seq data with disease GWASs, built on top of AnnData and Scanpy.

Read the documentation: installation, usage, command-line interface (CLI), file formats, etc.

Check out instructions for making customized gene sets using MAGMA.

Reference

Zhang, Hou, et al. "Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data", Nature Genetics, 2022.

Versions

v1.0.3: development version. Fixes a bug that produced negative values of ct_mean when --adj-prop and --cov are both on and some genes have extremely low expression; prints --adj-prop info in scdrs compute-score; checks that the gene column in p-value and z-score files has the header GENE; forces the index of df_cov and df_score to be str; adds --min-genes and --min-cells to the CLI for customized filtering; makes the FDR threshold for plot_group_stats adjustable https://github.com/martinjzhang/scDRS/pull/75.
v1.0.2: latest stable version. Bug fixes in scdrs.util.plot_group_stats; input checks in scdrs munge-gs and scdrs.util.load_h5ad.
Older versions
v1.0.1: stable version used in publication. Identical to v1.0.0 except documentation.
v1.0.0: stable version used in revision 1. Results are identical to v0.1 for binary gene sets. Changes with respect to v0.1:
- scDRS command-line interface (CLI) instead of .py scripts for calling scDRS in bash, including scdrs munge-gs, scdrs compute-score, and scdrs perform-downstream.
- More memory-efficient, because sparse matrices are used throughout the computation.
- Allows the use of quantitative gene weights.
- New feature --adj-prop for adjusting for cell-type proportions.
v0.1: stable version used in the initial submission.

Code and data to reproduce results of the paper

See scDRS_paper for more details (experiments folder is deprecated). Data are at figshare.

Download GWAS gene sets (.gs files) for 74 diseases and complex traits.
Download scDRS results (.score.gz and .full_score.gz files) for TMS FACS + 74 diseases/traits.

Older versions

Initial submission: GWAS gene sets and scDRS results.

Explore scDRS results via CELLxGENE

h5ad files compatible with CELLxGENE
Instructions on running CELLxGENE


Screenshots (images not shown): 110,096 cells from 120 cell types in TMS FACS; IBD-associated cells.

scDRS scripts (deprecated)

NOTE: The scDRS scripts are still maintained but deprecated. Consider using the scDRS command-line interface instead.

scDRS script for score calculation

Input: scRNA-seq data (.h5ad file) and gene set file (.gs file)

Output: scDRS score file ({trait}.score.gz file) and full score file ({trait}.full_score.gz file) for each trait in the .gs file

h5ad_file=your_scrnaseq_data
cov_file=your_covariate_file
gs_file=your_gene_set_file
out_dir=your_output_folder

python compute_score.py \
    --h5ad_file ${h5ad_file}.h5ad \
    --h5ad_species mouse \
    --cov_file ${cov_file}.cov \
    --gs_file ${gs_file}.gs \
    --gs_species human \
    --flag_filter True \
    --flag_raw_count True \
    --n_ctrl 1000 \
    --flag_return_ctrl_raw_score False \
    --flag_return_ctrl_norm_score True \
    --out_folder ${out_dir}

--h5ad_file (.h5ad file) : scRNA-seq data
--h5ad_species ("hsapiens"/"human"/"mmusculus"/"mouse") : species of the scRNA-seq data samples
--cov_file (.cov file) : covariate file (optional, .tsv file, see file format)
--gs_file (.gs file) : gene set file (see file format)
--gs_species ("hsapiens"/"human"/"mmusculus"/"mouse") : species for genes in the gene set file
--flag_filter ("True"/"False") : whether to perform minimal filtering of cells and genes
--flag_raw_count ("True"/"False") : whether the input data are raw counts; if "True", normalization (size-factor + log1p) is performed
--n_ctrl (int) : number of control gene sets (default 1,000)
--flag_return_ctrl_raw_score ("True"/"False") : whether to return raw control scores
--flag_return_ctrl_norm_score ("True"/"False") : whether to return normalized control scores
--out_folder : output folder. Score files will be saved as {out_folder}/{trait}.score.gz (see file format)
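
The "size-factor + log1p" normalization mentioned for --flag_raw_count can be illustrated with a minimal NumPy sketch. This is not the code used by compute_score.py; the function name size_factor_log1p and the target_sum value of 1e4 are assumptions chosen for illustration only.

```python
import numpy as np


def size_factor_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x).

    `target_sum=1e4` is an illustrative assumption, not a documented scDRS default.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Guard against empty cells to avoid division by zero.
    totals[totals == 0] = 1.0
    return np.log1p(counts / totals * target_sum)


# Two toy cells (rows) with three genes (columns); the second cell is sequenced deeper.
raw = np.array([[1, 3, 0], [10, 0, 30]])
norm = size_factor_log1p(raw)
```

After size-factor scaling, the two toy cells become comparable despite their different sequencing depths: a gene that makes up the same fraction of each cell's counts receives the same normalized value.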

scDRS script for downstream applications

Input: scRNA-seq data (.h5ad file), gene set file (.gs file), and scDRS full score files (.full_score.gz files)

Output: {trait}.scdrs_ct.{cell_type} file (same as the new {trait}.scdrs_group.{cell_type} file) for cell type-level analyses (association and heterogeneity); {trait}.scdrs_var file (same as the new {trait}.scdrs_cell_corr file) for cell variable-disease association; {trait}.scdrs_gene file for disease gene prioritization.
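
Before running the downstream script, it can help to check which full score files are present in an output folder. The sketch below relies only on the {trait}.full_score.gz naming convention described above; the helper find_full_score_files and the trait names are hypothetical and not part of scDRS.

```python
import tempfile
from pathlib import Path


def find_full_score_files(out_folder):
    """Map trait name -> path for every {trait}.full_score.gz in `out_folder`."""
    suffix = ".full_score.gz"
    return {
        p.name[: -len(suffix)]: p
        for p in sorted(Path(out_folder).glob("*" + suffix))
    }


# Demo with empty placeholder files; the trait names are made up.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["trait_a.score.gz", "trait_a.full_score.gz", "trait_b.full_score.gz"]:
        (Path(tmp) / name).touch()
    traits = sorted(find_full_score_files(tmp))
```

Here traits contains only the traits that have a full score file; the plain .score.gz file is ignored because it does not end in .full_score.gz.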

h5ad_file=your_scrnaseq_data
out_dir=your_output_folder
# flag_raw_count is set to False because the toy data is already log-normalized;
# set it to True if your data is not log-normalized.
python compute_downstream.py \
    --h5ad_file ${h5ad_file}.h5ad \
    --score_file @.full_score.gz \
    --cell_type cell_type \
    --cell_variable causal_variable,non_causal_variable,covariate \
    --flag_gene True \
    --flag_filter False \
    --flag_raw_count False \
    --out_folder ${out_dir}

--h5ad_file (.h5ad file) : scRNA-seq data
--score_file (.full_score.gz files) : scDRS full score files; supporting use of "@" to match strings
--cell_type (str) : cell type column (supporting multiple columns separated by comma); must be present in adata.obs.columns; used for cell type-disease association analyses (5% quantile as test statistic) and detecting association heterogeneity within cell type (Geary's C as test statistic)
--cell_variable (str) : cell-level variable columns (supporting multiple columns separated by comma); must be present in adata.obs.columns; used for cell variable-disease association analyses (Pearson's correlation as test statistic)
--flag_gene ("True"/"False") : whether to correlate scDRS disease scores with gene expression
--flag_filter ("True"/"False") : whether to perform minimal filtering of cells and genes
--flag_raw_count ("True"/"False") : whether the input data are raw counts; if "True", normalization (size-factor + log1p) is performed
--out_folder : output folder. Results will be saved as {out_folder}/{trait}.scdrs_ct.{cell_type} for cell type-level analyses (association and heterogeneity); {out_folder}/{trait}.scdrs_var for cell variable-disease association; {out_folder}/{trait}.scdrs_gene for disease gene prioritization (see file format)
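
The cell type-disease association test described under --cell_type compares a quantile of the disease scores with the same quantile computed on control scores. The following is a minimal sketch of such a Monte Carlo test, not the implementation in compute_downstream.py. It assumes that "5% quantile" refers to the top 5% of scores (the 95th percentile) and that control scores are stored as one column per control gene set; the function name mc_pvalue_top_quantile and the simulated data are hypothetical.

```python
import numpy as np


def mc_pvalue_top_quantile(score, ctrl_scores, q=0.95):
    """Monte Carlo p-value comparing a quantile of `score` with control sets.

    score: shape (n_cells,), disease scores of the cells in one cell type.
    ctrl_scores: shape (n_cells, n_ctrl), control scores for the same cells.
    q: quantile used as test statistic (0.95 is an assumption, see lead-in).
    """
    score = np.asarray(score, dtype=float)
    ctrl_scores = np.asarray(ctrl_scores, dtype=float)
    obs = np.quantile(score, q)
    ctrl = np.quantile(ctrl_scores, q, axis=0)
    # Add-one correction keeps the p-value strictly positive.
    return (1 + np.sum(ctrl >= obs)) / (1 + ctrl.shape[0])


# Simulated data: 200 cells, 1,000 control sets drawn from a standard normal.
rng = np.random.default_rng(0)
ctrl_scores = rng.normal(size=(200, 1000))
null_p = mc_pvalue_top_quantile(rng.normal(size=200), ctrl_scores)
shifted_p = mc_pvalue_top_quantile(rng.normal(loc=3.0, size=200), ctrl_scores)
```

With 1,000 control sets the smallest attainable p-value is 1/1001, which is one reason the number of control gene sets (--n_ctrl in the score calculation script) limits the resolution of the test.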
