
Normalizing and denoising protein expression data from droplet-based single cell profiling

Code to reproduce all manuscript results and figures

Mulè MP*, Martins AJ*, Tsang JS. Normalizing and denoising protein expression data from droplet-based single cell profiling.

This analysis pipeline reproduces all results and figures reported in the paper above, which describes control experiments and statistical modeling used to characterize sources of noise in protein expression data from droplet-based single cell experiments (CITE-seq, ASAP-seq, TEA-seq, REAP-seq, Mission Bio Tapestri, etc.). Based on these analyses we introduced the dsb R package for normalizing and denoising droplet-based surface protein data. The dsb method was developed in John Tsang's lab by Matt Mulè, Andrew Martins and John Tsang.

The R package dsb is hosted on CRAN: https://CRAN.R-project.org/package=dsb
dsb GitHub repository: https://github.com/niaid/dsb
Data used below is available in analysis-ready format at this figshare repository: https://doi.org/10.35092/yhjc.13370915
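For orientation, the core dsb call used throughout these analyses takes a raw protein count matrix for cells and one for empty droplets. A minimal sketch of the package interface is below; the object names are placeholders, not files from this repository.

# minimal sketch of the dsb interface (placeholder object names)
library(dsb)
adt_norm = DSBNormalizeProtein(
  cell_protein_matrix = cells_adt_matrix,      # raw ADT counts for cell-containing droplets (proteins x cells)
  empty_drop_matrix   = empty_adt_matrix,      # raw ADT counts for empty / background droplets
  denoise.counts = TRUE,                       # dsb step II: remove the per-cell technical component
  use.isotype.control = TRUE,
  isotype.control.name.vec = isotype_controls  # character vector of isotype control protein names
)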

Table of Contents

  1. Instructions for analysis workflow
  2. install packages used in analysis
  3. Download starting data and add to data directory
  4. dsb normalization on PBMC data from 20 donors
  5. CLR normalization (across cells) on PBMC data from 20 donors R4.0.5
  6. UMAP based on dsb normalized values
  7. dsb vs CLR nk cluster comparison, manual gating, normalization distribution comparison
  8. dsb technical component (dsb step II) robustness assessments
  9. dsb ambient correction (dsb step I) with different definitions of empty droplets: robustness assessments
  10. Multi vs single batch normalization, µ1 background resampling robustness check
  11. External 10X genomics data analysis: “NextGem”, “V3”, and “5 Prime” assays
  12. dsb normalize protein data from Mission Bio tapestri platform
  13. dsb normalization of TEA-seq data and dsb-based WNN multimodal clustering R4.0.5
  14. dsb normalization of ASAP-seq data and dsb-based WNN multimodal clustering R4.0.5
  15. dsb vs CLR (across cells) Normalization comparison: Differential expression, Gap Statistic R4.0.5
  16. dsb vs CLR normalized values as input to WNN multimodal clustering: PBMC data from 20 donors R4.0.5
  17. dataset summary statistic table

Instructions for analysis workflow.

All analyses included in this manuscript were run on a laptop with 16GB RAM. To run the analysis: 1) download the dsb_manuscript github repository;
2) download data from the figshare link above and add the /data folder directly to the root directory containing the .Rproj file. This directory should now contain dsb_manuscript.Rproj, the files readme.md and readme.rmd, the directory V2, and the directory you just added, data.

One can view the commented code and run each script in each subdirectory as listed below, or source each R script in the order they appear below. No file paths need to be specified or changed: each R script is self-contained, reading data from the /data folder and writing figures or results files to each analysis subdirectory relative to the root directory using the R package here.

Note:
Two R versions are used throughout the analysis, R 3.5.3 and R 4.0.5; analyses using R 4.0.5 are indicated in the table of contents above. Switching R versions can be done on most HPC systems through separately installed modules. If analyzing data locally, the RSwitch tool can be used to facilitate switching R versions.

install packages used in analysis

# R 3.5 packages
pkgs3.5 = list('tidyverse', 'magrittr', 'mclust', 'Seurat', 
               'ggrepel', 'ggridges', 'pals', 'reticulate', 'umap')

# load each package, installing it first if it is not already available
lapply(pkgs3.5, function(x){ 
  tryCatch(library(x, character.only = TRUE), 
           error = function(e){
             install.packages(pkgs = x, repos = 'http://cran.us.r-project.org')
             library(x, character.only = TRUE)
           })
})

# Note: for the R 3.5 UMAP analysis, this particular pipeline requires a python
# virtual environment with umap-learn installed, set up via reticulate as below.
# (UMAP can now be called directly from most single cell analysis software.)
library(reticulate)
virtualenv_create("r-reticulate")
virtualenv_install("r-reticulate", "umap-learn")
use_virtualenv("r-reticulate")
library(umap)

R packages used in this analysis: R 4.0.5
The same packages as above are required (separately installed for R 4.0.5); in addition, the following packages are required in the R 4.0.5 environment:

# switch to R 4.0.5, e.g. using RSwitch https://rud.is/rswitch/guide/
pkgs4.0 = c(pkgs3.5, 'data.table', 'factoextra', 'GEOquery')
BiocManager::install("variancePartition")
devtools::install_github("caleblareau/BuenColors")
lapply(pkgs4.0, function(x){ 
  tryCatch(library(x, character.only = TRUE), 
           error = function(e){
             install.packages(pkgs = x, repos = 'http://cran.us.r-project.org')
             library(x, character.only = TRUE)
           })
})

3) Download starting data and add to data directory

After downloading the repository located at https://github.com/niaid/dsb_manuscript, add the data folder to the top level of the repository (where the .Rproj file is located). The structure of the starting data is shown in the tree diagram below; the R code below checks for these files.

-dsb_normalization.Rproj
-data
  -mission_bio_data
    -readme.txt
    -AML-4-cell-line-multiomics-adt-counts.tsv
  -revision_data
    -tea_seq
      -well6
        -X066-MP0C1W6_leukopak_perm-cells_tea_200M_adt_counts.csv
        -X066-MP0C1W6_leukopak_perm-cells_tea_200M_cellranger-arc_per_barcode_metrics.csv
        -GSM5123954_X066-MP0C1W6_leukopak_perm-cells_tea_200M_cellranger-arc_filtered_feature_bc_matrix.h5
      -well5
        -X066-MP0C1W5_leukopak_perm-cells_tea_200M_adt_counts.csv
        -GSM5123953_X066-MP0C1W5_leukopak_perm-cells_tea_200M_cellranger-arc_filtered_feature_bc_matrix.h5
        -X066-MP0C1W5_leukopak_perm-cells_tea_200M_cellranger-arc_per_barcode_metrics.csv
      -well4
        -X066-MP0C1W4_leukopak_perm-cells_tea_200M_cellranger-arc_per_barcode_metrics.csv
        -GSM5123952_X066-MP0C1W4_leukopak_perm-cells_tea_200M_cellranger-arc_filtered_feature_bc_matrix.h5
        -X066-MP0C1W4_leukopak_perm-cells_tea_200M_adt_counts.csv
      -well3
        -X066-MP0C1W3_leukopak_perm-cells_tea_200M_adt_counts.csv
        -X066-MP0C1W3_leukopak_perm-cells_tea_200M_cellranger-arc_per_barcode_metrics.csv
        -GSM5123951_X066-MP0C1W3_leukopak_perm-cells_tea_200M_cellranger-arc_filtered_feature_bc_matrix.h5
    -asapseq
      -adt
        -featurecounts.mtx
        -featurecounts.genes.txt
        -featurecounts.barcodes.txt
      -barcodes
        -step3_ADThq.tsv
  -10x_rds
    -5prime_5k
      -filtered_feature_bc_matrix
        -features.tsv.gz
        -barcodes.tsv.gz
        -matrix.mtx.gz
      -vdj_v1_hs_pbmc2_5gex_protein_filtered_feature_bc_matrix.tar
    -10x_pbmc5k_V3.rds
    -10x_pbmc10k_V3.rds
    -pbmc_10k_protein_v3 — Cell Ranger.htm
    -v3_10k
      -filtered_feature_bc_matrix
        -features.tsv.gz
        -barcodes.tsv.gz
        -matrix.mtx.gz
      -pbmc_10k_protein_v3_filtered_feature_bc_matrix.tar
    -nextgem_5k
      -filtered_feature_bc_matrix
        -features.tsv.gz
        -barcodes.tsv.gz
        -matrix.mtx.gz
      -raw_feature_bc_matrix
        -features.tsv.gz
        -barcodes.tsv.gz
        -matrix.mtx.gz
      -5k_pbmc_protein_v3_nextgem_filtered_feature_bc_matrix.tar
    -10x_pbmc_5prime_5k.rds
    -v3_5k
      -5k_pbmc_protein_v3_filtered_feature_bc_matrix.tar
      -filtered_feature_bc_matrix
        -features.tsv.gz
        -barcodes.tsv.gz
        -matrix.mtx.gz
    -10x_pbmc5k_NextGem.rds
  -V2_data
    -background_data
      -adt_neg_full_list.rds
      -adt_neg_dmx.rds
      -adt_neg_dmx_list.rds
      -adt_neg_full_qc.rds
    -CITEseq_raw_PMID32094927_seurat2.4.rds
    -unstained_control_singlets.rds

Prior to running any analysis, confirm the data are in the correct directory relative to the project root directory using the scripts below.

# confirm files in data/V2_Data/ and data/V2_Data/background_data/
suppressMessages(library(here))
# all STARTING data 
data_ = c(
  "CITEseq_raw_PMID32094927_seurat2.4.rds", 
  "unstained_control_singlets.rds"
)
stopifnot(data_ %in% list.files(here("data/V2_Data/")))

# background drop data 
data_ = c(
"adt_neg_dmx_list.rds",
"adt_neg_dmx.rds",
"adt_neg_full_list.rds",
"adt_neg_full_qc.rds"
)
stopifnot(data_ %in% list.files(here("data/V2_Data/background_data/")))

### public data directories

# mission bio 
data_ = c("AML-4-cell-line-multiomics-adt-counts.tsv")
stopifnot(data_ %in% list.files(here("data/mission_bio_data/")))
# 10X data 
data_ = c("10x_pbmc5k_V3.rds", "10x_pbmc5k_NextGem.rds",
          "10x_pbmc10k_V3.rds", "10x_pbmc_5prime_5k.rds")
stopifnot(data_ %in% list.files(here("data/10x_rds/")))
tenx_filtered_matrices = c('5prime_5k','v3_10k','v3_5k','nextgem_5k')
# filtered matrices 
tenx_filtered_dir = list.dirs(path = here('data/10x_rds/'), full.names = FALSE)
stopifnot(tenx_filtered_matrices %in% tenx_filtered_dir)

# asapseq (source: https://github.com/caleblareau/asap_reproducibility)
asap_ = c('adt', 'barcodes')
asapseq_ = list.dirs(path = here('data/revision_data/asapseq/'), full.names = FALSE)
stopifnot(asap_ %in% asapseq_)

# teaseq 
tea_ = c('well6', 'well5', 'well4', 'well3','maitsig.csv')
teaseq = list.files(path = here('data/revision_data/tea_seq/'), full.names = FALSE)
stopifnot(tea_ %in% teaseq)


Analysis

Each script can be run line by line in an active R session or sourced. As described above, file paths do not need to be changed.

Run dsb normalization on PBMC data from 20 individuals.

V2/dsb_normalize_cluster_pipeline/

# R 3.5.3 
source(here("V2/dsb_normalize_cluster_pipeline/1_dsb_normalize.r"))

Run CLR normalization (across cells) on PBMC data from 20 individuals.

Here, switch to R 4.0.5; this script applies the CLR transformation across cells.

# R 4.0.5 
source(here("V2/dsb_normalize_cluster_pipeline/1.1_calc_clr_across_cells_s4.r"))

The rest of this section uses R 3.5.3.

UMAP based on dsb normalized values

reads data from:
V2/dsb_normalize_cluster_pipeline/generated_data/h1_d0_singlets_ann_Seurat2.4_dsbnorm.rds

# switch back to R 3.5
source(here("V2/dsb_normalize_cluster_pipeline/2_run_umap.r"))

dsb vs CLR comparison and dsb vs empty drop protein annotation figures and manual gating

Comparison of normalized protein distributions vs the values of those proteins in empty drops; CLR vs dsb on the NK cell cluster; and unstained control vs stained cell normalization analysis.

source(here("V2/dsb_normalize_cluster_pipeline/3_figure_generation.r"))
source(here("V2/dsb_normalize_cluster_pipeline/4_manual_gate_plots.r"))

stained vs unstained distribution normalization comparisons

R 4.0.5
The versions shown in the figure apply a separate dsb normalization to each batch; results are highly concordant with single- or multi-batch implementations of dsb. See also the related results in Supplementary Fig 8, which compares batch-merging differences across proteins in an unbiased way, also using multiple definitions of background droplets (in the underlying dsb process analysis in dsb_process_plots, next section).

# multiple methods for joining batches and normalizing are shown 
source(here("V2/dsb_normalize_cluster_pipeline/5_norm_distribution_comparison.r"))

dsb technical component (step II) robustness assessments

(R 3.5.3)
This section breaks the dsb process into its component steps and models each step underlying the method. Scripts 6 and 6a analyze the correlation structure of the variables comprising the per-cell technical component used in step II of the dsb method. In script 8, the per-cell two-component Gaussian mixture model is compared to k = 1, 3, 4, 5, and 6 component models across cells, and the BICs from the resulting 169,374 models are analyzed.

source(here("V2/dsb_process_plots/6_mean_isotype_v_mean_control.R"))
source(here("V2/dsb_process_plots/6a_isotype_figure_generation.r"))
# script 8 can be run before script 7 for a more coherent workflow. 
source(here("V2/dsb_process_plots/8_mixture_fits.r"))

dsb ambient correction (dsb step I) with different definitions of empty droplets: robustness assessments

This section assesses the robustness of the ambient correction step (dsb step I). The 'ground truth' background, defined by unstained control cells spiked into the cell pool after staining but prior to droplet generation, is correlated with multiple definitions of empty droplets as well as with model-derived background from a per-protein mixture model.

source(here("V2/dsb_process_plots/7_neg_control_plots.R"))

Multi vs single batch normalization, µ1 background resampling robustness check

empty_drop_threshold_batch assesses single vs multi batch normalization and the sensitivity of each normalization scheme to defining background with hashing vs the library size distribution. mu1_noise_correlations analyzes the robustness of the µ1 background estimate and its correlation with µ2 and the isotype control means, using 100 random samples of 4 µ1 proteins (the same number as isotype controls in the experiment) from each cell.

source(here("V2/parameter_sensitivity/empty_drop_threshold_batch.r"))
source(here("V2/parameter_sensitivity/mu1_noise_correlations.r"))

External 10X genomics data analysis: “NextGem”, “V3”, and “5 Prime” assays

These scripts are identical for each dataset, with tuned parameters at the beginning of each script. Each script reads the Cell Ranger raw output, selects negative drops, runs dsb normalization, runs each normalization modeling step separately, clusters cells, plots distributions across clusters and on biaxial gates, and tests the underlying modeling assumptions for the external dataset.

# 10K v3 data 
source(here("V2/10x_analysis/10x_pbmc_10k_V3.r"))
source(here("V2/10x_analysis/10x_pbmc_10k_V3_figure_generation.r"))
# 5k V3 data 
source(here("V2/10x_analysis/10x_pbmc_5k_V3.r"))
source(here("V2/10x_analysis/10x_pbmc_5k_V3_figure_generation.r"))
# 5 prime data 
source(here("V2/10x_analysis/10x_pbmc_5prime_5k.r"))
source(here("V2/10x_analysis/10x_pbmc_5prime_5k_figure_generation.r"))
# Next Gem data 
source(here("V2/10x_analysis/10x_pbmc_NextGem.r"))
source(here("V2/10x_analysis/10x_pbmc_NextGem_figure_generation.r"))

dsb normalize protein data from Mission Bio tapestri platform

The data downloaded from Mission Bio are reformatted for dsb and normalized using dsb step I (ambient correction using empty droplets).

# tapestri example data dsb normalization 
source(here("V2/missionbio_tapestri/tapestri_exampledata_analysis.r"))

dsb normalization of TEA-seq data and dsb-based WNN multimodal clustering

R 4.0.5. TEA-seq data were downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM5123951.

The preprocessing script formats the data object using cells that meet the authors' internal QC. Weighted Nearest Neighbor (WNN) joint mRNA and protein clustering is compared using dsb-normalized vs CLR (across cells) normalized ADT data directly as input, followed by analysis of dsb modeling assumptions.

# run custom mapping script provided by Lucas Graybuck
source(here('V2/teaseq/1_teaseq_preprocess_barcodes.r'))

# run customized joint protein RNA clustering based on Seurat 4 WNN with CLR and dsb for protein norm.   
source(here("V2/teaseq/SR4_teaseq_pipeline_V2.r"))

# figure generation
source(here("V2/teaseq/teaseq_figgen_V3.R"))

# dsb modeling assumptions 
source(here('V2/teaseq/teaseq_dsb_model.r'))
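The WNN step in these scripts follows the standard Seurat 4 workflow. A hedged sketch is below, assuming a Seurat object `s` with an RNA PCA stored as 'pca' and an ADT PCA stored as 'apca'; the dimension and resolution choices are illustrative, and the actual parameters are set in SR4_teaseq_pipeline_V2.r.

# sketch of Seurat 4 WNN clustering with dsb-normalized ADT; parameter values are illustrative
library(Seurat)
s = FindMultiModalNeighbors(s, reduction.list = list("pca", "apca"),
                            dims.list = list(1:30, 1:18))
s = FindClusters(s, graph.name = "wsnn", algorithm = 3, resolution = 1)
s = RunUMAP(s, nn.name = "weighted.nn", reduction.name = "wnn.umap")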

dsb normalization of ASAP-seq data and dsb-based WNN multimodal clustering

This analysis uses R version 4.0.5 and Seurat version 4.0.1.

ASAP-seq data were downloaded from https://github.com/caleblareau/asap_reproducibility/tree/master/bonemarow_asapseq/data
See: https://www.nature.com/articles/s41587-021-00927-2

# run dsb on ASAP-seq data 
source(here('V2/asapseq/asapseq_bm.R'))
# Figure generation 
source(here('V2/asapseq/asapseq_bm_figure_generation.r'))

dsb vs CLR (across cells) Normalization comparison: Differential expression, Gap Statistic

Comparison of dsb with the updated implementation of CLR normalization (across cells).

# Gap statistic for cluster quality 
source(here('V2/si/gap.r'))
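A minimal sketch of the gap statistic computation is below; it is not the exact code in V2/si/gap.r, and `adt_norm` is a placeholder cells x proteins matrix of normalized values.

# gap statistic sketch; adt_norm is a placeholder cells x proteins matrix
library(cluster)
gap = clusGap(adt_norm, FUNcluster = kmeans, K.max = 10, B = 25, nstart = 10)
factoextra::fviz_gap_stat(gap)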

# differential expression of cluster protein markers 
source(here('V2/si/de.r'))

dsb vs CLR normalized values as input to WNN multimodal clustering: PBMC data from 20 donors

# wnn analysis of CITE-seq data normalized with CLR vs dsb 
source(here('V2/joint_clustering/SR4_seurat_wnn_dsb.R'))
source(here('V2/joint_clustering/SR4_seurat_wnn_dsb_figgen.r'))

dataset summary statistic table

# create summary statistics table for analyzed datasets. 
source(here('V2/table/make_table.r'))

public dataset sources

Data are available in analysis-ready format at the figshare link above for convenience. For the 10X data, only the raw (not the filtered) feature / cell matrix output was downloaded from the links below. dsb uses the non-cell-containing empty droplets present in the raw output to estimate background; see the package documentation.
Next GEM: https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.2/5k_pbmc_protein_v3_nextgem
V3 (5K): https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.2/5k_pbmc_protein_v3
V3 (10K): https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_protein_v3
5 Prime: https://support.10xgenomics.com/single-cell-vdj/datasets/3.0.0/vdj_v1_hs_pbmc2_5gex_protein
Mission Bio Tapestri: https://portal.missionbio.com/datasets/4-cell-lines-AML-multiomics (download the text file 'AML-4-cell-line-multiomics-adt-counts.tsv')

initialization script for public datasets

This code is shown here for reference; it converts the public 10X Genomics datasets to .rds files.

suppressMessages(library(Seurat)) 
suppressMessages(library(tidyverse))
suppressMessages(library(here))
source(here("V2/functions/preprocessing_functions.R"))

# Read10X_MPM is equivalent to Read10X in Seurat version 3 (reads compressed cell ranger data)
raw1 = Read10X_MPM("data/10x_data/10x_pbmc10k_V3/raw_feature_bc_matrix/")
raw2 = Read10X_MPM("data/10x_data/10x_pbmc5k_V3/raw_feature_bc_matrix/")
raw3 = Read10X_MPM("data/10x_data/10x_pbmc_5prime_5k/raw_feature_bc_matrix/")
raw4 = Read10X_MPM("data/10x_data/10x_pbmc5k_NextGem/raw_feature_bc_matrix/")

# save 
dir.create(here("data/10x_rds/"))
saveRDS(raw1,file = here("data/10x_rds/10x_pbmc10k_V3.rds"))
saveRDS(raw2,file = here("data/10x_rds/10x_pbmc5k_V3.rds"))
saveRDS(raw3,file = here("data/10x_rds/10x_pbmc_5prime_5k.rds"))
saveRDS(raw4,file = here("data/10x_rds/10x_pbmc5k_NextGem.rds"))

initialization script for healthy donor 53k cell CITE-seq data.

The code below was run outside of this workflow and is shown here for reference. This step reformats the dataset hosted at the figshare repository associated with the manuscript. The script removes the normalized protein data slot, which used an earlier version of the dsb package for normalization. In addition, normalized RNA data, metadata, clustering SNN graphs, and tSNE results are removed to reduce object size, and cell type annotations as reported in the paper linked above are added to the object as metadata.

suppressMessages(library(Seurat)) 
suppressMessages(library(tidyverse))
suppressMessages(library(magrittr))
suppressMessages(library(here))

datapath = here("data/V2_Data/")

# clear unneeded data slots tsne cell embeddings, unneeded metadata 
h1 = readRDS(file = "data/V2_Data/H1_day0_scranNorm_adtbatchNorm_dist_clustered_TSNE_labels.rds")
h1@dr$protpca = NULL
h1@dr$tsne_p10 = NULL
h1@dr$tsne_p50 = NULL
h1@dr$tsne_p90 = NULL
h1@dr$tsne_p130 = NULL
h1@dr$tsne_p170 = NULL

h1@assay$HTO = NULL
h1@assay$CITE@data = NULL
h1@assay$CITE@scale.data = NULL
h1@data = NULL
# remove unused cells RNA data from raw slot
h1 = h1 %>% SubsetData(subset.raw = TRUE)

# remove unused additional metadata 
mds = read_delim(file = here("init_data_ignore/md_strip.txt"),delim = "\t")
vrm = mds$vars_
for (i in 1:length(vrm)) {
  vr = vrm[i]
  h1@meta.data[[vr]] = NULL
}

# add cluster labels from Fig 6 of https://www.nature.com/articles/s41591-020-0769-8 for reference in dsb paper.
cmd = read_delim(file = paste0(datapath, "clustree_node_labels_withCellTypeLabels.txt"), delim = '\t')
cmd1 = cmd %>% filter(clustering == "p3_dist_1")
cmd3 = cmd %>% filter(clustering == "p3_dist_3")

mdn = h1@meta.data %>% 
  mutate(celltype_label_1 = plyr::mapvalues(x = h1@meta.data$p3_dist_1, from = cmd1$cluster, to =cmd1$`Cell Type label`)) %>% 
  mutate(celltype_label_3 = plyr::mapvalues(x = h1@meta.data$p3_dist_3, from = cmd3$cluster, to =cmd3$`Cell Type label`)) %>% 
  select(barcode_check, celltype_label_3, celltype_label_1) %>% 
  column_to_rownames("barcode_check")

h1 = h1 %>% AddMetaData(metadata = mdn)

# save this starting object for DSB paper analysis  
saveRDS(h1, file = paste0(datapath, "CITEseq_raw_PMID32094927_seurat2.4.rds"))

Project options and Sessioninfo

R version 3.5.3

other attached packages: mclust_5.4.5 reticulate_1.12 umap_0.2.3.1 magrittr_1.5 forcats_0.4.0 stringr_1.4.0 dplyr_0.8.5 purrr_0.3.3 readr_1.3.1 tidyr_1.0.2 tibble_2.1.1 tidyverse_1.2.1 Seurat_2.3.4 Matrix_1.2-15 cowplot_0.9.4 ggplot2_3.1.1 here_0.1

loaded via a namespace (and not attached): readxl_1.3.1 snow_0.4-3 backports_1.1.4 Hmisc_4.2-0 plyr_1.8.4 igraph_1.2.4.1 lazyeval_0.2.2 splines_3.5.3 inline_0.3.15 digest_0.6.25 foreach_1.4.4 htmltools_0.3.6 lars_1.2 rsconnect_0.8.16 gdata_2.18.0 checkmate_1.9.3 cluster_2.0.7-1 mixtools_1.1.0 ROCR_1.0-7 modelr_0.1.4 matrixStats_0.54.0 R.utils_2.8.0 askpass_1.1 prettyunits_1.0.2 colorspace_1.4-1 rvest_0.3.4 haven_2.1.0 xfun_0.7 callr_3.2.0 crayon_1.3.4 jsonlite_1.6 survival_2.43-3 zoo_1.8-6 iterators_1.0.10 ape_5.3 glue_1.3.1 gtable_0.3.0 pkgbuild_1.0.3 kernlab_0.9-27 rstan_2.19.3 prabclus_2.3-1 DEoptimR_1.0-8 scales_1.0.0 mvtnorm_1.0-10 bibtex_0.4.2 Rcpp_1.0.1 metap_1.1 dtw_1.20-1 htmlTable_1.13.1 foreign_0.8-71 bit_1.1-14 proxy_0.4-23 SDMTools_1.1-221.1 Formula_1.2-3 stats4_3.5.3 tsne_0.1-3 StanHeaders_2.21.0-1 htmlwidgets_1.3 httr_1.4.0 gplots_3.0.1.1 RColorBrewer_1.1-2 fpc_2.2-1 acepack_1.4.1 modeltools_0.2-22 ica_1.0-2 pkgconfig_2.0.2 loo_2.3.1 R.methodsS3_1.7.1 flexmix_2.3-15 nnet_7.3-12 tidyselect_0.2.5 rlang_0.4.5 reshape2_1.4.3 cellranger_1.1.0 munsell_0.5.0 tools_3.5.3 cli_1.1.0 generics_0.0.2 broom_0.5.2 ggridges_0.5.1 evaluate_0.14 yaml_2.2.0 npsurv_0.4-0 processx_3.3.1 knitr_1.23 bit64_0.9-7 fitdistrplus_1.0-14 robustbase_0.93-5 caTools_1.17.1.2 RANN_2.6.1 packrat_0.5.0 pbapply_1.4-0 nlme_3.1-137 R.oo_1.22.0 xml2_1.2.0 hdf5r_1.2.0 compiler_3.5.3 rstudioapi_0.10 png_0.1-7 lsei_1.2-0 stringi_1.4.3 ps_1.3.0 lattice_0.20-38 vctrs_0.2.4 pillar_1.4.1 lifecycle_0.1.0 Rdpack_0.11-0 lmtest_0.9-37 data.table_1.12.2 bitops_1.0-6 irlba_2.3.3 gbRd_0.4-11 R6_2.4.0 latticeExtra_0.6-28 KernSmooth_2.23-15 gridExtra_2.3 codetools_0.2-16 MASS_7.3-51.1 gtools_3.8.1 assertthat_0.2.1 openssl_1.4 rprojroot_1.3-2 withr_2.1.2 diptest_0.75-7 parallel_3.5.3 doSNOW_1.0.16 hms_0.4.2 grid_3.5.3 rpart_4.1-13 class_7.3-15 rmarkdown_1.13 segmented_0.5-4.0 Rtsne_0.15 lubridate_1.7.4 base64enc_0.1-3

R version 4.0.5

other attached packages:
[1] mclust_5.4.7 GEOquery_2.58.0 Biobase_2.50.0 BiocGenerics_0.36.1 magrittr_2.0.1 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.4
[9] purrr_0.3.4 readr_1.4.0 tidyr_1.1.2 tibble_3.0.6 ggplot2_3.3.3 tidyverse_1.3.0 dsb_0.1.0 here_1.0.1
[17] Matrix_1.3-2 data.table_1.14.0 SeuratObject_4.0.0 Seurat_4.0.1

loaded via a namespace (and not attached):
[1] Rtsne_0.15 colorspace_2.0-0 deldir_0.2-10 ellipsis_0.3.1 ggridges_0.5.3 rprojroot_2.0.2 fs_1.5.0
[8] rstudioapi_0.13 spatstat.data_2.1-0 farver_2.0.3 leiden_0.3.7 listenv_0.8.0 ggrepel_0.9.1 lubridate_1.7.9.2
[15] xml2_1.3.2 codetools_0.2-18 splines_4.0.5 polyclip_1.10-0 jsonlite_1.7.2 broom_0.7.5 ica_1.0-2
[22] cluster_2.1.2 dbplyr_2.1.0 png_0.1-7 pheatmap_1.0.12 uwot_0.1.10 shiny_1.6.0 sctransform_0.3.2
[29] spatstat.sparse_2.0-0 compiler_4.0.5 httr_1.4.2 backports_1.2.1 assertthat_0.2.1 fastmap_1.1.0 lazyeval_0.2.2
[36] cli_2.5.0 limma_3.46.0 later_1.1.0.1 htmltools_0.5.1.1 tools_4.0.5 igraph_1.2.6 gtable_0.3.0
[43] glue_1.4.2 RANN_2.6.1 reshape2_1.4.4 Rcpp_1.0.6 scattermore_0.7 cellranger_1.1.0 vctrs_0.3.6
[50] nlme_3.1-152 lmtest_0.9-38 globals_0.14.0 rvest_0.3.6 mime_0.10 miniUI_0.1.1.1 lifecycle_1.0.0
[57] irlba_2.3.3 goftest_1.2-2 future_1.21.0 MASS_7.3-53.1 zoo_1.8-8 scales_1.1.1 spatstat.core_2.0-0
[64] hms_1.0.0 promises_1.2.0.1 spatstat.utils_2.1-0 RColorBrewer_1.1-2 reticulate_1.18 pbapply_1.4-3 gridExtra_2.3
[71] rpart_4.1-15 stringi_1.5.3 rlang_0.4.10 pkgconfig_2.0.3 matrixStats_0.58.0 lattice_0.20-41 ROCR_1.0-11
[78] tensor_1.5 labeling_0.4.2 patchwork_1.1.1 htmlwidgets_1.5.3 cowplot_1.1.1 tidyselect_1.1.0 ggsci_2.9
[85] parallelly_1.23.0 RcppAnnoy_0.0.18 plyr_1.8.6 R6_2.5.0 generics_0.1.0 DBI_1.1.1 withr_2.4.1
[92] pillar_1.4.7 haven_2.3.1 mgcv_1.8-34 fitdistrplus_1.1-3 survival_3.2-10 abind_1.4-5 future.apply_1.7.0
[99] modelr_0.1.8 crayon_1.4.1 KernSmooth_2.23-18 spatstat.geom_2.0-1 plotly_4.9.3 viridis_0.5.1 grid_4.0.5
[106] readxl_1.3.1 reprex_1.0.0 digest_0.6.27 xtable_1.8-4 httpuv_1.5.5 munsell_0.5.0 viridisLite_0.3.0

NIAID repository release notes

A review of this code has been conducted, no critical errors exist, and to the best of the authors' knowledge, there are no problematic file paths, no local system configuration details, and no passwords or keys included in this code.
Primary author(s): Matt Mulè
Organizational contact information: General: john.tsang AT nih.gov; code: mulemp AT nih.gov
Date of release: initial release Oct 7 2020
Description: code to reproduce analysis of the manuscript
Usage instructions: Provided in this markdown

code check

The repository was checked for PII and strings containing local file paths. Data used in this analysis do not contain PII.

library(lintr)
library(magrittr)
suppressMessages(library(here))
fcn = list.files(here("functions"), pattern = ".r", full.names = TRUE)
pipel = list.files(here("V2/dsb_normalize_cluster_pipeline/"), pattern = ".r", full.names = TRUE)
process = list.files(here("V2/dsb_process_plots/"), pattern = ".r", full.names = TRUE)
tenx = list.files(here("V2/10x_analysis/"), pattern = ".r", full.names = TRUE)
mb = list.files(here("V2/missionbio_tapestri/"), pattern = ".r", full.names = TRUE)
param = list.files(here("V2/parameter_sensitivity/"), pattern = ".r", full.names = TRUE)

# code check
scp = c(fcn, tenx, mb, pipel, process, param) %>% as.list()
lt = suppressMessages(lapply(scp, lintr::lint))