yezhengSTAT / ADTnorm

ADTnorm normalizes the cell surface protein measurements of CITE-seq data, facilitating data integration across batches and across studies.
GNU General Public License v3.0

attempting batch correction with ADTnorm #9

Open abspangler13 opened 1 year ago

abspangler13 commented 1 year ago


I installed ADTnorm from GitHub using `remotes::install_github("yezhengSTAT/ADTnorm", build_vignettes = FALSE)`.

I have an ADT-seq dataset that was generated in 4 different batches/runs with multiple samples in each batch/run. I am seeing an effect of the run even after normalizing with the DSB method. For example:


I have tried using ADTnorm with various parameters, but the data are still strongly separated by run. Below is my code along with the resulting UMAP plots:

Option 1:

```r
#### option 1
A316.VDJ <- readRDS(file = here::here("A316_final_vdj_all.rds"))
save_outpath <- "/Users/spanglerab"
run_name <- "ADTnorm_demoRun"

cell_x_adt <- t(, assay = "Prot", slot = "counts")))
cell_x_feature <-

cell_x_feature$sample = factor(cell_x_feature$run)
cell_x_feature$batch = factor(cell_x_feature$run)

cell_x_adt_norm <- ADTnorm(
  cell_x_adt = cell_x_adt,
  cell_x_feature = cell_x_feature,
  save_outpath = save_outpath,
  study_name = run_name,
  save_intermediate_fig = TRUE
)

A316.VDJ <- SetAssayData(A316.VDJ, assay="Prot",slot = "data",

DefaultAssay(A316.VDJ) <- "Prot"
A316.VDJ <- ScaleData(A316.VDJ, features = rownames(A316.VDJ))
A316.VDJ <- RunPCA(A316.VDJ, assay = "Prot", slot = "data",
                   features = rownames(A316.VDJ), reduction.name = "apca")
A316.VDJ <- RunUMAP(A316.VDJ, reduction = "apca", dims = 1:18, assay = "Prot",
                    reduction.name = "prot.umap", reduction.key = "protUMAP_",
                    n.neighbors = 40, min.dist = 0.3, local.connectivity = 3, spread = 3)
pdf(file = here::here("DimPlot_prot_UMAP_all_adt_norm.pdf"))
DimPlot(A316.VDJ, reduction = "prot.umap", label = TRUE, group.by = "run")
```


Thanks for your help,


yezhengSTAT commented 1 year ago

Hello Abby, I am not sure if you are aware of this tutorial webpage: It contains a few typical examples of tuning the other parameters. UMAP can help us diagnose whether batch effects remain in the data, but it is not a good guide for tuning the parameters. Instead, can you generate a few density plots of the protein markers and see if their peaks align before and after the ADTnorm normalization? Just like the figures on the tutorial website. From there, we can see how to tune the other parameters. ;)

Thanks, Ye
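The density check suggested above could be sketched roughly as follows in base R. This is only an illustration on synthetic data: the marker values, the run labels, and the helpers `simulate_marker`, `plot_density_by_run`, and `peak_location` are all made up for the example, and the "normalized" vector is produced by subtracting a known shift by hand rather than by calling ADTnorm. With real data, the two vectors would be one marker's column before and after normalization.

```r
# Synthetic stand-in for one ADT marker measured in two runs (assumption:
# real values would come from the cell-by-ADT matrix, one column per marker).
set.seed(1)
n <- 600
run <- factor(rep(c("run1", "run2"), each = n))

# Bimodal marker: 70% negative cells, 30% positive cells, plus a run-specific shift.
simulate_marker <- function(n, shift) {
  n_neg <- round(0.7 * n)
  c(rnorm(n_neg, mean = 1 + shift, sd = 0.3),
    rnorm(n - n_neg, mean = 4 + shift, sd = 0.4))
}
marker_raw <- c(simulate_marker(n, shift = 0), simulate_marker(n, shift = 1))

# Hypothetical "normalized" values: the known shift is removed by hand here.
marker_norm <- marker_raw - ifelse(run == "run2", 1, 0)

# Overlay one density curve per run and return the density objects.
plot_density_by_run <- function(values, run, main) {
  dens <- lapply(split(values, run), density)
  xlim <- range(sapply(dens, function(d) range(d$x)))
  ylim <- c(0, max(sapply(dens, function(d) max(d$y))))
  plot(0, 0, type = "n", xlim = xlim, ylim = ylim,
       xlab = "ADT value", ylab = "Density", main = main)
  for (i in seq_along(dens)) lines(dens[[i]], col = i, lwd = 2)
  legend("topright", legend = names(dens), col = seq_along(dens), lwd = 2)
  invisible(dens)
}

# Location of the tallest peak of a density estimate.
peak_location <- function(d) d$x[which.max(d$y)]

op <- par(mfrow = c(1, 2))
dens_raw <- plot_density_by_run(marker_raw, run, "Before normalization")
dens_norm <- plot_density_by_run(marker_norm, run, "After normalization")
par(op)

# Numeric companion to the plots: tallest-peak position per run.
sapply(dens_raw, peak_location)
sapply(dens_norm, peak_location)
```

If the peaks of the per-run curves sit on top of each other in the right-hand panel, the marker is aligned across runs; a visible offset points to a marker worth a closer look.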

abspangler13 commented 1 year ago

Hi Ye,

Thanks for informing me about the tutorial page. Here are a few of the density plots from this dataset. It does appear that the peaks are aligned.


[Images: density plots for three protein markers]

Thanks for your help,


abspangler13 commented 1 year ago

Hi Ye, I'm going to test a few more parameters and get back to you shortly. You do not need to respond to my previous comment.

yezhengSTAT commented 1 year ago

I was about to say that the peak alignment looks clean to me, at least for the three markers you shared with me. Do you see a big discrepancy across runs in other markers? If not, I don't see why UMAP shows a big separation across runs. I'm not sure if it is the reason, but can you try skipping the "ScaleData" step? For example, you could run PCA directly on the cell_x_adt matrix using "prcomp" and then UMAP using "umap", and plot the UMAP coordinates directly as a scatter plot.

Thanks, Ye
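A minimal sketch of the "prcomp then umap" route mentioned above, assuming a cells-by-markers numeric matrix and a per-cell run label. The matrix `adt_mat` and the `run` factor below are synthetic placeholders, not the A316 data, and the choice of 5 principal components is arbitrary. The `umap` call assumes the CRAN `umap` package, whose result stores the coordinates in `$layout`; if that package is not installed, the sketch falls back to plotting the first two principal components.

```r
# Synthetic stand-in for a normalized cell-by-ADT matrix
# (cells in rows, markers in columns) and a per-cell run label.
set.seed(42)
n_cells <- 300
n_markers <- 10
adt_mat <- matrix(
  rnorm(n_cells * n_markers),
  nrow = n_cells,
  dimnames = list(paste0("cell", seq_len(n_cells)),
                  paste0("ADT", seq_len(n_markers)))
)
run <- factor(rep(paste0("run", 1:4), length.out = n_cells))

# PCA directly on the matrix: columns are centered (prcomp default) but
# not rescaled to unit variance, so no ScaleData-like step is applied.
pca <- prcomp(adt_mat, center = TRUE, scale. = FALSE)
n_pcs <- min(5, ncol(pca$x))
pcs <- pca$x[, seq_len(n_pcs), drop = FALSE]

# UMAP on the leading PCs if the 'umap' package is available;
# otherwise fall back to the first two PCs.
if (requireNamespace("umap", quietly = TRUE)) {
  embedding <- umap::umap(pcs)$layout
  axis_labels <- c("UMAP 1", "UMAP 2")
} else {
  embedding <- pcs[, 1:2]
  axis_labels <- c("PC 1", "PC 2")
}

# Scatter plot of the embedding, colored by run.
plot(embedding, col = as.integer(run), pch = 16, cex = 0.6,
     xlab = axis_labels[1], ylab = axis_labels[2])
legend("topright", legend = levels(run),
       col = seq_along(levels(run)), pch = 16)
```

Comparing this plot with the Seurat-based UMAP would show whether the separation by run depends on the scaling step or is already present in the normalized values.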