yezhengSTAT / ADTnorm

ADTnorm normalizes cell surface protein measurements from CITE-seq data, facilitating data integration across batches and across studies.
https://yezhengstat.github.io/ADTnorm/articles/ADTnorm-tutorial.html
GNU General Public License v3.0

attempting batch correction with ADTnorm #9

Open abspangler13 opened 1 year ago

abspangler13 commented 1 year ago

Hello,

I installed ADTnorm from GitHub using `remotes::install_github("yezhengSTAT/ADTnorm", build_vignettes = FALSE)`.

I have an ADT-seq dataset that was generated in 4 different batches/runs with multiple samples in each batch/run. I am seeing an effect of the run even after normalizing with the DSB method. For example:

image

I have tried using ADTnorm with various parameters, but still the data is very separated by run. Below is my code along with the resulting UMAP plots:

Option 1:

```r
#### option 1
A316.VDJ <- readRDS(file = here::here("A316_final_vdj_all.rds"))
save_outpath <- "/Users/spanglerab"
run_name <- "ADTnorm_demoRun"

cell_x_adt <- t(as.data.frame(GetAssayData(A316.VDJ, assay = "Prot", slot = "counts")))
cell_x_feature <- A316.VDJ@meta.data

cell_x_feature$sample = factor(cell_x_feature$run)
cell_x_feature$batch = factor(cell_x_feature$run)

cell_x_adt_norm <- ADTnorm(
  cell_x_adt = cell_x_adt,
  cell_x_feature = cell_x_feature,
  save_outpath = save_outpath,
  study_name = run_name,
  save_intermediate_fig = TRUE
)

A316.VDJ <- SetAssayData(A316.VDJ, assay = "Prot", slot = "data", new.data = t(cell_x_adt_norm))

DefaultAssay(A316.VDJ) <- "Prot"
A316.VDJ <- ScaleData(A316.VDJ, features = rownames(A316.VDJ))
A316.VDJ <- RunPCA(A316.VDJ, assay = "Prot", slot = "data", features = rownames(A316.VDJ), reduction.name = "apca")
A316.VDJ <- RunUMAP(A316.VDJ, reduction = "apca", dims = 1:18, assay = "Prot", reduction.name = "prot.umap", reduction.key = "protUMAP_", n.neighbors = 40, min.dist = 0.3, local.connectivity = 3, spread = 3)
pdf(file = here::here("DimPlot_prot_UMAP_all_adt_norm.pdf"))
DimPlot(A316.VDJ, reduction = "prot.umap", label = TRUE, group.by = "run")
dev.off()
```

image

Thanks for your help,

Abby

yezhengSTAT commented 1 year ago

Hello Abby, I am not sure if you are aware of this tutorial webpage: https://yezhengstat.github.io/ADTnorm/articles/ADTnorm-tutorial.html. It contains a few typical examples of tuning the other parameters. UMAP can help us diagnose whether batch effects are left in the data, but it is not a good guide for tuning the parameters. Instead, can you generate a few density plots of the protein markers and see whether their peaks align before and after the ADTnorm normalization? Just like the figures on the tutorial website. From there, we can see how to tune the other parameters. ;)
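For anyone who wants to run this check by hand, here is a minimal sketch in base R. It only illustrates the idea of overlaying one density curve per run for a single marker. `plot_marker_density` is a hypothetical helper, not an ADTnorm function, and the data below are simulated rather than taken from this dataset.

```r
# Hypothetical helper (not part of ADTnorm): overlay one density curve per run
# for a single marker, so peak locations can be compared by eye.
plot_marker_density <- function(mat, run, marker, main = marker) {
  run <- factor(run)
  values_by_run <- split(mat[, marker], run)
  dens <- lapply(values_by_run, density)
  # Common axis limits so every run's curve fits in the same panel
  xlim <- range(unlist(lapply(dens, function(d) d$x)))
  ylim <- c(0, max(unlist(lapply(dens, function(d) d$y))))
  plot(NA, xlim = xlim, ylim = ylim, xlab = marker, ylab = "density", main = main)
  cols <- seq_along(dens)
  for (i in seq_along(dens)) lines(dens[[i]], col = cols[i], lwd = 2)
  legend("topright", legend = names(dens), col = cols, lwd = 2, title = "run")
  invisible(dens)
}

# Simulated example (assumption): one marker with a negative and a positive
# peak, where run2 is shifted by 1 relative to run1.
set.seed(1)
n <- 500
run <- rep(c("run1", "run2"), each = n)
cd19 <- c(rnorm(n, mean = rep(c(1, 4), each = n / 2)),
          rnorm(n, mean = rep(c(2, 5), each = n / 2)))
demo_mat <- cbind(CD19 = cd19)

pdf(file.path(tempdir(), "density_check.pdf"))
dens_demo <- plot_marker_density(demo_mat, run, "CD19")
dev.off()
```

With the objects from the first post, the same helper could be called once on the raw counts (on a transformed scale of your choice, for display only) and once on `cell_x_adt_norm`, passing `cell_x_feature$run` as `run` and a column name that exists in your matrix as `marker`. If the peaks line up in the second plot but not in the first, the normalization did its job for that marker.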

Thanks, Ye

abspangler13 commented 1 year ago

Hi Ye,

Thanks for informing me about the tutorial page. Here are a few of the density plots from this dataset. It does appear that the peaks are aligned.

P-CCR3

image image

P-CCR6

image image

CD-19

image image

Thanks for your help,

Abby

abspangler13 commented 1 year ago

Hi Ye, I'm going to test a few more parameters and get back to you shortly. You do not need to respond to my previous comment.

yezhengSTAT commented 1 year ago

I was about to say that the peak alignment looks clean to me, at least for the three markers you shared with me. Do you see a big discrepancy across runs in other markers? If not, I don't see why the UMAP shows a big separation across runs. I am not sure if it is the reason, but can you try skipping the `ScaleData` step? For example, you could run PCA directly on the cell_x_adt matrix using `prcomp` and then UMAP using `umap`, and plot the UMAP coordinates directly as a scatter plot.
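A rough sketch of what that could look like, under several assumptions: the matrix below is simulated and only stands in for the cells-by-markers matrix you would actually use, the number of PCs (5) is arbitrary, and the UMAP step uses the `umap` package only if it is installed, falling back to the first two PCs otherwise.

```r
set.seed(1)

# Simulated stand-in (assumption): 300 cells x 10 markers, from 3 runs
n_cells <- 300
norm_mat <- matrix(rnorm(n_cells * 10), nrow = n_cells,
                   dimnames = list(NULL, paste0("marker", 1:10)))
run <- factor(rep(c("run1", "run2", "run3"), each = n_cells / 3))

# PCA directly on the values: centered, but not rescaled to unit variance,
# which is the part of ScaleData being skipped here
pca <- prcomp(norm_mat, center = TRUE, scale. = FALSE)
pcs <- pca$x[, 1:5]  # number of PCs is an arbitrary choice for this sketch

# UMAP on the leading PCs if the 'umap' package is available
if (requireNamespace("umap", quietly = TRUE)) {
  emb <- umap::umap(pcs)$layout
} else {
  emb <- pcs[, 1:2]  # fallback: plot the first two PCs instead
}

# Scatter plot of the embedding, colored by run
pdf(file.path(tempdir(), "adt_embedding_by_run.pdf"))
plot(emb, col = as.integer(run), pch = 16, cex = 0.5,
     xlab = "dim 1", ylab = "dim 2")
legend("topright", legend = levels(run),
       col = seq_along(levels(run)), pch = 16, title = "run")
dev.off()
```

If the runs still separate in this plot, the difference is in the values themselves rather than something introduced by `ScaleData`; if they mix, the per-feature scaling is the more likely culprit.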

Thanks, Ye