satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.18k stars 892 forks source link

Data Integration + Batch Effect Removal #9002

Open jcasadogp opened 4 weeks ago

jcasadogp commented 4 weeks ago

Hello,

I have a question of how to do both data integration and batch effect removal. I have previously posted this issue where they introduced me in Data Integration.

I have a Seurat Object with these metadata fields: LibraryPreparationChip and Treatment. I want to remove the effect of LibraryPreparationChip and do Data Integration for Treatment in order to compare them. If I understood correctly, batch effect removal with methods like harmony will eliminate the effect of that parameter while data integration just remodels the data, but do not eliminate the effect of that parameter, right?

I have a mixture of codes trying to find the appropriate combination:

seurat.merged[["RNA"]] <- JoinLayers(seurat.merged[["RNA"]])
seurat.merged[["RNA"]] <- split(seurat.merged[["RNA"]], f = seurat.merged$LibraryPreparationChip)

seurat.merged <- NormalizeData(seurat.merged, normalization.method = "LogNormalize", scale.factor = 10000)
seurat.merged <- FindVariableFeatures(seurat.merged, selection.method = "vst",nfeatures=2000)
seurat.merged <- ScaleData(seurat.merged)

seurat.merged <- RunPCA(seurat.merged, verbose = FALSE, npcs = 40)
seurat.merged <- RunUMAP(seurat.merged, dims = 1:40, reduction = "pca", verbose = FALSE)

# data integration using the HarmonyIntegration method
seurat.merged <- IntegrateLayers(object = seurat.merged, method = HarmonyIntegration, orig.reduction = "pca", new.reduction = 'harmony', assay = "RNA", verbose = FALSE)

seurat.merged <- RunUMAP(seurat.merged, reduction = "harmony", dims = 1:25, reduction.name = "umap.harmony")

seurat.merged[["RNA"]] <- JoinLayers(seurat.merged[["RNA"]])

Up to this point, if I am right, I eliminated the LibraryPreparationChip effect; how should I continue to do data integration of Treatment?

seurat.merged[["RNA"]] <- split(seurat.merged[["RNA"]], f = seurat.merged$Treatment)

# Run standard analysis workflow
cat("A. (1/6) => Run standard analysis workflow", sep = "\n")
seurat.merged <- NormalizeData(seurat.merged)
seurat.merged <- FindVariableFeatures(seurat.merged)
seurat.merged <- ScaleData(seurat.merged)

seurat.merged.pca <- seurat.merged
seurat.merged.harmony <- seurat.merged

# PCA
cat("A. (3/6) => PCA", sep = "\n")
seurat.merged <- RunPCA(seurat.merged, verbose = TRUE, reduction.name = "pca.2")

# After preprocessing, we integrate layers.
cat("A. (4/6) => Integrate layers and re-join them", sep = "\n")
seurat.merged <- IntegrateLayers(object = seurat.merged, method = CCAIntegration, 
                                 orig.reduction = "pca", new.reduction = "integrated.pca.cca", 
                                 verbose = TRUE)

seurat.merged <- IntegrateLayers(object = seurat.merged, method = CCAIntegration, 
                                 orig.reduction = "pca.2", new.reduction = "integrated.pca2.cca", 
                                 verbose = TRUE)

seurat.merged <- IntegrateLayers(object = seurat.merged, method = CCAIntegration, 
                                 orig.reduction = "harmony", new.reduction = "integrated.harmony.cca", 
                                 verbose = TRUE)

# re-join layers after integration
seurat.merged[["RNA"]] <- JoinLayers(seurat.merged[["RNA"]])

# Neighbors, Clusters and UMAP
cat("A. (5/6) => Neighbors, Clusters and UMAP", sep = "\n")
seurat.merged <- RunUMAP(seurat.merged, dims = 1:30, reduction = "integrated.pca.cca", reduction.name = "umap.int.pca.cca")
seurat.merged <- RunUMAP(seurat.merged, dims = 1:30, reduction = "integrated.pca2.cca", reduction.name = "umap.int.pca2.cca")
seurat.merged <- RunUMAP(seurat.merged, dims = 1:30, reduction = "integrated.harmony.cca", reduction.name = "umap.int.harmony.cca")

Thank you in advance!!

Julia

Sogand65 commented 3 days ago

Hi,

I have similar question as the above question. I have 48 samples after merging them I wanna integrate them in a single object, but I want also to correct the possible batch effect over 1- the orig.ident (single sample) and 2- over the experiment batches in library preparation ( there are 6 libraries) so what I am doing is according integration vignette, So my object is splitted over orig.ident and i have 48 *2 layers, my code is: seurat_object[["RNA"]] <- split(seurat_object[["RNA"]], f=seurat_object$orig.ident) seurat_object <- FindVariableFeatures(seurat_object,nfeatures = number_features) if (TCRBCR_ex == TRUE) { TCR.genes <- grep("^TR[ABGD][VJ]",rownames(seurat_object),value = T) BCR.genes <- grep("^IG[HKL]",rownames(seurat_object),value = T) gene.blacklist <- c(TCR.genes, BCR.genes) var.genes <- VariableFeatures(seurat_object) var.genes <- setdiff(var.genes,gene.blacklist) VariableFeatures(seurat_object) <- var.genes } seurat_object <- ScaleData(seurat_object) seurat_object <- RunPCA(object = seurat_object, verbose = FALSE,features = VariableFeatures(seurat_object),npcs=50) seurat_object <- IntegrateLayers( object = seurat_object, method = HarmonyIntegration, orig.reduction = "pca", new.reduction = "harmony", verbose = FALSE) seurat_object <- FindNeighbors(seurat_object, dims=1:dims,reduction = "harmony",k.param = k) seurat_object <- FindClusters(seurat_object, resolution = seq(0,1,0.1), random.seed=42) seurat_object <- FindClusters(object = seurat_object,resolution=res,random.seed=42,graph.name = 'RNA_snn') seurat_object <- RunUMAP(seurat_object,reduction = "harmony",seed.use = 42,dims=1:dims, a=0.5,b=1, verbose=FALSE) seurat_object <- JoinLayers(seurat_object)

My questions are: 1-as the splitting and integration is over orgin.ident , this will remove the batch effect over samples/orig.ident? 2-now can I do similar step by now splitting the integrated object over libraries (6 libraries) and simply repeating the steps to correct batches over libraries? 3-the final integrated object is corrected for both samples and batches? Is this a correct solution you think or am I doing over correcting?

I appreciate any feedback and suggestions.

Thanks, Sogand