satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

How to integrate multiple dataset within each batch effect is diminished? #3795

Closed lovebaboon1989 closed 3 years ago

lovebaboon1989 commented 3 years ago

Hi there, I am analyzing a mouse RGC scRNA-seq dataset and want to study the transcriptional program after nerve crush. Say we have two datasets: one is naive and composed of 4 batches, the other is 4 days post injury (4 dpi) and also composed of 4 batches. My goal is to remove the batch effect within the naive dataset and, separately, within the 4 dpi dataset, and then simply merge the two datasets to analyze the gene expression (GE) changes from naive to 4 days after injury. Is this possible? Thanks!

I tried to write code like this: ###################################################

for naive dataset, remove the batch effect:

dataMatrix = read.csv("D:/Rfile/2019neuron/RGC_Atlas.csv", header = T)
rownames(dataMatrix) = dataMatrix[,1]
dataMatrix = dataMatrix[,-1]
atlas <- CreateSeuratObject(counts = dataMatrix, project = "atlas", min.cells = 3, min.features = 200)
pancreas.list <- SplitObject(atlas, split.by = "orig.ident")
for (i in 1:length(pancreas.list)) {
  pancreas.list[[i]] <- NormalizeData(pancreas.list[[i]], verbose = FALSE)
  pancreas.list[[i]] <- FindVariableFeatures(pancreas.list[[i]], selection.method = "vst", nfeatures = 2000, verbose = FALSE)
}
pancreas.anchors <- FindIntegrationAnchors(object.list = pancreas.list, dims = 1:30)
pancreas.integrated <- IntegrateData(anchorset = pancreas.anchors, dims = 1:30)

for the 4d dpi after injury dataset, remove the batch effect:

dataMatrix = read.csv("D:/Rfile/2/4d/4dONC_new.csv", header = T)
rownames(dataMatrix) = dataMatrix[,1]
dataMatrix = dataMatrix[,-1]
atlas2 <- CreateSeuratObject(counts = dataMatrix, project = "atlas", min.cells = 3, min.features = 200)
pancreas.list <- SplitObject(atlas2, split.by = "orig.ident")
for (i in 1:length(pancreas.list)) {
  pancreas.list[[i]] <- NormalizeData(pancreas.list[[i]], verbose = FALSE)
  pancreas.list[[i]] <- FindVariableFeatures(pancreas.list[[i]], selection.method = "vst", nfeatures = 2000, verbose = FALSE)
}
pancreas2.anchors <- FindIntegrationAnchors(object.list = pancreas.list, dims = 1:30)
pancreas2.integrated <- IntegrateData(anchorset = pancreas2.anchors, dims = 1:30)

Then simply merge these two datasets and perform the downstream analysis, but unfortunately there is an ERROR!

merged <- merge(pancreas.integrated, pancreas2.integrated)
DefaultAssay(merged) <- "integrated"
merged <- NormalizeData(merged, normalization.method = "LogNormalize", scale.factor = 10000)
merged <- FindVariableFeatures(merged, selection.method = "vst", nfeatures = 2000)

###################################################

Note: the two functions NormalizeData and FindVariableFeatures report an ERROR, so the downstream PCA and clustering steps cannot be performed.

torkencz commented 3 years ago

Hi, can you show what the error is? Also are the batches that are injured the same as the uninjured controls?

lovebaboon1989 commented 3 years ago

Hi, when I apply the FindVariableFeatures function, the error message is:

Error in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : invalid 'x'

There are 3 batches in the uninjured group, for which I removed the batch effect, and 4 batches in the injured group, for which I also removed the batch effect. But I cannot combine them and run the subsequent steps.

torkencz commented 3 years ago

I would refer you to issue #2387. https://github.com/satijalab/seurat/issues/2387 See if that helps at all. You might want to be careful about having different batches between your treated and untreated runs. If the issue persists, go ahead and reopen.
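For readers weighing the caution about batches, here is a minimal sketch of one commonly used alternative, not a confirmed fix for this issue: integrate all batches from both conditions in a single anchor set, then run scaling, PCA, and clustering on the integrated assay without rerunning NormalizeData or FindVariableFeatures on it. It assumes the Seurat v3/v4 anchor-based API already used in this thread. The function names `integrate_all_batches` and `cluster_integrated`, and the `condition` and `batch` metadata columns, are hypothetical names chosen for illustration.

```r
# Sketch only: assumes the Seurat v3/v4 anchor-based integration API.
# `naive` and `injured` are un-integrated Seurat objects (e.g. `atlas` and `atlas2`).

integrate_all_batches <- function(naive, injured, dims = 1:30) {
  # Record the condition before merging so it survives integration.
  naive$condition <- "naive"
  injured$condition <- "4dpi"

  # Merge the raw objects; prefix cell names so they stay unique.
  combined <- merge(naive, y = injured, add.cell.ids = c("naive", "4dpi"))

  # Build a batch label that cannot collide across conditions (hypothetical column).
  combined$batch <- paste(combined$condition, combined$orig.ident, sep = "_")

  # Normalize and pick variable features per batch, as in the original post.
  batch.list <- Seurat::SplitObject(combined, split.by = "batch")
  batch.list <- lapply(batch.list, function(obj) {
    obj <- Seurat::NormalizeData(obj, verbose = FALSE)
    Seurat::FindVariableFeatures(obj, selection.method = "vst",
                                 nfeatures = 2000, verbose = FALSE)
  })

  anchors <- Seurat::FindIntegrationAnchors(object.list = batch.list, dims = dims)
  Seurat::IntegrateData(anchorset = anchors, dims = dims)
}

cluster_integrated <- function(integrated, dims = 1:30, resolution = 0.5) {
  # The integrated assay already holds corrected values for the anchor features,
  # so this sketch goes straight to scaling and dimensional reduction.
  Seurat::DefaultAssay(integrated) <- "integrated"
  integrated <- Seurat::ScaleData(integrated, verbose = FALSE)
  integrated <- Seurat::RunPCA(integrated, npcs = max(dims), verbose = FALSE)
  integrated <- Seurat::RunUMAP(integrated, reduction = "pca", dims = dims)
  integrated <- Seurat::FindNeighbors(integrated, reduction = "pca", dims = dims)
  Seurat::FindClusters(integrated, resolution = resolution)
}
```

A call such as `cluster_integrated(integrate_all_batches(atlas, atlas2))` would then yield one object in which the `condition` column separates naive from 4 dpi cells. Whether integrating across conditions is appropriate depends on the experimental design, since it may also dampen some real injury-related differences.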

lovebaboon1989 commented 3 years ago

Hi, thanks for the reply. I talked with someone who told me this error occurs because the normalization function does not work properly. He suggested that I export the array first, reload it, and then rerun the normalization function and the following steps. I will follow up later on whether this works.