I'm having much the same problem as here: https://github.com/welch-lab/liger/issues/277

I want to compare two samples in Seurat and correct for batch effect with LIGER. I have tried this two ways:

Method 1: use SCTransform on samples separately before using LIGER

D14A_NT_hs <- SCTransform(D14A_NT_hs, vars.to.regress = c("percent.mt", "nFeature_RNA", "percent.ribo"), verbose = FALSE)

D14D_DKKMO_hs <- SCTransform(D14D_DKKMO_hs, vars.to.regress = c("percent.mt", "nFeature_RNA", "percent.ribo"), verbose = FALSE)

make sure all genes are included downstream

g.1 <- rownames(x = D14A_NT_hs)
g.2 <- rownames(x = D14D_DKKMO_hs)
genes.use <- unique(c(g.1, g.2))

reference.list <- list(D14A_NT_hs, D14D_DKKMO_hs)
integrate.anchors <- FindIntegrationAnchors(object.list = reference.list, dims = 1:40, anchor.features = 12000)
integrated.data <- IntegrateData(anchorset = integrate.anchors, dims = 1:40, features.to.integrate = genes.use)

DefaultAssay(object = integrated.data) <- "integrated"

integrated.data <- ScaleData(integrated.data, split.by = "Condition", do.center = FALSE)
integrated.data <- RunOptimizeALS(integrated.data, k = 20, lambda = 5, split.by = "orig.ident")
integrated.data <- RunQuantileNorm(integrated.data, split.by = "orig.ident")
integrated.data <- FindNeighbors(integrated.data, reduction = "iNMF", dims = 1:20)
integrated.data <- FindClusters(integrated.data, resolution = 0.1)
integrated.data <- RunUMAP(integrated.data, dims = 1:ncol(integrated.data[["iNMF"]]), reduction = "iNMF")
DimPlot(integrated.data, group.by = c("ident", "orig.ident"), ncol = 2)

Method 2: follow SeuratWrappers LIGER tutorial exactly and do not perform SCTransform

D14A_NT_hs <- CreateSeuratObject(counts = D14A_NT_hs.data, project = "D14A_NT_hs", min.cells = 10, min.features = 150)

D14D_DKKMO_hs <- CreateSeuratObject(counts = D14D_DKKMO_hs.data, project = "D14D_DKKMO_hs", min.cells = 10, min.features = 150)

D14.combined <- merge(D14A_NT_hs, y = D14D_DKKMO_hs, add.cell.ids = c("D14A", "D14D"), project = "both_D14")

D14.combined <- NormalizeData(D14.combined)
D14.combined <- FindVariableFeatures(D14.combined)
D14.combined <- ScaleData(D14.combined, split.by = "orig.ident", do.center = FALSE)
D14.combined <- RunOptimizeALS(D14.combined, k = 20, lambda = 5, split.by = "orig.ident")
D14.combined <- RunQuantileNorm(D14.combined, split.by = "orig.ident")

D14.combined <- FindNeighbors(D14.combined, reduction = "iNMF", dims = 1:20)
D14.combined <- FindClusters(D14.combined, resolution = 0.1)

Dimensional reduction and plotting

D14.combined <- RunUMAP(D14.combined, dims = 1:ncol(D14.combined[["iNMF"]]), reduction = "iNMF")
DimPlot(D14.combined, group.by = c("ident", "orig.ident"), ncol = 2)

#############

Method #1 gives me results that look like batch correction has worked: all the clusters contain cells from both samples. Method #2 gives me results that look like batch correction has not been performed: two of the five clusters contain cells from only one of the samples. This is similar to what I see if I perform no batch correction at all.
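One way to put numbers on "clusters contain cells from both samples" is a cluster-by-sample table. Below is a minimal sketch in base R using toy vectors; the helper name `cluster_composition` and the toy values are made up for illustration and are not taken from my data.

```r
# Cross-tabulate cluster membership against sample of origin.
# `clusters` and `samples` are parallel vectors, one entry per cell.
cluster_composition <- function(clusters, samples) {
  counts <- table(cluster = clusters, sample = samples)
  # Row-wise proportions: fraction of each cluster contributed by each sample
  prop.table(counts, margin = 1)
}

# Toy data standing in for real metadata (hypothetical values)
clusters <- c("0", "0", "0", "1", "1", "2", "2", "2")
samples  <- c("D14A_NT_hs", "D14D_DKKMO_hs", "D14A_NT_hs",
              "D14A_NT_hs", "D14A_NT_hs",
              "D14D_DKKMO_hs", "D14D_DKKMO_hs", "D14A_NT_hs")
comp <- cluster_composition(clusters, samples)
print(round(comp, 2))

# With the Seurat object from Method 2 this would be called as:
# cluster_composition(Idents(D14.combined), D14.combined$orig.ident)
```

A cluster whose row is close to 1 for a single sample is one that batch correction did not mix (or that is genuinely sample-specific; the table alone cannot tell these apart).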

Method #2 seems to follow the SeuratWrappers LIGER tutorial.

I did check scale.data, as recommended here: https://github.com/welch-lab/liger/issues/161 and it does have negative values. The advice to check that is 3 years old, so I don't know whether it is still a concern.
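For anyone wanting to repeat the check: as I understand it, iNMF is a non-negative factorization, which is why negative entries in the scaled matrix matter. Below is a minimal sketch of the check in base R on a toy matrix; the helper name `count_negatives` and the toy values are hypothetical, and the commented Seurat call assumes the v3/v4 `slot` argument (newer versions may use `layer` instead).

```r
# Summarize negative entries in a scaled expression matrix (genes x cells).
count_negatives <- function(mat) {
  c(n_negative = sum(mat < 0), min_value = min(mat))
}

# Toy matrix standing in for scale.data (hypothetical values)
toy <- matrix(c(0.5, 1.2, -0.3, 0, 2.1, 0.7), nrow = 2)
neg <- count_negatives(toy)
print(neg)

# With a Seurat object, assuming the Seurat v3/v4 accessor:
# count_negatives(GetAssayData(D14.combined, slot = "scale.data"))
```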

Any help regarding the use of SCTransform would be greatly appreciated. This is a set of very strange patient-derived xenograft samples, so unfortunately I have no prior knowledge of how the cells should cluster.

Another request for help regarding SCTransform #282
