satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.29k stars 915 forks source link

Run harmony by calling IntegrateLayers - how to specify the batch covariate in meta.data? #9382

Open linpei26 opened 3 weeks ago

linpei26 commented 3 weeks ago

Hello!

I find the tutorial at https://satijalab.org/seurat/articles/seurat5_integration and learn that we could call harmony by the function "IntegrateLayers". But in the example code (copied here below), I did not see any parameter to specify batch covariate.

obj <- IntegrateLayers(
  object = obj, method = HarmonyIntegration,
  orig.reduction = "pca", new.reduction = "harmony",
  verbose = FALSE
)

So, is it unnecessary to use such a para when applying IntegrateLayers, or am I missing anything? Thanks in advance. Best, P

adairama commented 3 weeks ago

The first half of the vignette talks about the data processing steps. The very first step is to split the unintegrated dataset by the batch covariate (it is "Method" in the vignette). Here is a minimal example inspired by the vignette:

## Step 1: Split by the batch covariate
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Method)

## Step 2: Preprocess each split
obj <- obj %>% 
   NormalizeData(verbose = FALSE) %>%
   FindVariableFeatures(verbose = FALSE)  %>%
   ScaleData(verbose = FALSE) %>%
   RunPCA(verbose = FALSE)

## Step 3: Data integration and generate UMAP
obj <- IntegrateLayers(obj, 
   method = HarmonyIntegration,
   orig.reduction = "pca", 
   new.reduction = "harmony",
   verbose = FALSE)

obj <- RunUMAP(obj, reduction = "harmony", dims = 1:30, reduction.name = "umap.harmony")

## Step 4: Optional cleanup
obj <- JoinLayers(obj)
obj[["RNA"]]$scale.data <- NULL
linpei26 commented 3 weeks ago

Thank you. I am now using the harmony package directly.

The code is:

obj_1 = merge(x=o1,y=o2)

obj_1 = NormalizeData(obj_1)
obj_1 = FindVariableFeatures(obj_1)
obj_1 = ScaleData(obj_1)

obj_1 = RunPCA(obj_1)

obj_1 = RunHarmony(obj_1, c("dataset","flowcell"))
obj_1[["RNA"]] = JoinLayers(obj_1[["RNA"]])

obj_1 = FindNeighbors(obj_1, dims = 1:20, reduction = "harmony")
obj_1 = FindClusters(obj_1)
obj_1 = RunUMAP(obj_1, dims = 1:20, reduction = "harmony")

Please feel free to me know if you have any suggestions. Thanks. P