immunogenomics / harmony

Fast, sensitive and accurate integration of single-cell data with Harmony
https://portals.broadinstitute.org/harmony/

The error of harmony after subsetting data #256

Open Carrey14 opened 2 months ago

Carrey14 commented 2 months ago

Hello, thanks for developing an excellent tool for batch correction. When I used Harmony to correct batch effects between two datasets, I found that it corrected the batch effects in the overall cell population perfectly.

[Image: UMAP of the overall cell population]

However, when I extracted a small subset, such as the T cell population, and re-ran all the steps from scaling to Harmony on this subset, the result was different. Samples from within a single dataset integrated well, but the T cells from the two datasets showed clear batch effects and formed two distinct T cell clusters corresponding to the original datasets. Why is this happening, and how can I solve it?

[Image: UMAP of the T cell subset]

Samples labeled "ALL" belong to one dataset, and those labeled "II" belong to the other. Thanks!

pati-ni commented 2 weeks ago

Can you provide the steps of the analysis? Also, can you show where the subsetted cells from the second UMAP are located in the first one?

Carrey14 commented 2 days ago

> Can you provide the steps of the analysis? Also, can you show where the subsetted cells from the second UMAP are located in the first one?

I'm sorry for the late reply. Here is the code I used for the analysis. The cluster circled in red in the figure is the T cell cluster I extracted.

```r
# Full dataset: normalize, select variable features, scale, and run PCA
HCC_harmony <- NormalizeData(HCC_all) %>%
  FindVariableFeatures() %>%
  ScaleData() %>%
  RunPCA(npcs = 100, verbose = FALSE)

# Batch correction with Harmony, grouping by sample
system.time({
  HCC_harmony2 <- RunHarmony(HCC_harmony, group.by.vars = "orig.ident")
})
pdf("el.pdf", width = 10, height = 7)
ElbowPlot(HCC_harmony2, ndims = 100)
dev.off()

# Clustering and embeddings on the Harmony reduction
pc.num <- 1:39
HCC_harmony3 <- FindNeighbors(HCC_harmony2, reduction = "harmony", dims = pc.num) %>%
  FindClusters(resolution = 0.4)
HCC_harmony4 <- RunUMAP(HCC_harmony3, reduction = "harmony", dims = pc.num)
HCC_harmony5 <- RunTSNE(HCC_harmony4, reduction = "harmony", dims = pc.num)
```

I have completed the cell annotation and added the groups "ALL" and "II" according to the source dataset.

```r
# Extract the T cells and rebuild a fresh object from the raw counts
T_cell <- subset(HCC_harmony5, ident = "T cells")
sce <- CreateSeuratObject(counts = T_cell@assays$RNA@counts, meta.data = T_cell@meta.data)
names(sce@reductions)
# NULL

T_cell2 <- NormalizeData(sce, normalization.method = "LogNormalize", scale.factor = 1e4)

GetAssay(T_cell2, assay = "RNA")

# Re-run the pipeline from variable features to Harmony on the subset
T_cell2 <- FindVariableFeatures(T_cell2, selection.method = "vst", nfeatures = 2000)
T_cell2 <- ScaleData(T_cell2)
T_cell2 <- RunPCA(object = T_cell2, npcs = 50, verbose = FALSE)
system.time({
  T_cell_harmony <- RunHarmony(T_cell2, group.by.vars = "orig.ident", project.dim = FALSE)
})
dims <- 1:15
T_cell_harmony2 <- FindNeighbors(T_cell_harmony, reduction = "harmony", dims = dims)
T_cell_harmony2 <- FindClusters(T_cell_harmony2, resolution = 0.8)
table(T_cell_harmony2@meta.data$seurat_clusters)

T_cell_harmony3 <- RunUMAP(T_cell_harmony2, dims = dims, reduction = "harmony")
T_cell_harmony3 <- RunTSNE(T_cell_harmony3, dims = dims, reduction = "harmony")
```

[Image: UMAP of all cells with the extracted T cell cluster circled in red]
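One way to put a number on the separation described in this thread is to cross-tabulate cluster assignments against the dataset of origin. The sketch below is only an illustration in base R on a made-up metadata table: the `dataset` column name, the cell values, and the 0.9 cutoff are all assumptions, not taken from the actual object. With the real data, the same table could presumably be built from `T_cell_harmony3@meta.data`, using whichever column holds the "ALL" / "II" labels.

```r
# Toy metadata standing in for a Seurat object's meta.data (all values are made up).
meta <- data.frame(
  seurat_clusters = factor(c(0, 0, 0, 1, 1, 1, 2, 2)),
  dataset = c("ALL", "ALL", "ALL", "II", "II", "II", "ALL", "II")  # hypothetical column name
)

# Count cells per cluster (rows) and dataset (columns).
counts <- table(cluster = meta$seurat_clusters, dataset = meta$dataset)

# Convert counts to row-wise fractions, so each cluster's row sums to 1.
fractions <- prop.table(counts, margin = 1)
print(round(fractions, 2))

# Flag clusters where a single dataset contributes more than 90% of the cells.
# The 0.9 cutoff is an arbitrary illustrative choice.
dominated <- rownames(fractions)[apply(fractions, 1, max) > 0.9]
print(dominated)
```

Clusters that are made up almost entirely of one dataset would be candidates for residual batch effects, although they could also reflect a genuine biological difference between the datasets.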