immunogenomics / harmony

Fast, sensitive and accurate integration of single-cell data with Harmony
https://portals.broadinstitute.org/harmony/
Other
526 stars 100 forks source link

Batch effect seen on Heatmap #216

Open Lucas-Maciel opened 1 year ago

Lucas-Maciel commented 1 year ago

Hi,

First of all, thank you for the very nice method. I have used harmony in two different datasets in combination with SCT and, in both datasets, I have seen that when I plot the heatmap, you can distinguish the different samples in each cluster.

I'm using the following code

merged_seu <- merge(seurat_list$samp1,y = c(seurat_list$samp2,seurat_list$samp3))
seu_harmony <- SCTransform(merged_seu, verbose = TRUE, vars.to.regress = c("percent.mt", 'S.Score', 'G2M.Score'))
seu_harmony <- RunPCA(seu_harmony , verbose = TRUE)
seu_harmony <- RunHarmony(seu_harmony, group.by.vars = "orig.ident", assay.use="SCT")

Am I doing something wrong here when combining both methods? Is there anything I can do to have a more homogeneous expression? It's important to say that these samples come from different time points, but it gets confusing if it's biological or technical

Screenshot 2023-10-27 at 11 17 53

Thank you for the attention

pati-ni commented 12 months ago

Hi @Lucas-Maciel,

Can you share some code on how you generate the heatmap?

Also can you share some UMAP which show before and after integration?

That will help us understand better the problem

Lucas-Maciel commented 11 months ago

Hi @pati-ni

Here is a bit more of the code after running harmony

dimensions <- 1:5
seu_harmony  <- FindNeighbors(seu_harmony , dims = dimensions, reduction = "harmony")
seu_harmony  <- RunUMAP(seu_harmony , dims=dimensions, reduction = "harmony")
seu_harmony  <- FindClusters(seu_harmony , resolution = 0.05)
Markers <- FindAllMarkers(seu_harmony,only.pos = T,min.pct = 0.3,logfc.threshold = 0.3) 

Markers %>%
  group_by(cluster) %>%
  top_n(n = 25, wt = avg_log2FC) -> top25

DoHeatmap(seu_harmony,features = top25$gene)+ theme(axis.text.y=element_text(size=6))

Before harmony Screenshot 2023-11-09 at 11 07 11 After harmony Screenshot 2023-11-09 at 11 06 52

Samples represented by the colors blue and green are from the same time point and batch of sequencing.

pati-ni commented 11 months ago

I would make sure this seu_harmony <- FindClusters(seu_harmony , resolution = 0.05) uses indeed the harmony embeddings.

Also, it seems that you are using only 5 dims. Have you experimented with using more PCs in your analysis?

Lucas-Maciel commented 11 months ago

From what I understand FindClusters performs graph-based clustering on the neighbor graph that is constructed with the FindNeighbors function (which I used the harmony reduction). There is no reduction argument available in the FindClusters.

I tried using 10 dimensions and the main clusters mostly remain the same, just change the shape of the UMAP.

pati-ni commented 11 months ago

I see. Thanks for clarifying. Looking back at the thread I think I understand better.

From my understanding, the issue isn't that the UMAPs are batchy, it is just that you observe the batch effects remain in the gene expression space and this is picked up by the FindAllMarkers(). Unfortunately, that's expected because harmony does not touch your gene expression values but only transforms the PCs you provide.

If you want to mitigate this effect you could try using GLMs for gene expression data and include the batch as a covariate. FindAllMarkers is too simplistic for this.