Open Lucas-Maciel opened 1 year ago
Hi @Lucas-Maciel,
Can you share some code on how you generate the heatmap?
Also, can you share some UMAP plots that show the data before and after integration?
That will help us better understand the problem.
Hi @pati-ni
Here is a bit more of the code after running harmony
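library(Seurat)
library(dplyr)
library(ggplot2)  # for theme() and element_text()
dimensions <- 1:5
# Build the neighbor graph and the UMAP from the Harmony embeddings
seu_harmony <- FindNeighbors(seu_harmony, dims = dimensions, reduction = "harmony")
seu_harmony <- RunUMAP(seu_harmony, dims = dimensions, reduction = "harmony")
# Cluster on the graph built by FindNeighbors
seu_harmony <- FindClusters(seu_harmony, resolution = 0.05)
Markers <- FindAllMarkers(seu_harmony, only.pos = TRUE, min.pct = 0.3, logfc.threshold = 0.3)
# Keep the top 25 markers per cluster, ranked by average log2 fold change
Markers %>%
  group_by(cluster) %>%
  top_n(n = 25, wt = avg_log2FC) -> top25
DoHeatmap(seu_harmony, features = top25$gene) + theme(axis.text.y = element_text(size = 6))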
[Images: before harmony | after harmony]
Samples represented by the colors blue and green are from the same time point and batch of sequencing.
I would make sure that this call, seu_harmony <- FindClusters(seu_harmony , resolution = 0.05), does indeed use the harmony embeddings.
Also, it seems that you are using only 5 dims. Have you experimented with using more PCs in your analysis?
From what I understand, FindClusters performs graph-based clustering on the neighbor graph constructed by the FindNeighbors function (for which I used the harmony reduction). FindClusters has no reduction argument.
I tried using 10 dimensions and the main clusters mostly remain the same; only the shape of the UMAP changes.
I see. Thanks for clarifying. Looking back at the thread I think I understand better.
From my understanding, the issue isn't that the UMAPs are batchy; rather, you observe that batch effects remain in the gene expression space, and FindAllMarkers() picks them up. Unfortunately, that's expected, because harmony does not touch your gene expression values; it only transforms the PCs you provide.
If you want to mitigate this effect, you could try fitting GLMs to the gene expression data with the batch included as a covariate. FindAllMarkers is too simplistic for this.
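To make the GLM suggestion concrete, here is a minimal sketch in base R on simulated counts. Everything in it is an assumption for illustration: the variable names, the effect sizes, the Poisson family, and the way cluster membership is made to correlate with batch. It is not the thread author's data or a recommended pipeline. It only shows how a likelihood-ratio test for a cluster effect changes once batch enters the model as a covariate.

```r
# Simulated example; all names and numbers are hypothetical.
set.seed(1)
n <- 400
batch <- factor(rep(c("b1", "b2"), each = n / 2))
# Cluster membership is correlated with batch (80% vs 20% of cells in c2).
cluster <- factor(ifelse(runif(n) < ifelse(batch == "b2", 0.8, 0.2), "c2", "c1"))

# gene_a differs by batch only; gene_b differs by cluster only.
gene_a <- rpois(n, lambda = exp(1 + 1.0 * (batch == "b2")))
gene_b <- rpois(n, lambda = exp(1 + 1.0 * (cluster == "c2")))

# Naive test: cluster effect with no batch term in the model.
naive_pvalue <- function(counts) {
  null_fit <- glm(counts ~ 1, family = poisson)
  full_fit <- glm(counts ~ cluster, family = poisson)
  anova(null_fit, full_fit, test = "Chisq")[2, "Pr(>Chi)"]
}

# Adjusted test: cluster effect after accounting for batch.
adjusted_pvalue <- function(counts) {
  reduced_fit <- glm(counts ~ batch, family = poisson)
  full_fit <- glm(counts ~ batch + cluster, family = poisson)
  anova(reduced_fit, full_fit, test = "Chisq")[2, "Pr(>Chi)"]
}

p_naive_a <- naive_pvalue(gene_a)     # batch-only gene, no batch term
p_adj_a <- adjusted_pvalue(gene_a)    # batch-only gene, batch as covariate
p_adj_b <- adjusted_pvalue(gene_b)    # true cluster gene, batch as covariate
print(c(naive_a = p_naive_a, adjusted_a = p_adj_a, adjusted_b = p_adj_b))
```

In this toy setup the batch-only gene looks like a cluster marker when batch is left out of the model, while the true cluster gene remains significant after adjustment. Within Seurat, FindMarkers and FindAllMarkers also accept test.use = "LR" together with latent.vars, which may be a lighter-weight way to try the same idea; which metadata column holds the batch label is not shown in this thread, so that would need to be filled in.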
Hi,
First of all, thank you for the very nice method. I have used harmony in two different datasets in combination with SCT and, in both datasets, I have seen that when I plot the heatmap, you can distinguish the different samples in each cluster.
I'm using the following code
Am I doing something wrong here when combining both methods? Is there anything I can do to get more homogeneous expression? It's important to note that these samples come from different time points, so it is hard to tell whether the differences are biological or technical.
Thank you for your attention.