jokergoo / ComplexHeatmap

Make Complex Heatmaps
https://jokergoo.github.io/ComplexHeatmap-reference/book/

selecting top n from clusters in heatmap - visualize top n #960

Open pdhrati02 opened 2 years ago

pdhrati02 commented 2 years ago

Hi all, I made a heatmap using ComplexHeatmap, clustered by rows and columns. Because there are many rows, the row names are too dense to read, so I would like to show only, say, 10 row names from the top and 10 from the bottom. The clustering should stay exactly as in the original; only those 20 row names should be visualized.

Is it possible to do so?

My code for heatmap is very basic:

p1 <- as.ggplot(Heatmap(heat_matrix,
    column_title = "heatmap",
    column_title_gp = gpar(fontsize = 6),
    column_names_gp = grid::gpar(fontsize = 6),
    column_names_rot = 0,
    column_dend_height = unit(0.3, "cm"),
    show_heatmap_legend = FALSE,
    row_names_gp = grid::gpar(fontsize = 5)))

Any help would be appreciated. Thank you, DP

jokergoo commented 2 years ago

Check this link: https://jokergoo.github.io/ComplexHeatmap-reference/book/heatmap-annotations.html#mark-annotation

pdhrati02 commented 2 years ago

Hi @jokergoo, thank you for pointing this out. However, it would not work well when the rows are clustered. My plot shows associations, and the clustering puts the most and least associated rows at the two ends. I only want to highlight those most and least associated rows. The example in the link uses cluster_rows = FALSE, so its indices no longer match the displayed order once cluster_rows is set to TRUE. Using it would therefore require manual indexing, which is complicated because the rows are so dense.

Can you help me with this? Thank you. Kind regards, DP

jokergoo commented 2 years ago

If you can also give me the data and your code, then I can help you with it.

pdhrati02 commented 2 years ago

Hi @jokergoo, Sure thing. Please find attached a small example dataset. This is a subset of the original one. eg.txt

The aim is to highlight the top 3 and bottom 3 rows after clustering.

comp_heat_spec <- read.table("eg.txt", sep = "\t", row.names = 1, header = TRUE)
comp_mat_spec <- as.matrix(comp_heat_spec)

The code I used is something like this:

ha = rowAnnotation(foo = anno_mark(at = c(1:3, 29:31), labels = list("bact1", "bact2", "bact3", "bact4", "bact5", "bact6")))
Heatmap(comp_mat_spec, name = "mat", cluster_rows = TRUE, right_annotation = ha, row_names_side = "left", row_names_gp = gpar(fontsize = 4))

Please do let me know if you need any more details, I can email you the complete dataset if needed.

Thank you. Kind regards, DP

jokergoo commented 2 years ago

You need to get the index of the top k and bottom k rows after the clustering.

I think you need to

  1. first generate the clustering for rows
  2. get the top k and bottom k from the clustering object
  3. assign the clustering object to cluster_rows and use anno_mark() to mark these 2k rows (top k plus bottom k).

I will give you an example:

m = matrix(rnorm(1000*10), nrow = 1000)
rownames(m) = paste0("r", 1:1000)

dend = as.dendrogram(hclust(dist(m)))

od = order.dendrogram(dend)

selected = od[c(1:10, 991:1000)]

Heatmap(m, cluster_rows = dend, row_dend_reorder = FALSE) +
  rowAnnotation(foo = anno_mark(at = selected, labels = rownames(m)[selected]))

Setting row_dend_reorder = FALSE is not strictly necessary here. I just want to make doubly sure that dend is not reordered.

image
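
The hand-written ranges in the example (1:10 and 991:1000) only fit a matrix with exactly 1000 rows. As a minimal sketch of the same selection step for any number of rows, the indices can be taken with head() and tail() on the dendrogram order. The helper name top_bottom_rows is hypothetical and not part of ComplexHeatmap; the block uses base R only.

```r
# Hypothetical helper (not part of ComplexHeatmap): return the original row
# indices of the first k and last k leaves of a row dendrogram.
top_bottom_rows <- function(dend, k) {
  od <- order.dendrogram(dend)    # original row indices in dendrogram leaf order
  stopifnot(2 * k <= length(od))  # otherwise the two ends would overlap
  c(head(od, k), tail(od, k))
}

set.seed(1)
m <- matrix(rnorm(50 * 10), nrow = 50)  # simulated data, for illustration only
rownames(m) <- paste0("r", 1:50)

dend <- as.dendrogram(hclust(dist(m)))
selected <- top_bottom_rows(dend, k = 3)

# Row names to pass as labels, e.g. anno_mark(at = selected, labels = ...)
rownames(m)[selected]
```

The resulting selected vector can then be passed to anno_mark(at = selected, labels = rownames(m)[selected]) exactly as in the example above.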

pdhrati02 commented 2 years ago

Hi @jokergoo, thank you for your quick response. However, one issue remains. Basically, I want this:

image

And then to just highlight the top and bottom 10.

In your example code, the rows are still not ordered. And when I try it on my data:

dend = as.dendrogram(hclust(dist(comp_mat_spec)))

od = order.dendrogram(dend)

selected = od[c(1:10, 60:70)]

Heatmap(comp_mat_spec, cluster_rows = dend) + rowAnnotation(foo = anno_mark(at = selected, labels = rownames(comp_mat_spec)[selected]))

I get this: image

I am not sure whether I have caused some confusion. My apologies if so.

I hope these example images clear up the confusion. Apologies once again.

jokergoo commented 2 years ago

Are you using Rstudio? Did you see any message from the R terminal? How about directly saving into a pdf file?

pdhrati02 commented 2 years ago

I did get this warning message: It seems you are using RStudio IDE. anno_mark() needs to work with the physical size of the graphics device. It only generates correct plot in the figure panel, while in the zoomed plot (by clicking the icon 'Zoom') or in the exported plot (by clicking the icon 'Export'), the connection to heatmap rows/columns might be wrong. You can directly use e.g. pdf() to save the plot into a file.

Use ht_opt$message = FALSE to turn off this message.

Saving with pdf() also did not work, but that is not my main concern. My issue is that the rows are not ordered, as you can see in the second image. In the first image all blue bands are together and all red bands are together; that does not happen in the second. As a result, the selection also makes very little sense for what I want.

Apologies once again for all the trouble.

jokergoo commented 2 years ago

Then add one more command

dend = as.dendrogram(hclust(dist(comp_mat_spec)))
dend = reorder(dend, wts = rowMeans(comp_mat_spec))
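
Putting the pieces of this thread together, a minimal end-to-end sketch might look like the following. It assumes ComplexHeatmap is installed, uses a simulated matrix as a stand-in for comp_mat_spec (the attached eg.txt is not reproduced here), and writes to a hypothetical file name through pdf(), as the RStudio message recommends. reorder() can only rotate branches of the tree, so rows are sorted by mean only as far as the clustering allows.

```r
library(ComplexHeatmap)

set.seed(123)
# Simulated stand-in for comp_mat_spec (assumption: the real data is not available here)
mat <- matrix(rnorm(70 * 6), nrow = 70)
rownames(mat) <- paste0("bact", 1:70)
colnames(mat) <- paste0("s", 1:6)

# Cluster the rows, then rotate branches so leaves follow the row means
dend <- as.dendrogram(hclust(dist(mat)))
dend <- reorder(dend, wts = rowMeans(mat))

# Original row indices of the first k and last k leaves after reordering
od <- order.dendrogram(dend)
k <- 10
selected <- c(head(od, k), tail(od, k))

# Hide the dense row names and label only the selected rows
ht <- Heatmap(mat, name = "mat", cluster_rows = dend, row_dend_reorder = FALSE,
              show_row_names = FALSE) +
  rowAnnotation(foo = anno_mark(at = selected, labels = rownames(mat)[selected]))

pdf("heatmap_top_bottom.pdf", width = 6, height = 8)  # hypothetical file name
draw(ht)
dev.off()
```

The selection is computed after reorder(), so the marked rows correspond to the two ends of the dendrogram as it is actually drawn.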