jokergoo / ComplexHeatmap

Make Complex Heatmaps
https://jokergoo.github.io/ComplexHeatmap-reference/book/

split on columns? #30

Closed crazyhottommy closed 3 years ago

crazyhottommy commented 8 years ago

Hi,

I looked at the help page for Heatmap, and it seems to support splitting on rows only, with a gap parameter to go with it. Is it possible to split on columns as well?

A more detailed question is: when I do a supervised-clustering, I want to first split the columns (samples) into say 3 pre-defined subgroups first, and then do clustering within each subgroup for columns and do a k-means for all rows. Is it possible?

For now, I manually arrange the data matrix into three distinct groups and do a k-means on the rows, with cluster_columns = FALSE.

Thanks, Ming

jokergoo commented 8 years ago

ComplexHeatmap only supports splitting a heatmap by rows, because you can simply split the matrix by columns and concatenate the submatrices afterwards.

For your situation, I think you should do k-means clustering on the complete matrix beforehand and assign the result to the split argument later.

mat
mat1 = mat[, 1:3]
mat2 = mat[, 4:6]
mat3 = mat[, 7:9]

km = kmeans(mat, centers = 3)$cluster
row_order = hclust(mat)$order
Heatmap(mat1, row_order = row_order, cluster_rows = FALSE, split = cluster) +
    Heatmap(mat2) +
    Heatmap(mat3)

crazyhottommy commented 8 years ago

Great, thanks for the tip!

Ming

crazyhottommy commented 8 years ago

Hi,

I realize that when multiple heatmaps are combined, the dendrograms of the individual heatmaps are not connected. I would like something like gapmap, in which I can split both columns and rows (by specifying the height at which the tree is cut).

I hope this does not add too much complexity to the code; if it does, there is no need to add these features. I do like ComplexHeatmap the best (for its flexibility and user support) after trying various packages.

Thanks, Ming

jokergoo commented 8 years ago

Thanks for the comment! I have considered that before, but given the current design of the package it is not easy to support this feature. I totally agree it would be very useful, and I will try to implement it once I have enough time.

To show different groups of row clusters and column clusters, I think there is another way: color the branches of the dendrograms with different colors, which ComplexHeatmap supports.
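A minimal sketch of the branch-coloring idea, assuming the dendextend package is installed; `color_branches()` comes from dendextend rather than ComplexHeatmap, and the toy matrix and the choice of 3 groups are made up for illustration.

```r
set.seed(42)
mat = matrix(rnorm(200), nrow = 20)        # toy data, purely illustrative

# cluster the rows in base R and cut the tree into 3 groups
row_hc = hclust(dist(mat))
row_groups = cutree(row_hc, k = 3)

# draw only when both packages are installed
if (requireNamespace("ComplexHeatmap", quietly = TRUE) &&
    requireNamespace("dendextend", quietly = TRUE)) {
    # color the branches of the same 3 groups, then hand the dendrogram to Heatmap()
    row_dend = dendextend::color_branches(as.dendrogram(row_hc), k = 3)
    ht = ComplexHeatmap::Heatmap(mat, name = "mat", cluster_rows = row_dend)
    ComplexHeatmap::draw(ht)
}
```

The rows are not separated by gaps here; the colored branches only mark which rows belong together.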

crazyhottommy commented 8 years ago

Thanks Zuguang! ComplexHeatmap is now my go-to package for drawing heatmaps. I know you are better than me at R, but I want to share my 2 cents on heatmaps :) https://rpubs.com/crazyhottommy/a-tale-of-two-heatmap-functions

Ming

jokergoo commented 8 years ago

Thanks for sharing the post! I didn't know that the base heatmap() scales the matrix only for coloring! I remember when I first learned heatmap(), I was also puzzled by the color of the heatmap.

crazyhottommy commented 8 years ago

It is good that Heatmap does not do any scaling inside :)

crazyhottommy commented 7 years ago

Hope you have some time now to develop the splitting by columns :)

CodeInTheSkies commented 7 years ago

I stumbled upon this post, and I love ComplexHeatmap! I have the same situation as the original poster above, i.e.,

"A more detailed question is: when I do a supervised-clustering, I want to first split the columns (samples) into say 3 pre-defined subgroups first, and then do clustering within each subgroup for columns and do a k-means for all rows. Is it possible?"

I went through the above answers, but I didn't clearly understand how to achieve clustering within the subgroups of columns, while having an overall clustering for the rows. Could anybody please explain?

I don't understand how the first answer fully achieves this. Specifically, the object "km" is computed, but then not used in the code after that?

Appreciate any responses!

Thanks.

jamesdalg commented 6 years ago

@jokergoo I used your method and it works well for small heatmaps, but if I want to have large heatmaps with many splits, it can literally take all day to plot (I'm on an x2680 CPU and I think it may take 24 hours). Is there another way to split by columns, if I might ask? Is there a way to use parallelization on ComplexHeatmap (use multiple cores to plot a single heatmap)?

jokergoo commented 6 years ago

@jamesdalg Since many people are requesting column splitting in heatmaps, I will give it the highest priority.

The slowest part of making a heatmap is the clustering. A heatmap is generally a descriptive visualization whose aim is to reveal patterns in sub-regions of the matrix. When the matrix is huge, say millions of rows or millions of columns, neighbouring rows or columns are merged anyway when it is plotted to a file or on the screen, because of the limited resolution. So what I always do is first randomly sample rows or columns (say ~5000); the pattern in the sampled heatmap is essentially the same as in the complete heatmap.
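A rough sketch of that sampling step, using a simulated matrix as a stand-in for real data; the 5000-row cap follows the suggestion above, and all object names are illustrative.

```r
set.seed(1)
big = matrix(rnorm(50000 * 10), nrow = 50000)   # stand-in for a very large matrix

n_keep = min(5000, nrow(big))                    # keep at most ~5000 rows
idx = sort(sample(nrow(big), n_keep))            # sample without replacement, keep original order
small = big[idx, , drop = FALSE]

# draw only if ComplexHeatmap is installed
if (requireNamespace("ComplexHeatmap", quietly = TRUE)) {
    library(ComplexHeatmap)
    draw(Heatmap(small, name = "sampled"))
}
```

Clustering now runs on 5000 rows instead of 50000, which is where most of the time is saved.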

jamesdalg commented 6 years ago

If there were a way to make the .combine option in foreach accept a "HeatmapList" object, that might help things a lot... A way to convert a list of Heatmap objects to a HeatmapList would help too (for example a public constructor that takes a list object). I think having a split_columns parameter might be the best option, though (not that I can decide).
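One possible way to turn a plain list of Heatmap objects into a HeatmapList is to fold the list with `+`, starting from NULL. The sketch below assumes that adding a Heatmap to NULL yields a HeatmapList; the matrix and the column groups are made up, and the list could equally be the result of a foreach loop.

```r
set.seed(1)
mat = matrix(rnorm(90), nrow = 10)             # toy 10 x 9 matrix
grp = rep(c("g1", "g2", "g3"), each = 3)       # hypothetical column groups
col_idx = split(seq_len(ncol(mat)), grp)       # column indices per group

# draw only if ComplexHeatmap is installed
if (requireNamespace("ComplexHeatmap", quietly = TRUE)) {
    library(ComplexHeatmap)
    # one Heatmap per column group, kept in an ordinary list
    hts = lapply(names(col_idx), function(g) {
        Heatmap(mat[, col_idx[[g]], drop = FALSE], name = g, column_title = g)
    })
    # fold the list into a single HeatmapList, with NULL as the starting value
    ht_list = NULL
    for (ht in hts) ht_list = ht_list + ht
    draw(ht_list)
}
```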

jamesdalg commented 6 years ago

Is there a cosmetic way to just highlight certain blocks within a heatmap? That's basically what I'm trying to do. The gaps are just there to visually set parts of the heatmap apart.

jokergoo commented 6 years ago

Currently you can use the decorate_heatmap_body() function, e.g.:

mat = matrix(rnorm(100), 10)

Heatmap(mat, name = "test")

decorate_heatmap_body("test", {
    grid.rect(0, 0, width = 0.4, height = 1, just = c("left", "bottom"), 
        gp = gpar(lwd = 2, col = "black", fill = "transparent"))
})

http://www.bioconductor.org/packages/devel/bioc/vignettes/ComplexHeatmap/inst/doc/s6.heatmap_decoration.html

zouw2 commented 6 years ago

As an alternative to splitting columns, is there a way to draw vertical reference lines? Thanks, Wei

jokergoo commented 6 years ago

This can be done by decorate_heatmap_body() if you know where to put the vertical line.

mat = matrix(rnorm(100), 10)
Heatmap(mat, name = "foo")
decorate_heatmap_body("foo", {
    # assume columns are split after the 4th column (after reordering)
    grid.lines(c(4/10, 4/10), c(0, 1), gp = gpar(col = "red", lty = 2))
})

zouw2 commented 6 years ago

Instead of an actual reference line, can we achieve the visual effect of split columns by widening the right (or left) border of a column of cells? I feel the reference lines may cover a few cells near the border. I am not sure which is easier to implement.

To me, the reference lines would be fairly close to what I want, but the following code only draws the reference lines in slice 1. I tried to provide slice = 1:6, but got the following error. Thanks!

wei

Error in grid.Call.graphics(L_downvppath, name$path, name$name, strict) : Viewport 'foo_heatmap_body_6' was not found

library(ComplexHeatmap)
library(circlize)

set.seed(123)
mat = cbind(rbind(matrix(rnorm(16, -1), 4), matrix(rnorm(32, 1), 8)),
            rbind(matrix(rnorm(24, 1), 4), matrix(rnorm(48, -1), 8)))

mat = mat[sample(nrow(mat), nrow(mat)), sample(ncol(mat), ncol(mat))]
rownames(mat) = paste0("R", 1:12)
colnames(mat) = paste0("C", 1:10)

Heatmap(mat, name = "foo", split = paste('long name', rep(1:6, each = 2)))

decorate_heatmap_body("foo", {
    grid.lines(c(4/10, 4/10), c(0, 1), gp = gpar(col = "green", lwd = 2))
    grid.lines(c(7/10, 7/10), c(0, 1), gp = gpar(col = "green", lwd = 2))
})
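A possible workaround, sketched under the assumption that `slice` takes one slice index per call (which is what the error above suggests): call decorate_heatmap_body() once per row slice in a loop. The toy matrix is made up and the line positions are the same illustrative 4/10 and 7/10.

```r
set.seed(123)
mat = matrix(rnorm(120), nrow = 12,
             dimnames = list(paste0("R", 1:12), paste0("C", 1:10)))
row_groups = paste("long name", rep(1:6, each = 2))
n_slices = length(unique(row_groups))
x_pos = c(4, 7) / ncol(mat)      # after the 4th and 7th columns, as fractions of the body width

# draw only if ComplexHeatmap is installed
if (requireNamespace("ComplexHeatmap", quietly = TRUE)) {
    library(ComplexHeatmap)
    library(grid)
    draw(Heatmap(mat, name = "foo", split = row_groups))
    # decorate each row slice separately instead of passing slice = 1:6
    for (i in seq_len(n_slices)) {
        decorate_heatmap_body("foo", {
            for (x in x_pos) {
                grid.lines(c(x, x), c(0, 1), gp = gpar(col = "green", lwd = 2))
            }
        }, slice = i)
    }
}
```

The lines are drawn per slice, so they stop at the gaps between row slices rather than running across them.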

gowthamee commented 5 years ago

Hi, a question: can I use column_split without clustering the columns?

I need to split the columns of the heatmap, but I am not clustering my columns; I want to retain the order of the original matrix. I have 8 columns and want to split them into 2 slices, the first containing the first 4 columns and the second containing the last 4. Below is the code:

f_heat <- Heatmap(as.matrix(ZZ0[, c(2:9)]), col = inferno(100), border = TRUE,
              rect_gp = gpar(col = "black", lty = 1, lwd = 0.01), name = "log2FC",
              heatmap_height = unit(0.05, "cm") * nrow(ZZ0),
              cluster_columns = FALSE,
              cluster_rows = hclust(dist(ZZ0[, 2:9], method = "euclidean"), method = "ward.D"),
              show_row_dend = FALSE,
              show_row_names = FALSE,
              show_column_names = FALSE,
              #row_names_gp = gpar(fontsize = 4.5, fontface = "bold"),
              #column_names_side = NULL,
              #column_names_gp = gpar(fontsize= 10, fontface = "bold", 
                                    # col = c(rep("#440154FF", 4), rep("#440154FF", 4))),
              #column_names_rot = 360,
              show_heatmap_legend = TRUE,
              row_split = 18,
              row_title_gp = gpar(fontsize = 9, fontface = "bold"),
              row_title_rot = 0,
              row_gap = unit(0.5, "mm"),
              cluster_row_slices = FALSE,
              top_annotation = colu_anno,
              column_order = c(1,2,3,4,5,6,7,8),
              column_split = factor(rep(c("G", "F"), 4), levels = c("G", "F")),
              cluster_column_slices = FALSE,
              column_title_gp = gpar(fontsize = 9, fontface = "bold"), 
              column_gap = unit(0.5, "mm"))

However, some column clustering still seems to happen, and I cannot retain the original order of the columns. The examples in section 2.7 at the link below all use clustering on columns. https://jokergoo.github.io/ComplexHeatmap-reference/book/a-single-heatmap.html#heatmap-split Any help is appreciated. Thanks!

Gowthamee
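A likely cause of the reordering in the call above is the split vector itself rather than clustering: `rep(c("G", "F"), 4)` alternates G, F, G, F, ..., so columns 1, 3, 5, 7 are collected into one slice and 2, 4, 6, 8 into the other. A sketch of the difference, using a made-up 8-column matrix:

```r
alternating = rep(c("G", "F"), 4)           # G F G F G F G F
blocked     = rep(c("G", "F"), each = 4)    # G G G G F F F F

column_split = factor(blocked, levels = c("G", "F"))

# which original columns end up in each slice
slices_alternating = split(seq_along(alternating), factor(alternating, levels = c("G", "F")))
slices_blocked     = split(seq_along(blocked), column_split)

# draw only if ComplexHeatmap is installed
if (requireNamespace("ComplexHeatmap", quietly = TRUE)) {
    library(ComplexHeatmap)
    set.seed(1)
    m = matrix(rnorm(80), ncol = 8, dimnames = list(NULL, paste0("C", 1:8)))
    # no column clustering; the first slice holds C1-C4, the second C5-C8
    draw(Heatmap(m, name = "m", cluster_columns = FALSE, column_split = column_split))
}
```

With `each = 4`, the slices follow the original column order, so `column_order` is no longer needed to hold the columns in place.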

crazyhottommy commented 4 years ago

Hi Zuguang, I know column_split is now implemented: https://jokergoo.github.io/ComplexHeatmap-reference/book/a-single-heatmap.html#heatmap-split

Just wondering if there is an easier way to do it now

A more detailed question is: when I do a supervised-clustering, I want to first split the columns (samples) into say 3 pre-defined subgroups first, and then do clustering within each subgroup for columns and do a k-means for all rows.

I know I can make 3 sub-matrices (mat1, mat2, mat3), cluster each sub-matrix on columns, and concatenate the 3 matrices after the column clustering, then split by columns using the category variable and split by rows using k-means.

Is there an easier way to do it in ComplexHeatmap?

Thanks a lot for this amazing package!

jokergoo commented 4 years ago

  1. Row splitting and column splitting are independent.
  2. If you want to do two-level column splitting, just assign column_split a two-column data frame. Hierarchical clustering is automatically applied in each column slice.

See the following examples:

m = matrix(rnorm(10*50), ncol = 50)

fa = sample(letters[1:4], 50, replace = TRUE)

# ha just found, column_km can be used together with column_split
Heatmap(m, column_km = 3, column_split = fa, row_split = 2)

To precisely control the order of column slices:

df = data.frame(
    km = kmeans(t(m), centers = 3)$cluster,
    fa = fa
)

df$km = factor(df$km, levels = c(1, 2, 3))
df$fa = factor(df$fa, levels = letters[1:4])

Heatmap(m, column_split = df, row_split = 2, cluster_column_slices = FALSE)

And maybe you can check this post to find out how to add nice annotations for the different split variables.

https://jokergoo.github.io/2020/07/06/block-annotation-over-several-slices/

crazyhottommy commented 4 years ago

Thanks so much Zuguang!!

saisaitian commented 3 years ago

Here is my code:

ht1 <- Heatmap(
    plotdata,
    name = "expression",
    col = col_runif,
    column_split = le,
    border = TRUE,
    cluster_columns = FALSE,
    show_column_names = FALSE,
    show_row_names = FALSE,
    cluster_column_slices = FALSE,
    column_title_gp = gpar(
        fill = c(HRisk = 'red', LRisk = 'blue'),
        alpha = 0.7,
        fontsize = 18
    )
)

How can I set the gap between the column titles and the heatmap body?

jokergoo commented 3 years ago

@saisaitian I think this is something I will improve. In the current design the column titles are not vertically centered, so that they align with the title of a ggplot plot when the two are put together.

For now, you can do:

ht_opt$TITLE_PADDING = unit(c(8.5, 8.5), "points")
Heatmap(...)

Kiliankleemann commented 3 years ago

ComplexHeatmap only supports splitting a heatmap by rows, because you can simply split the matrix by columns and concatenate the submatrices afterwards.

For your situation, I think you should do k-means clustering on the complete matrix beforehand and assign the result to the split argument later.

mat
mat1 = mat[, 1:3]
mat2 = mat[, 4:6]
mat3 = mat[, 7:9]

km = kmeans(mat, centers = 3)$cluster
row_order = hclust(mat)$order
Heatmap(mat1, row_order = row_order, cluster_rows = FALSE, split = cluster) +
    Heatmap(mat2) + 
    Heatmap(mat3)

I get an error message saying object 'cluster' not found. Also, row_order = hclust(mat)$order gave me this error: Error in if (is.na(n) || n > 65536L) stop("size cannot be NA nor exceed 65536") : missing value where TRUE/FALSE needed
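Both errors can be traced to the quoted snippet itself: the k-means result is stored in `km`, so `cluster` is never defined, and hclust() expects a distance object such as `dist(mat)` rather than the raw matrix. A corrected sketch, using a made-up 10 x 9 matrix in place of the real data:

```r
set.seed(1)
mat = matrix(rnorm(90), nrow = 10)      # toy matrix standing in for the real data
mat1 = mat[, 1:3]
mat2 = mat[, 4:6]
mat3 = mat[, 7:9]

km = kmeans(mat, centers = 3)$cluster   # row clusters computed on the complete matrix
row_order = hclust(dist(mat))$order     # hclust() needs a dist object, not a matrix

# draw only if ComplexHeatmap is installed
if (requireNamespace("ComplexHeatmap", quietly = TRUE)) {
    library(ComplexHeatmap)
    # the first heatmap carries the row split and order; columns are clustered within each heatmap
    ht_list = Heatmap(mat1, name = "mat1", row_order = row_order,
                      cluster_rows = FALSE, split = km) +
        Heatmap(mat2, name = "mat2") +
        Heatmap(mat3, name = "mat3")
    draw(ht_list)
}
```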

Kiliankleemann commented 3 years ago

I would like to split my heatmap columns in a custom way, not based on k-means. How can I use column_split to split my heatmap into column groups, e.g. 1-4, 5-6, 7-12?

jokergoo commented 3 years ago

Then you can do it in two steps:

  1. apply k-means to get 12 groups,
  2. create a new categorical variable which corresponds to your new grouping, e.g.:
fa[km %in% 1:4] = "group A"
fa[km %in% 5:6] = "group B"
fa[km %in% 7:12] = "group C"

Then assign fa to column_split.
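If the groups 1-4, 5-6 and 7-12 are meant as column positions rather than k-means clusters, the grouping vector can also be built directly, without a k-means step. A sketch assuming a 12-column matrix; the matrix and the group labels are illustrative.

```r
set.seed(1)
m = matrix(rnorm(10 * 12), ncol = 12)    # toy matrix with 12 columns

# columns 1-4, 5-6 and 7-12, by position
fa = rep(c("group A", "group B", "group C"), times = c(4, 2, 6))
fa = factor(fa, levels = c("group A", "group B", "group C"))   # fixes the slice order

# draw only if ComplexHeatmap is installed
if (requireNamespace("ComplexHeatmap", quietly = TRUE)) {
    library(ComplexHeatmap)
    # columns are still clustered within each slice; add cluster_columns = FALSE to keep them as-is
    draw(Heatmap(m, name = "m", column_split = fa, cluster_column_slices = FALSE))
}
```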