jokergoo / ComplexHeatmap

Make Complex Heatmaps
https://jokergoo.github.io/ComplexHeatmap-reference/book/

column_order : number of columns and representation. #527

Closed beetlejuice007 closed 4 years ago

beetlejuice007 commented 4 years ago

Hi Jokergoo

I have a question regarding column_order.

I constructed a heatmap that looks like this. image

I then used column_order(ht) to get the order of the columns in the four groups. I got this:

List of 4
 $ 3: int [1:151] 325 233 1000 443 418 1442 538 1289 1366 200 ...
 $ 4: int [1:433] 458 221 1320 793 190 1387 908 544 326 1239 ...
 $ 1: int [1:127] 1007 912 846 1406 841 781 898 1305 14 1038 ...
 $ 2: int [1:877] 1012 1558 1584 426 444 992 1107 1513 450 864 ...

Observation: According to column_order, group 2 is the largest, containing 877 columns, but in the figure group 1 appears to contain more columns. Also, according to column_order, group 1 contains the fewest columns (127), but in the figure group III appears to contain the fewest.

Question 1: Why does it appear like this?
Question 2: Does the heatmap draw every column with the same width in each group?

Thanks, Hemant

jokergoo commented 4 years ago

I think it was a bug. Can you try the newest version from GitHub?

beetlejuice007 commented 4 years ago

Hi, this didn't solve the problem. There is also something weird going on: every time I re-run column_order(ht), I get a different number of columns in the four groups.

jokergoo commented 4 years ago

OK, then can you attach the data and the code you used?

beetlejuice007 commented 4 years ago

I created a subset of my data and repeated the method I followed.

heatmap: image

Input data: tmp.txt

code used:

> h_tmp <- Heatmap(tmp, column_km = 4, col = colorRamp2(c(0.10, 0.50, 0.90), c("navyblue", "white", "red")),
                   cluster_columns = TRUE, show_column_names = FALSE, column_title_side = "bottom",
                   cluster_rows = FALSE, clustering_distance_rows = "euclidean", clustering_method_rows = "complete",
                   row_dend_side = "left", row_dend_width = unit(10, "mm"), show_row_dend = TRUE,
                   row_dend_reorder = TRUE, row_dend_gp = gpar(), row_title = "Samples", name = "MBeta",
                   show_row_names = FALSE)

I repeated the following commands a few times to check whether the output stays the same:

> h_tmp_col_ord <- column_order(h_tmp)
> str(h_tmp_col_ord)
List of 4
 $ 2: int [1:16] 11 35 37 26 14 12 46 20 5 23 ...
 $ 1: int [1:24] 36 21 15 2 38 32 49 4 8 41 ...
 $ 4: int [1:7] 44 7 17 18 33 25 50
 $ 3: int [1:3] 9 1 43
> h_tmp_col_ord <- column_order(h_tmp)
> str(h_tmp_col_ord)
List of 4
 $ 2: int [1:10] 36 30 21 15 32 49 2 38 4 24
 $ 1: int [1:14] 8 41 29 31 6 27 45 28 39 47 ...
 $ 3: int [1:16] 11 35 37 26 14 12 46 20 5 23 ...
 $ 4: int [1:10] 44 7 17 18 33 25 9 1 43 50
> h_tmp_col_ord <- column_order(h_tmp)
> str(h_tmp_col_ord)
List of 4
 $ 2: int [1:10] 36 30 21 15 32 49 2 38 4 24
 $ 1: int [1:14] 8 41 29 31 6 27 45 28 39 47 ...
 $ 3: int [1:16] 11 35 37 26 14 12 46 20 5 23 ...
 $ 4: int [1:10] 44 7 17 18 33 25 9 1 43 50
> h_tmp_col_ord <- column_order(h_tmp)
> str(h_tmp_col_ord)
List of 4
 $ 2: int [1:10] 36 30 21 15 32 49 2 38 4 24
 $ 1: int [1:14] 8 41 29 31 6 27 45 28 39 47 ...
 $ 3: int [1:16] 11 35 37 26 14 12 46 20 5 23 ...
 $ 4: int [1:10] 44 7 17 18 33 25 9 1 43 50
> h_tmp_col_ord <- column_order(h_tmp)
> str(h_tmp_col_ord)
List of 4
 $ 1: int [1:23] 36 21 15 2 38 49 4 8 41 29 ...
 $ 2: int [1:8] 5 23 19 3 16 42 48 22
 $ 3: int [1:9] 11 35 37 26 14 12 46 20 32
 $ 4: int [1:10] 44 7 17 18 33 25 9 1 43 50

I think this is weird; the output should be the same every time. Can you please look into it? Thanks for the help.

> sessionInfo()
R version 4.0.1 (2020-06-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] circlize_0.4.10      ComplexHeatmap_2.5.3 ggfortify_0.4.10     dplyr_1.0.0          VennDiagram_1.6.20   futile.logger_1.4.3 
[7] ggplot2_3.3.1        readxl_1.3.1        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6         RColorBrewer_1.1-2   cellranger_1.1.0     pillar_1.4.4         compiler_4.0.1       formatR_1.7         
 [7] futile.options_1.0.1 tools_4.0.1          digest_0.6.25        clue_0.3-57          lifecycle_0.2.0      tibble_3.0.1        
[13] gtable_0.3.0         png_0.1-7            pkgconfig_2.0.3      rlang_0.4.6          rstudioapi_0.11      parallel_4.0.1      
[19] gridExtra_2.3        cluster_2.1.0        withr_2.2.0          stringr_1.4.0        S4Vectors_0.26.1     IRanges_2.22.2      
[25] GlobalOptions_0.1.2  generics_0.0.2       vctrs_0.3.1          stats4_4.0.1         tidyselect_1.1.0     glue_1.4.1          
[31] R6_2.4.1             GetoptLong_1.0.0     tidyr_1.1.0          purrr_0.3.4          farver_2.0.3         lambda.r_1.2.4      
[37] magrittr_1.5         BiocGenerics_0.34.0  scales_1.1.1         ellipsis_0.3.1       shape_1.4.4          colorspace_1.4-1    
[43] labeling_0.3         stringi_1.4.6        munsell_0.5.0        rjson_0.2.20         crayon_1.3.4     

jokergoo commented 4 years ago

Aha, I see, you should do as follows:

h_tmp <- draw(h_tmp)  # this is important
h_tmp_col_ord <- column_order(h_tmp)

The reason is that Heatmap() is only a constructor; it does not perform any computation such as clustering. Everything is done only when draw() is executed. So the command h_tmp <- draw(h_tmp) actually performs the clustering (including the k-means clustering) and saves the result in the h_tmp variable. When you later call column_order(), it simply extracts the clustering results that are already saved in h_tmp. If you do not update h_tmp with draw(), h_tmp is still an uninitialized heatmap, so every call to column_order() re-makes the heatmap and re-runs the clustering, and that is why you get different clustering results each time.

See more details in https://jokergoo.github.io/ComplexHeatmap-reference/book/a-list-of-heatmaps.html#get-orders-and-dendrograms-from-a-list-of-heatmaps
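A minimal sketch of the pattern described above, in case it helps to check it on your side. The matrix `mat`, the seed and the heatmap name "demo" are made up for illustration and are not the data from this issue; `pdf(NULL)` is only used so that draw() has a graphics device without opening a window.

```r
library(ComplexHeatmap)

set.seed(123)  # illustrative seed so the random matrix is reproducible
# Hypothetical data: 10 rows, 50 columns of random values
mat <- matrix(rnorm(10 * 50), nrow = 10, ncol = 50)

# Heatmap() only constructs the object; no clustering has happened yet
ht <- Heatmap(mat, column_km = 4, cluster_rows = FALSE,
              show_column_names = FALSE, name = "demo")

pdf(NULL)        # null graphics device, so drawing works in a script
ht <- draw(ht)   # performs the k-means clustering and stores the result
dev.off()

# Both calls read the stored result, so they should agree
ord1 <- column_order(ht)
ord2 <- column_order(ht)
identical(ord1, ord2)

# Number of columns in each k-means group, in the order the groups are drawn
lengths(ord1)
```

With the object returned by draw(), the group sizes reported by lengths() should correspond to the slices in the figure, because both come from the same stored clustering.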

I think I should add some warning messages for when users pass an uninitialized heatmap to column_order() or other related functions.

beetlejuice007 commented 4 years ago

Yes, this solved the problem. I think it does not make sense to be able to run column_order() on an uninitialized heatmap. Maybe you should block that with an error message? Thanks for helping.