jokergoo / ComplexHeatmap

Make Complex Heatmaps
https://jokergoo.github.io/ComplexHeatmap-reference/book/

Heatmap overrides precalculated row dendrogram object #1186

Open eggrandio opened 1 month ago

eggrandio commented 1 month ago

I want to plot the expression of a group of genes (rows) in different conditions (columns), but I want to cluster the genes by sequence similarity, not by their expression values. When I supply the precalculated dendrogram, the heatmap rows are reordered even though I set `row_dend_reorder = FALSE` (in fact, a new row dendrogram seems to be calculated).

Here is the source_data.

*There are two genes without expression values, but removing them makes no difference regarding this issue.

First, I am precalculating a dendrogram based on gene sequence similarity:

library(tidyverse)
library(msa)
library(seqinr)
library(ComplexHeatmap)
library(dendextend)

cbts_prots <- readRDS("cbts_prots.RDS")
gene_dendro <- msaMuscle(cbts_prots, type = "protein") %>% 
  msaConvert(type="seqinr::alignment") %>%
  dist.alignment("identity") %>%
  hclust() %>% 
  as.dendrogram()

gene_dendro %>% rev() %>% plot(horiz = TRUE)

image

Then I want to apply it to the expression values and make a heatmap:

input_matrix <- readRDS("input_matrix.RDS")
Heatmap(
  input_matrix,
  name = "Expression",
  cluster_rows = gene_dendro,
  row_dend_reorder = FALSE,
  cluster_columns = FALSE)

image

The expected output would be something like this, preserving the dendrogram row order (I would like to show the row dendrogram):

Heatmap(
  input_matrix[gene_dendro %>% labels,],
  name = "Expression",
  row_dend_reorder = FALSE,
  cluster_rows = FALSE,
  cluster_columns = FALSE)

image

sessionInfo()
R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22621)

Matrix products: default

(...)

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] circlize_0.4.16                     dendextend_1.17.1                   ComplexHeatmap_2.21.0              
 [4] edgeR_4.3.4                         limma_3.61.1                        ape_5.8                            
 [7] seqinr_4.2-36                       msa_1.37.0                          data.table_1.15.4                  
[10] rentrez_1.2.3                       rBLAST_1.1.1                        R.utils_2.12.3                     
[13] R.oo_1.26.0                         R.methodsS3_1.8.2                   Rsamtools_2.21.0                   
[16] BSgenome.NtabacumSR1.wang2024_1.0.0 BSgenome_1.73.0                     rtracklayer_1.65.0                 
[19] BiocIO_1.15.0                       plyranges_1.25.0                    GenomicRanges_1.57.0               
[22] Biostrings_2.73.0                   GenomeInfoDb_1.41.1                 XVector_0.45.0                     
[25] IRanges_2.39.0                      S4Vectors_0.43.0                    BiocGenerics_0.51.0                
[28] lubridate_1.9.3                     forcats_1.0.0                       stringr_1.5.1                      
[31] dplyr_1.1.4                         purrr_1.0.2                         readr_2.1.5                        
[34] tidyr_1.3.1                         tibble_3.2.1                        ggplot2_3.5.1                      
[37] tidyverse_2.0.0                    

loaded via a namespace (and not attached):
 [1] bitops_1.0-7                gridExtra_2.3               rlang_1.1.3                 magrittr_2.0.3             
 [5] clue_0.3-65                 GetoptLong_1.0.5            ade4_1.7-22                 matrixStats_1.3.0          
 [9] compiler_4.4.0              png_0.1-8                   vctrs_0.6.5                 shape_1.4.6.1              
[13] pkgconfig_2.0.3             crayon_1.5.2                fastmap_1.2.0               utf8_1.2.4                 
[17] rmarkdown_2.27              tzdb_0.4.0                  UCSC.utils_1.1.0            xfun_0.44                  
[21] zlibbioc_1.51.0             cachem_1.1.0                jsonlite_1.8.8              DelayedArray_0.31.1        
[25] BiocParallel_1.39.0         cluster_2.1.6               parallel_4.4.0              R6_2.5.1                   
[29] RColorBrewer_1.1-3          bslib_0.7.0                 stringi_1.8.4               jquerylib_0.1.4            
[33] iterators_1.0.14            Rcpp_1.0.12                 SummarizedExperiment_1.35.0 knitr_1.46                 
[37] Matrix_1.7-0                timechange_0.3.0            tidyselect_1.2.1            viridis_0.6.5              
[41] rstudioapi_0.16.0           abind_1.4-5                 yaml_2.3.8                  doParallel_1.0.17          
[45] codetools_0.2-20            curl_5.2.1                  lattice_0.22-6              Biobase_2.65.0             
[49] withr_3.0.0                 evaluate_0.23               pillar_1.9.0                MatrixGenerics_1.17.0      
[53] foreach_1.5.2               generics_0.1.3              RCurl_1.98-1.14             hms_1.1.3                  
[57] munsell_0.5.1               scales_1.3.0                glue_1.7.0                  tools_4.4.0                
[61] locfit_1.5-9.9              GenomicAlignments_1.41.0    XML_3.99-0.16.1             Cairo_1.6-2                
[65] colorspace_2.1-0            nlme_3.1-164                GenomeInfoDbData_1.2.12     restfulr_0.0.15            
[69] cli_3.6.2                   fansi_1.0.6                 viridisLite_0.4.2           S4Arrays_1.5.1             
[73] gtable_0.3.5                sass_0.4.9                  digest_0.6.35               SparseArray_1.5.7          
[77] rjson_0.2.21                htmltools_0.5.8.1           lifecycle_1.0.4             httr_1.4.7                 
[81] GlobalOptions_0.1.2         statmod_1.5.0               MASS_7.3-60.2              

eggrandio commented 2 weeks ago

@jokergoo I have tried again, this time building a dendrogram from scratch (in case the MSA-derived dendrogram carries some hidden attribute), but I still get the same issue. I believe I have used precalculated dendrograms in the past without this problem.

test_dendro <- data.frame("genes" = labels(gene_dendro),
                          "rand_val" = sample(1:100, length(labels(gene_dendro)), replace = TRUE)) %>% 
  column_to_rownames("genes") %>% 
  dist() %>% 
  hclust() %>% 
  as.dendrogram()

par(mar=c(5, 4, 4, 10) + 0.1)
test_dendro %>% rev() %>% plot(horiz = TRUE)

image

input_matrix <- readRDS("input_matrix.RDS")
Heatmap(
  input_matrix,
  name = "Expression",
  cluster_rows = test_dendro,
  row_dend_reorder = FALSE,
  cluster_columns = FALSE)

image
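
One thing worth checking for both `gene_dendro` and `test_dendro` (an assumption, not something confirmed in this thread): a dendrogram stores its leaves as integer indices into the object it was built from, and `Heatmap()` may be placing rows by those indices rather than by matching leaf labels to `rownames(input_matrix)`. If the rows of `input_matrix` are in a different order than the items that went into `hclust()`, the tree would then be drawn over the wrong rows, which could look like a reordering. The sketch below uses only base R and toy data; `align_rows_to_dend()` is a hypothetical helper, not part of ComplexHeatmap.

```r
# Hypothetical helper: reorder the rows of `mat` so that row i is the item
# that the dendrogram refers to as leaf index i.
align_rows_to_dend <- function(mat, dend) {
  leaf_labels <- labels(dend)            # labels in leaf (plotting) order
  leaf_index  <- order.dendrogram(dend)  # original item indices, same order
  # Invert the permutation to recover labels in the original input order
  original_labels <- leaf_labels[order(leaf_index)]
  stopifnot(nrow(mat) == length(original_labels),
            setequal(original_labels, rownames(mat)))
  mat[original_labels, , drop = FALSE]
}

# Toy data standing in for the sequence-based clustering
set.seed(1)
seq_like <- matrix(rnorm(20), nrow = 5,
                   dimnames = list(paste0("gene", 1:5), NULL))
toy_dend <- as.dendrogram(hclust(dist(seq_like)))

# Toy expression matrix whose rows are in a different order
toy_expr <- matrix(rnorm(15), nrow = 5,
                   dimnames = list(paste0("gene", c(3, 1, 5, 2, 4)), NULL))

aligned <- align_rows_to_dend(toy_expr, toy_dend)

# After alignment, indexing rows by leaf index reproduces the leaf labels
all(rownames(aligned)[order.dendrogram(toy_dend)] == labels(toy_dend))
```

Applied to the code in this issue, that would be `Heatmap(align_rows_to_dend(input_matrix, gene_dendro), cluster_rows = gene_dendro, cluster_columns = FALSE)`, assuming the row names of `input_matrix` are exactly the labels of `gene_dendro`.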