rnabioco / clustifyr

Infer cell types in scRNA-seq data using bulk RNA-seq or gene sets
https://rnabioco.github.io/clustifyr/
MIT License

Colour palette has too few colours #383

Closed Dazcam closed 3 years ago

Dazcam commented 3 years ago

I'm having trouble with the plot_dims() function when trying to compare its output with the output of plot_best_call(), as described in the introductory vignette. My code is below. Note that the first line was required to circumvent an additional error, unrelated to the one I'm writing about now.

options(ggrepel.max.overlaps = Inf) # Need this or plot_best_call() throws error

  clustifyr_types <- plot_best_call(
    cor_mat = res_obj,          # matrix of correlation coefficients from clustifyr()
    metadata = seurat.obj@meta.data,   # meta.data table containing UMAP or tSNE data
    do_label = TRUE,        # should the feature label be shown on each cluster?
    do_legend = FALSE,      # should the legend be shown?
    cluster_col = paste0(SAMPLE, "_idents")
  ) +
    ggtitle("clustifyr cell types")

  # Compare clustifyr results with known cell identities
  known_types <- plot_dims(
    data = seurat.obj@meta.data,       # meta.data table containing UMAP or tSNE data
    feature = paste0(SAMPLE, "_idents"), # name of column in meta.data to color clusters by
    do_label = TRUE,        # should the feature label be shown on each cluster?
    do_legend = FALSE       # should the legend be shown?
  ) +
    ggtitle("Known cell types")

The warnings emitted by plot_dims():

Warning messages:
1: In RColorBrewer::brewer.pal(n, pal) :
  n too large, allowed maximum for palette Paired is 12
Returning the palette you asked for with that many colors

2: Removed 617 rows containing missing values (geom_point). 
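
The first warning can be reproduced outside clustifyr. The sketch below only illustrates how RColorBrewer behaves when asked for more colours than a palette defines, plus one common way to stretch a palette by interpolation. It makes no claim about how plot_dims() maps colours internally, and the variable names (n_groups, pal, stretched) are illustrative assumptions.

```r
library(RColorBrewer)

n_groups <- 20  # roughly the number of clusters in this report

# brewer.pal() warns and falls back to the 12 colours that "Paired" defines,
# so fewer colours come back than there are groups
pal <- suppressWarnings(brewer.pal(n_groups, "Paired"))
length(pal)
#> [1] 12

# One common way to get more colours: interpolate between the 12 originals
# (colorRampPalette() is from grDevices, which is attached by default)
stretched <- colorRampPalette(brewer.pal(12, "Paired"))(n_groups)
length(stretched)
#> [1] 20
```

Interpolated colours are less distinct than the originals, so this is a trade-off rather than a fix for the underlying palette limit.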

I'm assuming this is caused by the default colour palette (pretty_palette) having too few colours for my data: I have ~20 clusters in my custom reference and query datasets. I have tried to set the c_cols parameter in plot_dims() so that it uses 50 colours:

 c_cols = c("#9E0142", "#A90D44", "#B41947", "#BF2649", "#CA324C", "#D53E4E", 
             "#DB484C", "#E25249", "#E85B47", "#EE6544", "#F46F44", "#F67C4A", 
             "#F88A50", "#F99756", "#FBA45C", "#FDB163", "#FDBB6C", "#FDC574", 
             "#FDCF7D", "#FDD985", "#FEE28F", "#FEE899", "#FEEFA4", "#FEF5AF", 
             "#FEFBB9", "#FCFDBB", "#F7FBB3", "#F2F9AB", "#EDF7A3", "#E8F59B", 
             "#DEF299", "#D2ED9B", "#C6E89E", "#BAE3A0", "#AEDEA3", "#A1D9A4", 
             "#93D3A4", "#84CEA4", "#76C8A4", "#68C3A4", "#5DB8A8", "#52ACAD", 
             "#48A0B2", "#3D95B7", "#3389BC", "#3A7DB8", "#4371B2", "#4C66AD", 
             "#555AA7", "#5E4FA2")

But that didn't work. I could recode this manually, but the example in the intro vignette is nice/handy and 20 or so clusters doesn't seem like a huge number for these functions to process. Could you suggest a workaround/solution for this?

sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RColorBrewer_1.1-2   rmarkdown_2.7        cowplot_1.1.1        ComplexHeatmap_2.6.2
[5] ggplot2_3.3.3        clustifyr_1.2.0      SeuratObject_4.0.0   Seurat_4.0.0        

loaded via a namespace (and not attached):
  [1] circlize_0.4.12             fastmatch_1.1-0             plyr_1.8.6                 
  [4] igraph_1.2.6                lazyeval_0.2.2              splines_4.0.3              
  [7] entropy_1.2.1               BiocParallel_1.24.1         listenv_0.8.0              
 [10] scattermore_0.7             GenomeInfoDb_1.26.2         digest_0.6.27              
 [13] htmltools_0.5.1.1           magick_2.7.1                rsconnect_0.8.16           
 [16] fansi_0.4.2                 magrittr_2.0.1              tensor_1.5                 
 [19] cluster_2.1.1               ROCR_1.0-11                 globals_0.14.0             
 [22] matrixStats_0.58.0          colorspace_2.0-0            ggrepel_0.9.1              
 [25] xfun_0.22                   dplyr_1.0.5                 crayon_1.4.1               
 [28] RCurl_1.98-1.2              jsonlite_1.7.2              spatstat_1.64-1            
 [31] spatstat.data_2.0-0         survival_3.2-7              zoo_1.8-9                  
 [34] glue_1.4.2                  polyclip_1.10-0             gtable_0.3.0               
 [37] zlibbioc_1.36.0             XVector_0.30.0              leiden_0.3.7               
 [40] GetoptLong_1.0.5            DelayedArray_0.16.2         future.apply_1.7.0         
 [43] shape_1.4.5                 SingleCellExperiment_1.12.0 BiocGenerics_0.36.0        
 [46] abind_1.4-5                 scales_1.1.1                DBI_1.1.1                  
 [49] miniUI_0.1.1.1              Rcpp_1.0.6                  viridisLite_0.3.0          
 [52] xtable_1.8-4                clue_0.3-58                 reticulate_1.18            
 [55] stats4_4.0.3                htmlwidgets_1.5.3           httr_1.4.2                 
 [58] fgsea_1.16.0                ellipsis_0.3.1              ica_1.0-2                  
 [61] pkgconfig_2.0.3             farver_2.1.0                sass_0.3.1                 
 [64] uwot_0.1.10                 deldir_0.2-10               utf8_1.2.1                 
 [67] tidyselect_1.1.0            labeling_0.4.2              rlang_0.4.10               
 [70] reshape2_1.4.4              later_1.1.0.1               munsell_0.5.0              
 [73] tools_4.0.3                 cli_2.3.1                   generics_0.1.0             
 [76] ggridges_0.5.3              evaluate_0.14               stringr_1.4.0              
 [79] fastmap_1.1.0               yaml_2.2.1                  goftest_1.2-2              
 [82] knitr_1.31                  fitdistrplus_1.1-3          purrr_0.3.4                
 [85] RANN_2.6.1                  pbapply_1.4-3               future_1.21.0              
 [88] nlme_3.1-152                mime_0.10                   compiler_4.0.3             
 [91] rstudioapi_0.13             plotly_4.9.3                png_0.1-7                  
 [94] spatstat.utils_2.1-0        tibble_3.1.0                bslib_0.2.4                
 [97] stringi_1.5.3               highr_0.8                   lattice_0.20-41            
[100] Matrix_1.3-2                vctrs_0.3.6                 pillar_1.5.1               
[103] lifecycle_1.0.0             BiocManager_1.30.10         jquerylib_0.1.3            
[106] lmtest_0.9-38               GlobalOptions_0.1.2         RcppAnnoy_0.0.18           
[109] data.table_1.14.0           bitops_1.0-6                irlba_2.3.3                
[112] httpuv_1.5.5                patchwork_1.1.1             GenomicRanges_1.42.0       
[115] R6_2.5.0                    promises_1.2.0.1            KernSmooth_2.23-18         
[118] gridExtra_2.3               IRanges_2.24.1              parallelly_1.24.0          
[121] codetools_0.2-18            MASS_7.3-53.1               assertthat_0.2.1           
[124] SummarizedExperiment_1.20.0 rjson_0.2.20                withr_2.4.1                
[127] sctransform_0.3.2           S4Vectors_0.28.1            GenomeInfoDbData_1.2.4     
[130] mgcv_1.8-34                 parallel_4.0.3              rpart_4.1-15               
[133] tidyr_1.1.3                 MatrixGenerics_1.2.1        Cairo_1.5-12.2             
[136] Rtsne_0.15                  Biobase_2.50.0              shiny_1.6.0                
[139] tinytex_0.29    

kriemo commented 3 years ago

Thanks for bringing this issue to our attention. As an immediate workaround, I believe that using the d_cols parameter rather than the c_cols parameter will work for your use case. See reprex below.

@raysinensis It looks like plot_dims is treating character and factor vectors differently when applying color palettes.

library(clustifyr)
#> Warning: package 'clustifyr' was built under R version 4.0.3
library(scales)
set.seed(42)
random_groups <- sample(as.character(1:3), 
                        size = nrow(pbmc_meta), 
                        replace = TRUE)

pbmc_meta$too_many_groups <- paste0(pbmc_meta$classified, "_", random_groups)
pbmc_meta$too_many_groups_fct <- factor(pbmc_meta$too_many_groups)

# using a character vector works
plot_dims(
  data = pbmc_meta,       # meta.data table containing UMAP or tSNE data
  feature = "too_many_groups", # name of column in meta.data to color clusters by
  do_legend = TRUE      # should the legend be shown?
)


# using a factor throws warning, and doesn't show colors
plot_dims(
  data = pbmc_meta,       # meta.data table containing UMAP or tSNE data
  feature = "too_many_groups_fct", # name of column in meta.data to color clusters by
  do_legend = TRUE      # should the legend be shown?
)
#> Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Paired is 12
#> Returning the palette you asked for with that many colors
#> Warning: Removed 1511 rows containing missing values (geom_point).


# generate discrete palette to match # of groups
discrete_pal <-  hue_pal()(length(levels(pbmc_meta$too_many_groups_fct)))

plot_dims(
  data = pbmc_meta,    
  feature = "too_many_groups_fct", 
  do_legend = TRUE,
  d_cols = discrete_pal
)

Created on 2021-04-14 by the reprex package (v0.3.0)
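
The palette-sizing step in the reprex can be wrapped in a small helper so the number of colours always follows the data. This is a minimal sketch: make_discrete_pal() is a hypothetical name and not part of clustifyr, and the toy group labels are invented for illustration.

```r
library(scales)

# Hypothetical helper (not a clustifyr function): one hue per distinct
# observed group, for either a character vector or a factor.
# Note: for a factor with unused levels, nlevels(x) would give a larger count.
make_discrete_pal <- function(x) {
  n <- length(unique(stats::na.omit(x)))
  hue_pal()(n)
}

# Toy data: 20 group labels, each repeated 5 times
groups_chr <- rep(paste0("cluster_", 1:20), times = 5)
groups_fct <- factor(groups_chr)

pal_chr <- make_discrete_pal(groups_chr)
pal_fct <- make_discrete_pal(groups_fct)

length(pal_fct)
#> [1] 20
identical(pal_chr, pal_fct)
#> [1] TRUE
```

In the original report, the result could presumably be passed as d_cols = make_discrete_pal(seurat.obj@meta.data[[paste0(SAMPLE, "_idents")]]). Since the character column in the reprex plotted without the warning, converting the factor column with as.character() may be another option, though that has not been tested on the reporter's data.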

Dazcam commented 3 years ago

Ah yes. I was hoping it would be something trivial.

Many thanks for the quick response.