samuel-marsh / scCustomize

R package with collection of functions created and/or curated to aid in the visualization and analysis of single-cell data using R.
https://samuel-marsh.github.io/scCustomize/
GNU General Public License v3.0
223 stars 23 forks source link

Clustered_DotPlot cannot add another layer to change x-axis ticks #27

Closed Ssl2Translocase closed 2 years ago

Ssl2Translocase commented 2 years ago

Hi,

I'm using Clustered_DotPlot to make clustered dotplot. My plot has gene_ID on x-axis, which I want to change to gene_symbol. I've tried this by using DotPlot_scCustom and it worked as follows:

topN  <-  c("ENSRNOG00000016928", "ENSRNOG00000069361", "ENSRNOG00000017589", "ENSRNOG00000003633", "ENSRNOG00000047225")

> head(geneTable)
              geneID         geneSymbol
1 ENSRNOG00000066169 ENSRNOG00000066169
2 ENSRNOG00000070168              Olr56
3 ENSRNOG00000070901               Irgq
4 ENSRNOG00000018029              Doc2g
5 ENSRNOG00000031391           Ceacam16
6 ENSRNOG00000050129     AABR07000137.1
pdf(paste0(outdir, "test1.pdf"), width = 20, height = 10)
p1 <- DotPlot_scCustom(scrna.combined, features = topN, x_lab_rotate = TRUE, colors_use = "blue") + scale_x_discrete(breaks=topN, labels=geneTable$geneSymbol[match(topN, geneTable$geneID)])
print(p1)
dev.off()

However, it doesn't work for Clustered_DotPlot (the label is not changed):

pdf(paste0(outdir, "test2.pdf"), width = 10, height = 15)
p1 <- Clustered_DotPlot(scrna.combined, features = topN, x_lab_rotate = F) + scale_y_discrete(breaks=topN, labels=geneTable$geneSymbol[match(topN, geneTable$geneID)])
print(p1)
dev.off()

Thank you for helping.

sessionInfo() R version 4.1.2 (2021-11-01) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Catalina 10.15.7

Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] plyr_1.8.6 gridExtra_2.3 qs_0.25.3 scCustomize_0.7.0
[5] viridis_0.6.2 viridisLite_0.4.0 forcats_0.5.1 purrr_0.3.4
[9] readr_2.1.2 tidyr_1.2.0 tibble_3.1.6 tidyverse_1.3.1
[13] cowplot_1.1.1 FlexDotPlot_0.2.1 monocle3_1.0.0 SingleCellExperiment_1.16.0 [17] SummarizedExperiment_1.24.0 GenomicRanges_1.46.1 GenomeInfoDb_1.30.1 IRanges_2.28.0
[21] S4Vectors_0.32.3 MatrixGenerics_1.6.0 matrixStats_0.61.0 Biobase_2.54.0
[25] BiocGenerics_0.40.0 scry_1.6.0 SeuratWrappers_0.3.0 glmpca_0.2.0
[29] reticulate_1.24 SoupX_1.5.2 stringr_1.4.0 ggplot2_3.3.5
[33] patchwork_1.1.1 SeuratObject_4.0.4 Seurat_4.1.0 dplyr_1.0.8

loaded via a namespace (and not attached): [1] ggprism_1.0.3.9000 scattermore_0.8 R.methodsS3_1.8.1 ragg_1.2.2 bit64_4.0.5
[6] knitr_1.37 multcomp_1.4-18 irlba_2.3.5 sisal_0.48 DelayedArray_0.20.0
[11] R.utils_2.11.0 data.table_1.14.2 rpart_4.1.16 doParallel_1.0.17 RCurl_1.98-1.6
[16] generics_0.1.2 metap_1.8 ScaledMatrix_1.2.0 TH.data_1.1-0 RApiSerialize_0.1.0
[21] RANN_2.6.1 future_1.24.0 bit_4.0.4 tzdb_0.2.0 mutoss_0.1-12
[26] spatstat.data_2.1-2 xml2_1.3.3 lubridate_1.8.0 httpuv_1.6.5 assertthat_0.2.1
[31] xfun_0.30 hms_1.1.1 jquerylib_0.1.4 promises_1.2.0.1 fansi_1.0.2
[36] dendextend_1.15.2 dbplyr_2.1.1 readxl_1.3.1 tmvnsim_1.0-2 igraph_1.2.11
[41] DBI_1.1.2 htmlwidgets_1.5.4 spatstat.geom_2.3-2 paletteer_1.4.0 ellipsis_0.3.2
[46] RSpectra_0.16-0 ggpubr_0.4.0 backports_1.4.1 FactoMineR_2.4 prismatic_1.1.0
[51] grImport2_0.2-0 RcppParallel_5.1.5 deldir_1.0-6 vctrs_0.3.8 remotes_2.4.2
[56] ROCR_1.0-11 abind_1.4-5 withr_2.5.0 ggforce_0.3.3 sctransform_0.3.3
[61] dittoSeq_1.7.0 mnormt_2.0.2 goftest_1.2-3 cluster_2.1.2 lazyeval_0.2.2
[66] crayon_1.5.0 hdf5r_1.3.5 labeling_0.4.2 pkgconfig_2.0.3 qqconf_1.2.1
[71] tweenr_1.0.2 nlme_3.1-155 vipor_0.4.5 rlang_1.0.2 globals_0.14.0
[76] lifecycle_1.0.1 miniUI_0.1.1.1 sandwich_3.0-1 colourpicker_1.1.1 mathjaxr_1.6-0
[81] modelr_0.1.8 rsvd_1.0.5 cellranger_1.1.0 polyclip_1.10-0 lmtest_0.9-39
[86] Matrix_1.4-0 carData_3.0-5 boot_1.3-28 zoo_1.8-9 reprex_2.0.1
[91] base64enc_0.1-3 beeswarm_0.4.0 ggridges_0.5.3 GlobalOptions_0.1.2 pheatmap_1.0.12
[96] rjson_0.2.21 stringfish_0.15.5 png_0.1-7 bitops_1.0-7 shinydashboard_0.7.2
[101] R.oo_1.24.0 KernSmooth_2.23-20 shape_1.4.6 parallelly_1.30.0 spatstat.random_2.1-0 [106] jpeg_0.1-9 rstatix_0.7.0 ggsignif_0.6.3 beachmat_2.10.0 scales_1.1.1
[111] leaps_3.1 magrittr_2.0.2 ica_1.0-2 zlibbioc_1.40.0 compiler_4.1.2
[116] RColorBrewer_1.1-2 clue_0.3-60 plotrix_3.8-2 fitdistrplus_1.1-6 snakecase_0.11.0
[121] cli_3.2.0 XVector_0.34.0 listenv_0.8.0 pbapply_1.5-0 MASS_7.3-55
[126] mgcv_1.8-39 tidyselect_1.1.2 stringi_1.7.6 textshaping_0.3.6 BiocSingular_1.10.0
[131] ggrepel_0.9.1 grid_4.1.2 sass_0.4.0 tools_4.1.2 future.apply_1.8.1
[136] parallel_4.1.2 circlize_0.4.14 rstudioapi_0.13 foreach_1.5.2 janitor_2.1.0
[141] scatterplot3d_0.3-41 farver_2.1.0 Rtsne_0.15 digest_0.6.29 BiocManager_1.30.16
[146] shiny_1.7.1 Rcpp_1.0.8 car_3.0-12 broom_0.7.12 later_1.3.0
[151] RcppAnnoy_0.0.19 shinyWidgets_0.6.4 httr_1.4.2 ggdendro_0.1.23 ComplexHeatmap_2.10.0 [156] Rdpack_2.1.4 colorspace_2.0-3 rvest_1.0.2 XML_3.99-0.9 fs_1.5.2
[161] tensor_1.5 splines_4.1.2 uwot_0.1.11 sn_2.0.1 rematch2_2.1.2
[166] spatstat.utils_2.3-0 multtest_2.50.0 systemfonts_1.0.4 plotly_4.10.0 xtable_1.8-4
[171] jsonlite_1.8.0 flashClust_1.01-2 R6_2.5.1 TFisher_0.2.0 pillar_1.7.0
[176] htmltools_0.5.2 mime_0.12 glue_1.6.2 fastmap_1.1.0 DT_0.21
[181] BiocParallel_1.28.3 codetools_0.2-18 mvtnorm_1.1-3 utf8_1.2.2 lattice_0.20-45
[186] bslib_0.3.1 spatstat.sparse_2.1-0 numDeriv_2016.8-1.1 bsplus_0.1.3 ggbeeswarm_0.6.0
[191] leiden_0.3.9 colorway_0.2.0 limma_3.50.1 survival_3.3-1 munsell_0.5.0
[196] GetoptLong_1.0.5 R.matlab_3.6.2 GenomeInfoDbData_1.2.7 iterators_1.0.14 haven_2.4.3
[201] reshape2_1.4.4 gtable_0.3.0 rbibutils_2.2.7 spatstat.core_2.4-0

samuel-marsh commented 2 years ago

Hi,

Ok so the issues are two fold. First, is that Clustered_DotPlot returns 2 plots in list unless you tell it not to by setting plot_km_elbow = FALSE. However, the second error comes in type of plot. You'll note that even if you specify p1[[2]] + scale_y_discrete(breaks=topN, labels=geneTable$geneSymbol[match(topN, geneTable$geneID)]) it still fails. The reason is that DotPlot_scCustom results in ggplot2 based object (modified from DotPlot) whereas Clustered_DotPlot is ComplexHeatmap based.

There is a way to post-plot add new annotation to ComplexHeatmap but it's not compatible with how I return the plot in the function. Let me play around and see if I can modify things to allow you to change labels post plot natively vs. having to do in post-processing (Illustrator, PPT, etc).

Best, Sam

TingtingSsl2 commented 2 years ago

Thank you, Sam. if plot_km_elbow = T, it outputs 3 plots, plot 1 is dotplot, plot2 is elbow plot and plot 3 is supposed to be the clustered plot. Neither p1[[2]] + scale_y_discrete(breaks=topN, labels=geneTable$geneSymbol[match(topN, geneTable$geneID)]) or p1[[3]] + scale_y_discrete(breaks=topN, labels=geneTable$geneSymbol[match(topN, geneTable$geneID)]) worked.

samuel-marsh commented 2 years ago

Hi,

So it doesn't appear to be returning 3 plots for me:

Clustered_DotPlot_relabel(seurat_object = pbmc, features = c("IL1B", "CD14", "FCGR3A", "CD4", "CD8A", "CD3D"), plot_km_elbow = F, new_row_labels = c("IL1BA", "CD14A", "FCGR3AA", "CD4A", "CD8AA", "CD3DA"))

p <- Clustered_DotPlot(seurat_object = pbmc, features = c("IL1B", "CD14", "FCGR3A", "CD4", "CD8A", "CD3D")) 

p1 <- p[[1]]
p2 <- p[[2]]
p3 <- p[[3]]
Error in p[[3]] : subscript out of bounds

Regardless as I mentioned since it's not a ggplot object that returns we can't use the normal ggplot post themeing. I was working around for solution that I could include in the package but not sure I can but I do have solution you can implement with slightly modified function.

So basically with how I return the ComplexHeatmap it's not compatible to be themed afterward. I can modify function (will post below) so that it can be modified before returning the plot. The reason I don't want to implement this in the package directly though is that I'm worried about potential for incorrectly labeling the rows because there is no way to check that the new labels are in correct place and worry about the issue of rows being mislabeled as the wrong gene.

But in order to make your situation work you can use this modified function. Just make sure that all of your labels are in the correct order and there are the right number of new labels.

Clustered_DotPlot_relabel <- function(
  seurat_object,
  features,
  new_row_labels,
  colors_use_exp = viridis_plasma_dark_high,
  exp_color_min = -2,
  exp_color_middle = NULL,
  exp_color_max = 2,
  print_exp_quantiles = FALSE,
  colors_use_idents = NULL,
  x_lab_rotate = TRUE,
  k = 1,
  row_km_repeats = 1000,
  column_km_repeats = 1000,
  row_label_size = 8,
  raster = FALSE,
  plot_km_elbow = TRUE,
  elbow_kmax = NULL,
  assay = NULL,
  group.by = NULL,
  idents = NULL,
  show_parent_dend_line = TRUE,
  ggplot_default_colors = FALSE,
  seed = 123
) {
  # Check for packages
  ComplexHeatmap_check <- PackageCheck("ComplexHeatmap", error = FALSE)
  if (!ComplexHeatmap_check[1]) {
    stop(
      "Please install the ComplexHeatmap package to use Clustered_DotPlot",
      "\nThis can be accomplished with the following commands: ",
      "\n----------------------------------------",
      "\ninstall.packages('BiocManager')",
      "\nBiocManager::install('ComplexHeatmap')",
      "\n----------------------------------------",
      call. = FALSE
    )
  }

  # Check Seurat
  scCustomize:::Is_Seurat(seurat_object = seurat_object)

  # Check unique features
  features_unique <- unique(x = features)

  if (length(x = features_unique) != length(x = features)) {
    warning("Feature list contains duplicates, making unique.")
  }

  # Check exp min/max set correctly
  if (!exp_color_min < exp_color_max) {
    stop("The value for 'exp_color_min': ", exp_color_min, ", must be less than the value for 'exp_color_max': ", exp_color_max, ".")
  }

  # Get DotPlot data
  seurat_plot <- DotPlot(object = seurat_object, features = features_unique, assay = assay, group.by = group.by, scale = TRUE, idents = idents, col.min = NULL, col.max = NULL)

  data <- seurat_plot$data

  # Get expression data
  exp_mat <- data %>%
    select(-pct.exp, -avg.exp) %>%
    pivot_wider(names_from = id, values_from = avg.exp.scaled) %>%
    as.data.frame()

  row.names(x = exp_mat) <- exp_mat$features.plot

  # Check NAs if idents
  if (!is.null(x = idents)) {
    # Find NA features and print warning
    excluded_features <- exp_mat[rowSums(is.na(x = exp_mat)) > 0,] %>%
      rownames()
    warning("The following features were removed as there is no scaled expression present in subset (`idents`) of object provided: ", glue_collapse_scCustom(input_string = excluded_features, and = TRUE), ".")

    # Extract good features
    good_features <- rownames(exp_mat)

    # Remove rows with NAs
    exp_mat <- exp_mat %>%
      filter(features.plot %in% good_features)
  }

  exp_mat <- exp_mat[,-1] %>%
    as.matrix()

  # Get percent expressed data
  percent_mat <- data %>%
    select(-avg.exp, -avg.exp.scaled) %>%
    pivot_wider(names_from = id, values_from = pct.exp) %>%
    as.data.frame()

  row.names(x = percent_mat) <- percent_mat$features.plot

  # Subset dataframe for NAs if idents so that exp_mat and percent_mat match
  if (!is.null(x = idents)) {
    percent_mat <- percent_mat %>%
      filter(features.plot %in% good_features)
  }

  percent_mat <- percent_mat[,-1] %>%
    as.matrix()

  # print quantiles
  if (print_exp_quantiles) {
    message("Quantiles of gene expression data are:")
    print(quantile(exp_mat, c(0.1, 0.5, 0.9, 0.99)))
  }

  # set assay (if null set to active assay)
  assay <- assay %||% DefaultAssay(object = seurat_object)

  # Set default color palette based on number of levels being plotted
  if (is.null(x = group.by)) {
    group_by_length <- length(x = unique(x = seurat_object@active.ident))
  } else {
    group_by_length <- length(x = unique(x = seurat_object@meta.data[[group.by]]))
  }

  # Check colors use vs. ggplot2 color scale
  if (!is.null(x = colors_use_idents) && ggplot_default_colors) {
    stop("Cannot provide both custom palette to `colors_use` and specify `ggplot_default_colors = TRUE`.")
  }
  if (is.null(x = colors_use_idents)) {
    # set default plot colors
    if (is.null(x = colors_use_idents)) {
      colors_use_idents <- scCustomize_Palette(num_groups = group_by_length, ggplot_default_colors = ggplot_default_colors, color_seed = color_seed)
    }
  }

  # Pull Annotation and change colors to ComplexHeatmap compatible format
  Identity <- colnames(exp_mat)

  identity_colors <- DiscretePalette_scCustomize(num_colors = length(Identity), palette = "polychrome", shuffle_pal = F)
  names(identity_colors) <- Identity
  identity_colors_list <- list(Identity = identity_colors)

  # Create identity annotation
  column_ha <- ComplexHeatmap::HeatmapAnnotation(Identity = Identity,
                                                 col =  identity_colors_list,
                                                 na_col = "grey",
                                                 name = "Identity"
  )

  # Set middle of color scale if not specified
  if (is.null(x = exp_color_middle)) {
    exp_color_middle <- scCustomize:::Middle_Number(min = exp_color_min, max = exp_color_max)
  }

  palette_length <- length(colors_use_exp)
  palette_middle <- scCustomize:::Middle_Number(min = 0, max = palette_length)

  # Create palette
  col_fun = colorRamp2(c(exp_color_min, exp_color_middle, exp_color_max), colors_use_exp[c(1,palette_middle, palette_length)])

  # Calculate and plot Elbow
  if (plot_km_elbow) {
    # if elbow_kmax not NULL check it is usable
    if (!is.null(x = elbow_kmax) && elbow_kmax > (nrow(x = exp_mat) - 1)) {
      elbow_kmax <- nrow(x = exp_mat) - 1
      warning("The value provided for 'elbow_kmax' is too large.  Changing to (length(x = features)-1): ", elbow_kmax)
    }

    # if elbow_kmax is NULL set value based on input feature list
    if (is.null(x = elbow_kmax)) {
      # set to (length(x = features)-1) if less than 21 features OR to 20 if greater than 21 features
      if (nrow(x = exp_mat) > 21) {
        elbow_kmax <- 20
      } else {
        elbow_kmax <- nrow(x = exp_mat) - 1
      }
    }

    km_elbow_plot <- scCustomize:::kMeans_Elbow(data = exp_mat, k_max = elbow_kmax)
  }

  # prep heatmap
  if (raster) {
    layer_fun = function(j, i, x, y, w, h, fill) {
      grid.rect(x = x, y = y, width = w, height = h,
                gp = gpar(col = NA, fill = NA))
      grid.circle(x=x,y=y,r= sqrt(ComplexHeatmap::pindex(percent_mat, i, j)/100)  * unit(2, "mm"),
                  gp = gpar(fill = col_fun(ComplexHeatmap::pindex(exp_mat, i, j)), col = NA))
    }
  } else {
    cell_fun = function(j, i, x, y, w, h, fill) {
      grid.rect(x = x, y = y, width = w, height = h,
                gp = gpar(col = NA, fill = NA))
      grid.circle(x=x,y=y,r= sqrt(percent_mat[i, j]/100) * unit(2, "mm"),
                  gp = gpar(fill = col_fun(exp_mat[i, j]), col = NA))
    }
  }

  # Create legend for point size
  lgd_list = list(
    ComplexHeatmap::Legend(labels = c(0.25,0.5,0.75,1), title = "Percent Expressing",
                           graphics = list(
                             function(x, y, w, h) grid.circle(x = x, y = y, r = sqrt(0.25) * unit(2, "mm"),
                                                              gp = gpar(fill = "black")),
                             function(x, y, w, h) grid.circle(x = x, y = y, r = sqrt(0.5) * unit(2, "mm"),
                                                              gp = gpar(fill = "black")),
                             function(x, y, w, h) grid.circle(x = x, y = y, r = sqrt(0.75) * unit(2, "mm"),
                                                              gp = gpar(fill = "black")),
                             function(x, y, w, h) grid.circle(x = x, y = y, r = 1 * unit(2, "mm"),
                                                              gp = gpar(fill = "black")))
    )
  )

  # Set x label roration
  if (is.numeric(x = x_lab_rotate)) {
    x_lab_rotate <- x_lab_rotate
  } else if (isTRUE(x = x_lab_rotate)) {
    x_lab_rotate <- 45
  } else {
    x_lab_rotate <- 0
  }

  # Create Plot
  set.seed(seed = seed)
  if (raster) {
    cluster_dot_plot <- ComplexHeatmap::Heatmap(exp_mat,
                                                heatmap_legend_param=list(title="Expression"),
                                                col=col_fun,
                                                rect_gp = gpar(type = "none"),
                                                layer_fun = layer_fun,
                                                row_names_gp = gpar(fontsize = row_label_size),
                                                row_km = k,
                                                row_km_repeats = row_km_repeats,
                                                border = "black",
                                                top_annotation = column_ha,
                                                column_km_repeats = column_km_repeats,
                                                show_parent_dend_line = show_parent_dend_line,
                                                column_names_rot = x_lab_rotate)
  } else {
    cluster_dot_plot <- ComplexHeatmap::Heatmap(exp_mat,
                                                heatmap_legend_param=list(title="Expression"),
                                                col=col_fun,
                                                rect_gp = gpar(type = "none"),
                                                cell_fun = cell_fun,
                                                row_names_gp = gpar(fontsize = row_label_size),
                                                row_km = k,
                                                row_km_repeats = row_km_repeats,
                                                border = "black",
                                                top_annotation = column_ha,
                                                column_km_repeats = column_km_repeats,
                                                show_parent_dend_line = show_parent_dend_line,
                                                column_names_rot = x_lab_rotate)
  }

  # Add pt.size legend & return plots
  if (plot_km_elbow) {
    return(list(km_elbow_plot, ComplexHeatmap::draw(cluster_dot_plot, annotation_legend_list = lgd_list)))
  }
  return(ComplexHeatmap::draw(cluster_dot_plot + rowAnnotation(rn= anno_text(new_row_labels)), annotation_legend_list = lgd_list))
}

Here is example of how to use function:

library(tidyverse)
library(Seurat)
library(scCustomize)
library(circlize)
library(ComplexHeatmap)

pbmc <- pbmc3k.SeuratData::pbmc3k.final
pbmc <- UpdateSeuratObject(pbmc)

Clustered_DotPlot_relabel(seurat_object = pbmc, features = c("IL1B", "CD14", "FCGR3A", "CD4", "CD8A", "CD3D"), plot_km_elbow = F, new_row_labels = c("IL1BA", "CD14A", "FCGR3AA", "CD4A", "CD8AA", "CD3DA"))

image

If you still have trouble let me know and I can reopen the issue.

Best, Sam

TingtingSsl2 commented 2 years ago

Hi, Samuel,

It worked. Thanks a lot for taking the time to work on it.

Apologies for the delay.

Bests,

Tingting