tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org

Changes in scale_color_manual ggplot 3.5.0 #5812

Closed samuel-marsh closed 5 months ago

samuel-marsh commented 5 months ago

Hi ggplot2 Team,

I'm probably missing something obvious, but I can't figure out which change in 3.5.0 caused this behavior.

In short, I have a function in my package (function link here) that returns a plot with geom_point for the underlying data. The user can choose to create multiple plots by splitting the data on a given variable, to show multiple conditions side-by-side. The plot is colored using scale_color_manual by another variable in the data. As sometimes happens when splitting data, some of the levels used for coloring the plots may not be present in all subsets.

Previously (ggplot2 3.4.4) I provided a named vector of colors to values, added drop = FALSE, and then used patchwork to collect the guides so that one legend is provided for all plots in the layout.

However, with 3.5.0 each plot now returns a guide in which the absent level is present but has no color next to it. When the guides are collected with patchwork, it sees multiple different guides and returns all of them with the plot.

Example here: image

I hope I explained that OK, but please let me know if you have any questions. Maybe I'm missing something very basic that has changed (or maybe it is a bug with drop?), but any help you can provide would be appreciated!

Best, Sam

clauswilke commented 5 months ago

Could you please create a minimal reproducible example of the exact behavior that is causing you problems? Just draw up a simple plot with one or more legends that don't look the way you'd think they should.

teunbrand commented 5 months ago

This all sounds very similar to #5728. TL;DR: Dropping key glyphs is intended. To include a layer's key glyph in the legend without the data being present, use show.legend = TRUE.

samuel-marsh commented 5 months ago

@teunbrand @clauswilke Thanks both for quick replies!!

Indeed, it does sound like that previous issue. However, in my case the base plot is returned from another package as a ggplot2 object, so I can't pass show.legend = TRUE in the original geom_point call. What is the best (or maybe least hacky) way to pass this parameter when modifying that original plot?

Best, Sam

clauswilke commented 5 months ago

Again, please provide a minimal example that accurately represents your situation (maybe with a function that takes the plot object as input) so we can look into it and explore approaches.

samuel-marsh commented 5 months ago

Hi @clauswilke,

I'm working on re-creating the essential aspects of the functions without such heavy package dependencies. That means correctly reproducing all the modifications the original package makes to the plot, and then how my function modifies the returned plot. It might be tomorrow before I can complete that, but I'm working on it.

I realize this isn't exactly a minimal example because it's dependency heavy, but in case it helps, the full example below does show the issue (requires Seurat; heavy dep).

Uses these two Seurat functions: DimPlot SingleDimPlot

Then modified by function in my package scCustomize: DimPlot_scCustom

This section in particular is modifying the color levels of the plot generated by Seurat: https://github.com/samuel-marsh/scCustomize/blob/fc7a282af3bef6e1cb816fac3b9229f536159238/R/Seurat_Plotting.R#L1892-L1935

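# ggplot2 3.5.0
install.packages("Seurat")
install.packages("scCustomize")

library(Seurat)
library(scCustomize)
library(dplyr)  # provides %>% and filter()

# `pbmc` is an existing Seurat object (not created in this snippet)
# plot with all levels present in both plots
DimPlot_scCustom(seurat_object = pbmc, split.by = "treatment")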

# create subset without one level
data <- FetchData(object = pbmc, vars = c("treatment", "seurat_annotations"))

cells_remove <- data %>% 
  filter(treatment == "Treatment1" & seurat_annotations == "Naive CD4 T") %>% 
  rownames()

pbmc_subbed <- subset(pbmc, cells = cells_remove, invert = TRUE)

DimPlot_scCustom(seurat_object = pbmc_subbed, split.by = "treatment")

image

Running the same thing with ggplot2 3.4.4 results in: image

Again, I realize this is the heavy version, and I'm working on a more minimal example that still reproduces the plots correctly.

Thanks so much again for the quick replies!! Best, Sam

clauswilke commented 5 months ago

You need to learn how to make minimal reproducible examples. Here, I'll give you a starting point, but you'll have to take it from here. I'm not going to read your package code or install your package or look at an example that is more than maybe 30-40 lines of code total.

library(tidyverse) 

data <- tibble(
  x = 1:3,
  y = 3:1,
  a = factor(c("A", "A", "B"), levels = c("A", "B", "C"))
)

# green dot is missing from legend
ggplot(data, aes(x, y, color = a)) +
  geom_point() +
  scale_color_manual(
    values = c(A = "red", B = "blue", C = "green")
  )


# green dot is present
ggplot(data, aes(x, y, color = a)) +
  geom_point(show.legend = TRUE) +
  scale_color_manual(
    values = c(A = "red", B = "blue", C = "green"),
    drop = FALSE
  )

Created on 2024-03-27 with reprex v2.0.2

teunbrand commented 5 months ago

I'd also like to note that using facetting instead of plot composition might circumvent the issue altogether. From my reading of the problem, both plots share the same layers, coordinates and colour scales, so that should lend itself to facetting.
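
A minimal sketch of the facetting approach, assuming the data for all treatments is available in a single data frame. The values below are illustrative dummy data, not Seurat output, and the object names (`data_plot`, `colors`, `p_facet`) are placeholders:

```r
library(ggplot2)

# Illustrative dummy data: two groups, two treatments.
# Only Group2 occurs in treat2.
data_plot <- data.frame(
  UMAP1 = c(-4.2, -4.9, -5.5, 11.3, -7.5),
  UMAP2 = c(-4.2, 11.0, -7.2, 3.2, 1.1),
  group = factor(c("Group1", "Group2", "Group1", "Group1", "Group2")),
  treatment = factor(c("treat1", "treat1", "treat1", "treat1", "treat2"))
)

# Named vector so each level maps to a fixed colour
colors <- c(Group1 = "blue", Group2 = "red")

# One plot with one colour scale and therefore one legend;
# the panels are split by treatment instead of composing separate plots
p_facet <- ggplot(data_plot, aes(x = UMAP1, y = UMAP2, color = group)) +
  geom_point() +
  scale_color_manual(values = colors, drop = FALSE) +
  facet_wrap(vars(treatment))
p_facet
```

Because the legend is built from the whole data set rather than from each subset, a level that is absent from one panel still has data behind its key elsewhere in the plot.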

Another solution might be to 'cut the middleman' and construct plots de novo instead of wrapping plotting functions that wrap ggplot2. I'd find it very strange if Seurat doesn't provide accessors to wrangle the relevant data out of their data structures.

samuel-marsh commented 5 months ago

Hi @clauswilke,

Yes, I know how to make a reproducible example, but as I mentioned, I was trying to ensure that the example mirrored what happens in the original function in terms of how the plot is output, and I provided the heavy version in case it was helpful while I worked on the reprex. I will need to check this against the original function, but the simplest version would be this:

library(ggplot2)
library(patchwork)

# Dummy data
data_plot <- data.frame(
  UMAP1 = c(-4.232792, -4.892886, -5.508639, 11.332233, -7.450703),
  UMAP2 = c(-4.152139, 10.985685, -7.211088, 3.161727, 1.092022),
  group = factor(c("Group1", "Group2", "Group1", "Group1", "Group2")),
  treatment = factor(c("treat1", "treat1", "treat1", "treat1", "treat2"))
)

rownames(data_plot) <- paste0("A", seq_len(5))

# colors to use
colors <- c("blue", "red")
names(colors) <- unique(data_plot$group)

# plot without subsetting data by treatment
ggplot(data_plot, aes(x = UMAP1, y = UMAP2, color = group)) + 
  geom_point() +
  scale_color_manual(values = colors, drop = FALSE)


# get rownames of subsets
treat1_cells <- paste0("A", seq_len(4))
treat2_cells <- "A5"

# subset the data
data_plot_treat1 <- data_plot[treat1_cells, ]
data_plot_treat2 <- data_plot[treat2_cells, ]

# plot the subsets
p1 <- ggplot(data_plot_treat1, aes(x = UMAP1, y = UMAP2, color = group)) + 
  geom_point()

p2 <- ggplot(data_plot_treat2, aes(x = UMAP1, y = UMAP2, color = group)) + 
  geom_point() 

# HERE is where my package comes in
# p1 and p2 are returned from Seurat
# Then modified and wrapped by my package

p1_mod <- p1 + scale_color_manual(values = colors, drop = FALSE)
p2_mod <- p2 + scale_color_manual(values = colors, drop = FALSE)

plots <- wrap_plots(p1_mod, p2_mod) + plot_layout(guides = "collect")
plots

Created on 2024-03-27 with reprex v2.1.0

@teunbrand Seurat does have accessors to pull the required data out and make the plot de novo. However, as the goal of this particular function is to extend the functionality of a Seurat function, it was easier to use the output of that function. That way, if there are changes in Seurat that affect the function (structure of the Seurat object changes, new parameters, bug fixes, etc.), they are already present without my having to keep as close an eye on the codebase as I would if I were plotting de novo. Using patchwork for assembly worked best at the time, but I can see if faceting could work here.

Best, Sam

clauswilke commented 5 months ago

Would it be a problem to just set show.legend = TRUE in the functions that generate p1 and p2?

Alternatively, you could possibly code something like the following. You may have to iterate over all layers and make sure you're actually modifying geom_point() layers only.

library(tidyverse) 

data <- tibble(
  x = 1:3,
  y = 3:1,
  a = factor(c("A", "A", "B"), levels = c("A", "B", "C"))
)

# green dot is missing from legend
p <- ggplot(data, aes(x, y, color = a)) +
  geom_point() +
  scale_color_manual(
    values = c(A = "red", B = "blue", C = "green"),
    drop = FALSE
  )
p


# turn on green dot after the fact
p$layers[[1]]$show.legend <- TRUE
p

Created on 2024-03-27 with reprex v2.0.2
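
The iteration over layers mentioned above could look roughly like the following sketch. `force_point_legend()` is a hypothetical helper name, not a ggplot2 function; it inspects each layer's geom and sets show.legend only on point layers, leaving any other layers untouched.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): turn on show.legend for every
# geom_point() layer of an existing plot.
# Note: layers are ggproto objects (environments), so the assignment modifies
# the layer in place; the plot passed in shares the same layer objects.
force_point_legend <- function(plot) {
  for (i in seq_along(plot$layers)) {
    if (inherits(plot$layers[[i]]$geom, "GeomPoint")) {
      plot$layers[[i]]$show.legend <- TRUE
    }
  }
  plot
}

# Same toy data as in the example above: level "C" has no observations
data <- data.frame(
  x = 1:3,
  y = 3:1,
  a = factor(c("A", "A", "B"), levels = c("A", "B", "C"))
)

p <- ggplot(data, aes(x, y, color = a)) +
  geom_point() +
  scale_color_manual(
    values = c(A = "red", B = "blue", C = "green"),
    drop = FALSE
  )

p_fixed <- force_point_legend(p)
p_fixed
```

Checking the geom class rather than hard-coding `layers[[1]]` keeps the helper usable when the wrapped function adds further layers, such as labels or lines, before or after the points.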

samuel-marsh commented 5 months ago

Hi @clauswilke,

Yes, I think I will put in a PR to Seurat to add that to the base code for the function in question.

Thanks for the example of modifying the layers in the output. I'll play around with that and with faceting in the short term until the PR can be approved.

Thank you both @clauswilke and @teunbrand again for your help with this, I really appreciate it!!

Best, Sam