tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org

Grouped boxplots that don't drop levels #3345

Open ldecicco-USGS opened 5 years ago

ldecicco-USGS commented 5 years ago

How do I (or...can I?) create a grouped boxplot in ggplot2 that does not drop the levels?

Here's an example:

library(ggplot2)

df <- data.frame(x = rep(letters[1:2],6),
                 y = sample(100, size = 12),
                 type = rep(c("x","y","z"),4))

df2 <- df[!(df$x == "b" & df$type == "z"),]
df2$type <- factor(df2$type, levels = c("x","y","z"))

ggplot(data = df2) +
  geom_boxplot(aes(x=x,y=y, fill=type)) +
  scale_fill_discrete(drop=FALSE)

And what I get is: [image not included]

What I would like is for "b" to have a 3rd empty grouping for type "z". I have tried adding an empty row like this:

df3 <- rbind(df2, 
             data.frame(x = "b",
                        y = NA,
                        type = "z"))

df3$type <- factor(df3$type, levels = c("x","y","z"))

ggplot(data = df3) +
  geom_boxplot(aes(x=x,y=y, fill=type)) +
  scale_fill_discrete(drop=FALSE)

But I get the same result.
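A likely reason the placeholder row has no effect: rows with a missing y are removed (with a warning) before the boxplot statistics are computed, so the ("b", "z") group never reaches the position adjustment. The sketch below checks this with ggplot2's layer_data(), which returns a layer's computed data. The set.seed() call is an addition made only so the example is reproducible.

```r
library(ggplot2)

# Rebuild the example data (set.seed() added only for reproducibility)
set.seed(1)
df <- data.frame(x = rep(letters[1:2], 6),
                 y = sample(100, size = 12),
                 type = rep(c("x", "y", "z"), 4))
df2 <- df[!(df$x == "b" & df$type == "z"), ]
df2$type <- factor(df2$type, levels = c("x", "y", "z"))

# Append the placeholder row with a missing y, as in the attempt above
df3 <- rbind(df2, data.frame(x = "b", y = NA, type = "z"))
df3$type <- factor(df3$type, levels = c("x", "y", "z"))

p3 <- ggplot(df3) +
  geom_boxplot(aes(x = x, y = y, fill = type)) +
  scale_fill_discrete(drop = FALSE)

# layer_data() builds the plot and returns the computed data for the layer.
# The NA row is removed before the boxplot statistics are computed (ggplot2
# warns about removed rows), so there is one row per remaining box.
ld3 <- suppressWarnings(layer_data(p3))
nrow(ld3)  # 5 boxes, not 6
```

By contrast, the fake y = 1000 value in the 2013 workaround survives this step, which is why that version does reserve a slot (and then needs coord_cartesian() to hide it).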

There's a solution from 2013 involving fake data and changing limits...but I was hoping to see if anything had improved since then that I've missed:

df4 <- rbind(df2, 
             data.frame(x = "b",
                        y = 1000,
                        type = "z"))

df4$type <- factor(df4$type, levels = c("x","y","z"))

ggplot(data = df4) +
  geom_boxplot(aes(x=x,y=y, fill=type)) +
  scale_fill_discrete(drop=FALSE) +
  coord_cartesian(ylim = range(df2$y))

This is good, but seems hacky: [image not included]

paleolimbot commented 5 years ago

I think you're looking for position_dodge2(preserve = "single"):

library(ggplot2)
df <- data.frame(x = rep(letters[1:2],6),
                 y = sample(100, size = 12),
                 type = rep(c("x","y","z"),4))

df2 <- df[!(df$x == "b" & df$type == "z"),]
df2$type <- factor(df2$type, levels = c("x","y","z"))

ggplot(df2) +
  geom_boxplot(
    aes(x = x, y = y, fill = type), 
    position = position_dodge2(preserve = "single")
  )

Created on 2019-05-31 by the reprex package (v0.2.1)

ldecicco-USGS commented 5 years ago

It's not perfect because you'd want the green boxplot to be lined up right at "b"...but it does at least get the widths consistent. Thanks!
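One way to confirm the width behaviour is to compare the box extents (xmax - xmin) that layer_data() reports for the default position_dodge2() and for preserve = "single". This is a sketch: box_widths() is a hypothetical helper introduced here, and set.seed() is added only for reproducibility. Only widths are compared, not where the boxes at "b" are placed.

```r
library(ggplot2)

set.seed(1)  # added only for reproducibility
df <- data.frame(x = rep(letters[1:2], 6),
                 y = sample(100, size = 12),
                 type = rep(c("x", "y", "z"), 4))
df2 <- df[!(df$x == "b" & df$type == "z"), ]
df2$type <- factor(df2$type, levels = c("x", "y", "z"))

# The same boxplot with the default dodging and with preserve = "single"
p_default <- ggplot(df2) +
  geom_boxplot(aes(x = x, y = y, fill = type),
               position = position_dodge2())
p_single <- ggplot(df2) +
  geom_boxplot(aes(x = x, y = y, fill = type),
               position = position_dodge2(preserve = "single"))

# Hypothetical helper: width of each box after the position adjustment
box_widths <- function(p) {
  ld <- layer_data(p)
  ld$xmax - ld$xmin
}

w_default <- box_widths(p_default)
w_single <- box_widths(p_single)

# Default: the two boxes at "b" are wider than the three at "a".
# preserve = "single": every box has the same width.
range(w_default)
range(w_single)
```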

paleolimbot commented 5 years ago

I see what you mean. I don't think it's possible to do that with positions at the moment, but you could use facets to get a similar result:

library(ggplot2)
df <- data.frame(x = rep(letters[1:2],6),
                 y = sample(100, size = 12),
                 type = rep(c("x","y","z"),4))

df2 <- df[!(df$x == "b" & df$type == "z"),]
df2$type <- factor(df2$type, levels = c("x","y","z"))

ggplot(df2) +
  geom_boxplot(aes(x = type, y = y, col = type)) +
  facet_wrap(vars(x))

Created on 2019-05-31 by the reprex package (v0.2.1)

ldecicco-USGS commented 5 years ago

Yeah....I thought of that initially, but the actual plot I'm trying to make already uses facets. If I decide the position_dodge2 isn't good enough (...I think it probably is for my case), I might consider using patchwork or something like that to bring it all together.

hadley commented 5 years ago

I wonder if we should add a drop = FALSE option to position_dodge2() to tell it to use the factor levels, rather than the actual positions? (Or maybe it's too late at that point?)

MarkErik commented 5 years ago

It's not perfect because you'd want the green boxplot to be lined up right at "b"...

+1, as I've encountered this issue many times (not with boxplots, but with other geoms). Since I also use facets, a solution that keeps width and position fixed (so that if there is missing data, the remaining items aren't re-centred) would be excellent.

hadley commented 5 years ago

If you wonder why something so seemingly simple is so hard to fix, I'd suggest watching @karawoo's excellent rstudio::conf() talk: https://resources.rstudio.com/rstudio-conf-2019/box-plots-a-case-study-in-debugging-and-perseverance

ldecicco-USGS commented 5 years ago

I hope creating an Issue to report non-ideal behavior like this doesn't imply I think it's a "seemingly simple fix"...because I don't think that at all!!! I've watched that talk already...it's fantastic...and my goal in creating an issue is to NOT have to do it myself!

paleolimbot commented 5 years ago

It is a recurring problem that comes up a lot (at least one other expert ggplotter has asked me about this exact issue), and we're glad to have it as an issue! Dropping levels works differently in scales, facets, and positions, and we don't have a workaround for this (that I am aware of).

ivan-paleo commented 2 years ago

I'm having exactly the same issue: levels are dropped within one facet, and I don't know how to adjust the boxes' widths and positions for that facet. Is there anything new on this issue?

teunbrand commented 2 months ago

Since this came up recently in https://github.com/tidyverse/ggplot2/pull/6100#issuecomment-2346807193, I'll put forth my two cents.

The root cause of the issue is that the group aesthetic carries no information about the original aesthetics that contributed to the grouping structure.

Take the plot from the top of this issue. The internals only know "I have two positions: 3 groups at position 'a' and 2 groups at position 'b'." They do not know that groups 1 and 4 share type = "x", or that groups 2 and 5 share type = "y". Which aesthetics contributed to these groups cannot be reconstructed from the information available to the position adjustment.
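The grouping described above can be inspected with layer_data(). In this sketch the group column is just an integer id for each observed combination of x and type; the missing ("b", "z") combination never receives an id, so there is no empty slot for the position adjustment to reserve. set.seed() is added only for reproducibility.

```r
library(ggplot2)

set.seed(1)  # added only for reproducibility
df <- data.frame(x = rep(letters[1:2], 6),
                 y = sample(100, size = 12),
                 type = rep(c("x", "y", "z"), 4))
df2 <- df[!(df$x == "b" & df$type == "z"), ]
df2$type <- factor(df2$type, levels = c("x", "y", "z"))

p <- ggplot(df2) +
  geom_boxplot(aes(x = x, y = y, fill = type),
               position = position_dodge2(preserve = "single"))

ld <- layer_data(p)

# `group` is a plain integer id per observed (x, type) combination;
# ("b", "z") has no rows, so it never gets an id of its own
sort(ld$group)

# Boxes per original x position: 3 at "a", 2 at "b"
# (rounding undoes the small dodge offsets)
table(round(ld$x))
```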

I have discussed this with Thomas a few times, and neither of us has been able to come up with a good solution. Until one emerges, I don't think this will be fixed.
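In the meantime, one workaround not mentioned in this thread is to compute the dodged positions by hand from the factor levels, so each type keeps a fixed slot whether or not it has data. This is a sketch under stated assumptions, not a ggplot2 feature: the column xpos, the 0.3 slot spacing and the 0.25 box width are arbitrary choices made here, and an explicit group is needed because x becomes continuous.

```r
library(ggplot2)

set.seed(1)  # added only for reproducibility
df <- data.frame(x = rep(letters[1:2], 6),
                 y = sample(100, size = 12),
                 type = rep(c("x", "y", "z"), 4))
df2 <- df[!(df$x == "b" & df$type == "z"), ]
df2$type <- factor(df2$type, levels = c("x", "y", "z"))

# Give each `type` level a fixed offset around its x position, derived from
# the factor levels rather than the observed data, so "z" keeps its slot at "b"
x_levels <- sort(unique(df2$x))
offset <- 0.3  # arbitrary spacing between slots
centre <- (nlevels(df2$type) + 1) / 2
df2$xpos <- match(df2$x, x_levels) +
  (as.integer(df2$type) - centre) * offset

p <- ggplot(df2) +
  geom_boxplot(
    # an explicit group is needed because x is now continuous
    aes(x = xpos, y = y, fill = type, group = interaction(x, type)),
    width = 0.25,          # narrower than `offset` so boxes don't touch
    position = "identity"  # positions were already computed by hand
  ) +
  scale_x_continuous(breaks = seq_along(x_levels), labels = x_levels) +
  scale_fill_discrete(drop = FALSE)

# Box centres: three around "a" (x = 1) and two around "b" (x = 2),
# with the "z" slot at "b" left empty
ld <- layer_data(p)
sort(ld$x)
```

The trade-off is that spacing is no longer managed by ggplot2, so offset and width may need tuning by hand. Because the slots come from the data rather than from the position adjustment, they should stay put inside facets as well.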