tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org

Aligning plots: Implementation of drop = FALSE for position_dodge #3988

Closed char4816 closed 2 months ago

char4816 commented 4 years ago

As others have pointed out (https://github.com/tidyverse/ggplot2/issues/688), having a position_dodge option such as drop = FALSE would be really helpful for plots like this:

[Image: example plot]

I know Hadley has given some helpful workaround suggestions (e.g. faceting) and has mentioned that this isn't within the scope of position_dodge, but I am hoping that one more person raising the issue might help to make the change.

For some of my plots' x-categories there are only 2 data points (e.g. the cases in Homozygous Alt in my plot), which is enough to generate a boxplot but not a violin plot:

geom_violin(alpha=0.4, position = position_dodge(width=0.9))

Because the third green violin has no counterpart to dodge against, it is centered, which makes it wider than its corresponding boxplot and misaligned with it. This is quite sad to look at.

Faceting works to an extent:

[Image: faceted plot]

but it would look much cleaner for publication if I could write position = position_dodge(width = 0.9, drop = FALSE).

I hope this suggestion is taken seriously and I really appreciate your help & time.

Chris

thomasp85 commented 4 years ago

@karawoo can you comment on the amount of work this would require, seeing that you were the last to dive head-first into dodging 🙂

karawoo commented 4 years ago

In the dev version of ggplot2 I can't reproduce this specific issue; having only two observations still generates a violin:

library("ggplot2")

set.seed(5)
dat <- data.frame(x = rep(1, times = 5), y = rnorm(5, 1), z = c("A", "A", "A", "B", "B"))

ggplot(dat, aes(x = x, y = y, fill = z)) +
  geom_boxplot(position = position_dodge(width = 0.9)) +
  geom_violin(alpha = 0.5)

Created on 2020-08-31 by the reprex package (v0.3.0.9001)

That said, this is a similar question to others that have come up before (https://github.com/tidyverse/ggplot2/issues/3022#issuecomment-444295026, https://github.com/tidyverse/ggplot2/issues/2076, #2813). The short answer is that having more control over placement of a geom within a group is not very easy. It may be possible, and it would be nice if we could improve the situation, but right now it's not straightforward.

char4816 commented 4 years ago

Thank you all for looking into this.

I was doing this on R version 3.6.2 using ggplot2 3.3.2. On these versions, your example code produces the following plot:

[Image: resulting plot]

clauswilke commented 4 years ago

I still think the fix in PR #2813 should be applied, by the way. The request here for a drop = FALSE is the same as the current option preserve = "single", I believe. And preserve = "single" can be made to work (somewhat) with violins using the code in #2813.

karawoo commented 4 years ago

@char4816 can you see if you can reproduce my results with the development version of ggplot2? You can install it with remotes::install_github("tidyverse/ggplot2").

@clauswilke I'm not against merging #2813 since it would make the preserve = "single" behavior for violins consistent with other geoms. That still leaves the problem of everything getting moved to the left though, and I expect we'll keep getting variations on this issue until we can do something about that.
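A rough way to see why the lone box ends up on the left: under a simplified model of dodging, the groups present at an x position are numbered from left to right and dropped into n equal slots, where n is either the number of groups present there (preserve = "total") or the largest number of groups at any x (preserve = "single"). The base-R sketch below is that model, not ggplot2's actual code. The helpers `dodge_centres()` and `dodge_width()` are hypothetical, and the slot formula is an assumption modelled on how position_dodge() behaves. The numbers use the boxplot example further down in this thread (category A has groups x and y, category B has only y).

```r
# Simplified model of dodging (hypothetical helpers, not ggplot2 internals).
# `n_present` groups sit at position `x`; a slot of total `width` is split
# into `n_slots` equal sub-slots, which are filled from the left.
dodge_centres <- function(x, n_present, n_slots, width = 0.9) {
  idx <- seq_len(n_present)                 # groups present, left to right
  x + width * ((idx - 0.5) / n_slots - 0.5)
}

# Width of each dodged element under the same model.
dodge_width <- function(n_slots, width = 0.9) width / n_slots

# Category A (x = 1) has both groups: two half-width boxes either side of 1.
dodge_centres(1, n_present = 2, n_slots = 2)   # 0.775 1.225

# Category B (x = 2) has only one group.
# preserve = "total": slots = groups present here, so one full-width box at 2.
dodge_centres(2, n_present = 1, n_slots = 1)   # 2
dodge_width(1)                                 # 0.9

# preserve = "single": slots = most groups at any x (2), so the box keeps
# half width but lands in the leftmost slot, left of the B gridline.
dodge_centres(2, n_present = 1, n_slots = 2)   # 1.775
dodge_width(2)                                 # 0.45
```

Under this model the lone box could only go into the right-hand slot if the position adjustment knew which group was missing at that x. That kind of extra information is why the fix is not straightforward.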

clauswilke commented 4 years ago

@karawoo Agreed. I suspect a proper fix for position_dodge() would require some sort of aesthetic mapping and a scale, and I haven't been able to come up with a reasonable way to implement this.

char4816 commented 4 years ago

Using the development version of ggplot2, that example works great now. However, the overarching issue is bigger than just violin plots (and whether or not two points is enough to create the violin plot). For example, if we wanted to plot the following dataframe with boxplots,

  cat ind values
1   A   x      4
2   A   x      2
3   B   x     NA
4   B   x     NA
5   A   y      4
6   A   y      5
7   B   y      6
8   B   y      9

the two missing values will make the dodging look bad. Here is some reproducible code:

library(ggplot2)

data <- data.frame(
  cat=c('A','A','B','B','A','A','B','B'), 
  ind=c('x','x','x','x','y','y','y','y'),
  values=c(4,2,NA,NA,4,5,6,9)
)

p  <- ggplot() +
  scale_colour_hue(guide='none') +
  geom_boxplot(aes(x=as.factor(cat), y=values, fill=ind),
               position=position_dodge(width=.90), 
               data=data,
               outlier.size = 1.2,
               na.rm=T)

print(p) #produces undesired result where the second blue boxplot spans the full width

[Image: resulting plot]

The best solution I have seen so far is to set one of the NAs to zero so that a boxplot is generated, and then to cover that boxplot up (not a very pretty solution).

data <- data.frame(
  cat=c('A','A','B','B','A','A','B','B'), 
  ind=c('x','x','x','x','y','y','y','y'),
  values=c(4,2,NA,0,4,5,6,9) #notice I changed the second NA to now be zero
)

p  <- ggplot() +
  scale_colour_hue(guide='none') +
  geom_boxplot(aes(x=as.factor(cat), y=values, fill=ind),
               position=position_dodge(width=.90), 
               data=data,
               outlier.size = 1.2,
               na.rm=T) +
  #Someone suggested this workaround where you use geom_line to cover up the unwanted boxplot with a thick white line
  geom_line(aes(x=x, y=y), 
            data=data.frame(x=c(0,3),y=c(0,0)), 
            size = 1.5, 
            col='white')
print(p) #this looks more like what we wanted

[Image: resulting plot]

teunbrand commented 2 months ago

I'm closing this issue for the following reasons. The example in https://github.com/tidyverse/ggplot2/issues/3988#issuecomment-684061867 can be fixed by setting preserve = 'single'. Granted, the box is not on the right-hand side of the B gridline, but that is unrelated to the issue of the box's width.

library(ggplot2)

data <- data.frame(
  cat=c('A','A','B','B','A','A','B','B'), 
  ind=c('x','x','x','x','y','y','y','y'),
  values=c(4,2,NA,NA,4,5,6,9)
)

ggplot(data) +
  geom_boxplot(
    aes(x=as.factor(cat), y=values, fill=ind),
    position=position_dodge(width=.90, preserve = 'single'),
    na.rm=T
  )

The original issue of misalignment between boxplots and violin plots for groups with fewer than two observations can be mitigated using drop = FALSE in geom_violin().

set.seed(5)
dat <- data.frame(x = rep(1, times = 4), y = rnorm(4, 1), z = c("A", "A", "A", "B"))

ggplot(dat, aes(x = x, y = y, fill = z)) +
  geom_boxplot(position = position_dodge(width = 0.9)) +
  geom_violin(alpha = 0.5, drop = FALSE)
#> Warning: Cannot compute density for groups with fewer than two datapoints.

Created on 2024-07-24 with reprex v2.1.1