Closed char4816 closed 2 months ago
@karawoo can you comment on the amount of work this would require, seeing that you were the last to dive head-first into dodging? 🙂
In the dev version of ggplot2 I can't reproduce this specific issue; having only two observations still generates a violin:
library("ggplot2")
set.seed(5)
dat <- data.frame(x = rep(1, times = 5), y = rnorm(5, 1), z = c("A", "A", "A", "B", "B"))
ggplot(dat, aes(x = x, y = y, fill = z)) +
geom_boxplot(position = position_dodge(width = 0.9)) +
geom_violin(alpha = 0.5)
Created on 2020-08-31 by the reprex package (v0.3.0.9001)
That said, this is a similar question to others that have come up before (https://github.com/tidyverse/ggplot2/issues/3022#issuecomment-444295026, https://github.com/tidyverse/ggplot2/issues/2076, #2813). The short answer is that having more control over placement of a geom within a group is not very easy. It may be possible, and it would be nice if we could improve the situation, but right now it's not straightforward.
Thank you all for looking into this.
I was doing this on R version 3.6.2 using ggplot2 3.3.2. On these versions, your example code produces the following plot:
I still think the fix in PR #2813 should be applied, by the way. The request here for a drop = FALSE is the same as the current option preserve = "single", I believe. And preserve = "single" can be made to work (somewhat) with violins using the code in #2813.
@char4816 can you see if you can reproduce my results with the development version of ggplot2? You can install it with remotes::install_github("tidyverse/ggplot2").
@clauswilke I'm not against merging #2813 since it would make the preserve = "single" behavior for violins consistent with other geoms. That still leaves the problem of everything getting moved to the left though, and I expect we'll keep getting variations on this issue until we can do something about that.
@karawoo Agreed. I suspect a proper fix for position_dodge() would require some sort of aesthetic mapping and a scale, and I haven't been able to come up with a reasonable way to implement this.
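For boxplots specifically, a possible stopgap for the left shift is position_dodge2(), which, as far as I can tell, centers the remaining elements when preserve = "single". A minimal sketch, assuming a made-up data frame (`toy`, with placeholder columns `grp`, `cond` and `val`) in which one group is empty at "B". It does not address violins, which are the harder case discussed here:

```r
library(ggplot2)

# Made-up data for illustration: condition "x" has no values in group "B"
toy <- data.frame(
  grp  = c("A", "A", "B", "B", "A", "A", "B", "B"),
  cond = c("x", "x", "x", "x", "y", "y", "y", "y"),
  val  = c(3, 1, NA, NA, 5, 6, 7, 8)
)

# position_dodge2() is the default position for geom_boxplot();
# preserve = "single" keeps every box at the width of a single element
p_dodge2 <- ggplot(toy, aes(x = grp, y = val, fill = cond)) +
  geom_boxplot(
    position = position_dodge2(preserve = "single"),
    na.rm = TRUE
  )

# Inspect the computed box extents instead of eyeballing the plot
boxes <- layer_data(p_dodge2)
boxes[, c("x", "xmin", "xmax", "group")]
```

Comparing xmin and xmax for the lone box at "B" against the "B" tick position shows whether the box is centered in the ggplot2 version at hand.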
Using the development version of ggplot2, that example works great now. However, the overarching issue is bigger than just violin plots (and whether or not two points is enough to create the violin plot). For example, if we wanted to plot the following dataframe with boxplots,
cat ind values
1 A x 4
2 A x 2
3 B x NA
4 B x NA
5 A y 4
6 A y 5
7 B y 6
8 B y 9
the two missing values will make the dodging look bad. Here is some reproducible code:
data <- data.frame(
cat=c('A','A','B','B','A','A','B','B'),
ind=c('x','x','x','x','y','y','y','y'),
values=c(4,2,NA,NA,4,5,6,9)
)
p <- ggplot() +
scale_colour_hue(guide='none') +
geom_boxplot(aes(x=as.factor(cat), y=values, fill=ind),
position=position_dodge(width=.90),
data=data,
outlier.size = 1.2,
na.rm=TRUE)
print(p) #produces undesired result where the second blue boxplot spans the full width
The best solution I have seen so far for this problem is to make one of the NAs equal to zero so a box plot is generated, and then to cover up that boxplot... (not a very pretty solution).
data <- data.frame(
cat=c('A','A','B','B','A','A','B','B'),
ind=c('x','x','x','x','y','y','y','y'),
values=c(4,2,NA,0,4,5,6,9) #notice I changed the second NA to now be zero
)
p <- ggplot() +
scale_colour_hue(guide='none') +
geom_boxplot(aes(x=as.factor(cat), y=values, fill=ind),
position=position_dodge(width=.90),
data=data,
outlier.size = 1.2,
na.rm=TRUE) +
#Someone suggested this workaround where you use geom_line to cover up the unwanted boxplot with a thick white line
geom_line(aes(x=x, y=y),
data=data.frame(x=c(0,3),y=c(0,0)),
size = 1.5,
col='white')
print(p) #this looks more like what we wanted
I'm closing this issue for the following reasons.
The example in https://github.com/tidyverse/ggplot2/issues/3988#issuecomment-684061867 can be fixed by setting preserve = 'single'. Granted, the box is not on the right-hand side of the B-gridline, but that is unrelated to the issue of the box's width.
library(ggplot2)
data <- data.frame(
cat=c('A','A','B','B','A','A','B','B'),
ind=c('x','x','x','x','y','y','y','y'),
values=c(4,2,NA,NA,4,5,6,9)
)
ggplot(data) +
geom_boxplot(
aes(x=as.factor(cat), y=values, fill=ind),
position=position_dodge(width=.90, preserve = 'single'),
na.rm=TRUE
)
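To check the width claim numerically rather than by eye, the box extents can be read back with layer_data(). This is only a sketch: `box_widths` is a hypothetical helper written for this comparison, not part of ggplot2, and the data frame is repeated so the block runs on its own.

```r
library(ggplot2)

data <- data.frame(
  cat    = c('A','A','B','B','A','A','B','B'),
  ind    = c('x','x','x','x','y','y','y','y'),
  values = c(4,2,NA,NA,4,5,6,9)
)

# Hypothetical helper: build the boxplot with a given `preserve` setting
# and return the width (xmax - xmin) of every box that gets drawn
box_widths <- function(preserve) {
  p <- ggplot(data) +
    geom_boxplot(
      aes(x = as.factor(cat), y = values, fill = ind),
      position = position_dodge(width = 0.9, preserve = preserve),
      na.rm = TRUE
    )
  boxes <- layer_data(p)
  as.numeric(boxes$xmax) - as.numeric(boxes$xmin)
}

widths_total  <- box_widths("total")   # default setting
widths_single <- box_widths("single")  # the setting suggested above
```

With preserve = 'single' all entries of widths_single should be equal, whereas widths_total should contain one wider box (the lone box at B).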
The original issue of misalignment between boxplots and violin plots with fewer than 2 observations can be mitigated using drop = FALSE.
set.seed(5)
dat <- data.frame(x = rep(1, times = 4), y = rnorm(4, 1), z = c("A", "A", "A", "B"))
ggplot(dat, aes(x = x, y = y, fill = z)) +
geom_boxplot(position = position_dodge(width = 0.9)) +
geom_violin(alpha = 0.5, drop = FALSE)
#> Warning: Cannot compute density for groups with fewer than two datapoints.
Created on 2024-07-24 with reprex v2.1.1
As others have pointed out (https://github.com/tidyverse/ggplot2/issues/688), having a position_dodge option such as drop = FALSE would be really helpful for plots like this:
I know Hadley has given some helpful workaround suggestions (e.g. faceting) and has mentioned that this isn't within the scope of position_dodge, but I am hoping that one more person raising the issue might help to make the change.
What's happening for some of my plots' x-categories is that there are only 2 data points (e.g. cases in Homozygous Alt in my plot), which is enough to generate a boxplot but not a violin plot.
Because the 3rd green violin plot doesn't have a counterpart to "dodge", it is centered and therefore wider, and it is no longer aligned with its corresponding boxplot, which is quite sad to look at.
While faceting works to an extent:
It would look much cleaner for publication if I could do
position = position_dodge(width = 0.9, drop = FALSE)
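For reference, the faceting workaround mentioned above can be sketched roughly as follows. The data are made up and the column names (`genotype`, `status`, `value`) are placeholders, not the real dataset behind the plots:

```r
library(ggplot2)
set.seed(1)

# Made-up data: "case" has only two observations at "Hom Alt"
dat <- data.frame(
  genotype = c(rep("Hom Ref", 12), rep("Hom Alt", 8)),
  status   = c(rep(c("case", "control"), each = 6), "case", "case", rep("control", 6)),
  value    = rnorm(20)
)

# One panel per status, so nothing has to be dodged within a panel and
# each violin stays centered on its boxplot
p_facet <- ggplot(dat, aes(x = genotype, y = value, fill = status)) +
  geom_violin(alpha = 0.5) +
  geom_boxplot(width = 0.2) +
  facet_wrap(~ status)
```

The trade-off is that cases and controls end up in separate panels instead of side by side at each genotype, which is what makes the dodged version preferable for publication.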
I hope this suggestion is taken seriously and I really appreciate your help & time.
Chris