tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org

Geom_dotplot dots overlap when 'fill' in 'aes' #3620

Closed James-G-Hill closed 3 years ago

James-G-Hill commented 4 years ago

With 'geom_dotplot', whether or not 'fill' is mapped inside the 'aes' function also determines whether the dots overlap. This came up in a Stack Overflow question a few years back, but I didn't see anyone say whether it was a bug or not. The behaviour doesn't seem correct to me as I wouldn't expect placing 'fill' in the 'aes' function should affect the layout of the dots.

https://stackoverflow.com/questions/38477616/overlapping-points-when-using-fill-aesthetic-in-ggplot2-geom-dotplot-in-r/38479399

I've copied the code from the link above. I noticed this because I moved 'fill' from outside of the 'aes' into the 'aes' when I decided I wanted the colour to change depending on another attribute.

library("ggplot2")

n <- 200
x <- data.frame(x = sample(x = letters[1:3], size = n, replace = TRUE),
                y = rnorm(n = n, mean = 0, sd = 1),
                a = sample(x = letters[4:5], size = n, replace = TRUE))

p1 <- ggplot(x, aes(x = x, y = y))
p1 <- p1 + geom_dotplot(binaxis = "y", stackdir = "center")
p2 <- ggplot(x, aes(x = x, y = y, fill = a))
p2 <- p2 + geom_dotplot(binaxis = "y", stackdir = "center")

yutannihilation commented 4 years ago

Curious. This is not a bug but probably a known limitation, similar to https://github.com/tidyverse/ggplot2/issues/3612, which we don't know how to fix (yet).

> I wouldn't expect placing 'fill' in the 'aes' function should affect the layout of the dots.

fill does affect the layout. When you specify fill, it's used for calculating groups, and the layout depends on the grouping.

library("ggplot2")

set.seed(1)
n <- 200
x <- data.frame(x = sample(x = letters[1:3], size = n, replace = TRUE),
                y = rnorm(n = n, mean = 0, sd = 1),
                a = sample(x = letters[4:5], size = n, replace = TRUE))

p1 <- ggplot(x, aes(x = x, y = y))
p1 <- p1 + geom_dotplot(binaxis = "y", stackdir = "center")
layer_data(p1)$group
#> `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
#>   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#>  [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2
#>  [71] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#> [106] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3
#> [141] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#> [176] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3

p2 <- ggplot(x, aes(x = x, y = y, fill = a))
p2 <- p2 + geom_dotplot(binaxis = "y", stackdir = "center")
layer_data(p2)$group
#> `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
#>   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2
#>  [36] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3
#>  [71] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4
#> [106] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5
#> [141] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6
#> [176] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6

Created on 2019-11-14 by the reprex package (v0.3.0)
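
The jump from three groups to six in the reprex above can be approximated in base R. This is only a rough sketch, assuming the default grouping behaves like an interaction of all discrete variables mapped in the plot; it does not call ggplot2's internal code, and the group numbering it produces may be ordered differently from `layer_data()`.

```r
set.seed(1)
n <- 200
x <- data.frame(
  x = sample(letters[1:3], size = n, replace = TRUE),
  a = sample(letters[4:5], size = n, replace = TRUE)
)

# Only 'x' is discrete and mapped: one group per level of x.
groups_without_fill <- as.integer(factor(x$x))

# 'x' and 'a' are both discrete and mapped: one group per observed (x, a) pair.
groups_with_fill <- as.integer(interaction(x$x, x$a, drop = TRUE))

length(unique(groups_without_fill))
length(unique(groups_with_fill))
```

Since each group is binned and stacked on its own, the two groups that share an x position each get a stack centred on the same line, which is where the overlap comes from.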

James-G-Hill commented 4 years ago

Okay, I'm only a confused user, but my expectation when adding the fill aesthetic to chart p1 in your reprex is that every dot in chart p2 would be in exactly the same location as in chart p1 with the only difference being that the internal fill colour of the dots would now change.

That is the sense in which I meant I wouldn't expect 'fill' to affect the layout. What I would expect to alter the positions of the dots are the x and y aesthetics, and some of the stacking options in geom_dotplot.

yutannihilation commented 4 years ago

I agree this is confusing. I'm just saying we don't have a solution for this at the moment. Feel free to contribute if you find some nice implementation :)

James-G-Hill commented 4 years ago

I had a look at the geom_point file but I'm guessing the implementation of this goes a lot deeper into the code? Unfortunately I've not looked at the internals of ggplot2 before. Just out of curiosity, what is defined as a bug and why would this not be considered one?

clauswilke commented 4 years ago

Actually, in this case, the issue seems to be that by default geom_dotplot() doesn't place data points belonging to different groups into the same bin. This could be fixed, and from looking at the SO answer there are already some arguments that modify how data points in different groups are handled, so a solution could work off of that.
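
One of the existing arguments that changes how groups are handled is `binpositions`. As a hedged illustration (not a fix for this issue), the sketch below sets `binpositions = "all"`, which the `geom_dotplot()` documentation describes as determining bin positions with all the data taken together rather than per group. The data mirror the reprex earlier in the thread; the dots of different fill groups may still overlap, because each group is still stacked separately.

```r
library(ggplot2)

set.seed(1)
n <- 200
x <- data.frame(
  x = sample(letters[1:3], size = n, replace = TRUE),
  y = rnorm(n),
  a = sample(letters[4:5], size = n, replace = TRUE)
)

# Bin positions are computed from all the data instead of group by group,
# so dot stacks from different fill groups line up on the same y positions.
p <- ggplot(x, aes(x = x, y = y, fill = a)) +
  geom_dotplot(binaxis = "y", stackdir = "center", binpositions = "all")
p
```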

yutannihilation commented 4 years ago

Oh, can it be fixed? Sorry for jumping to conclusions...

clauswilke commented 4 years ago

I suspect it can be fixed. No idea how much work it would be, though.

James-G-Hill commented 4 years ago

I think setting the option 'stackgroups' to TRUE should give the desired effect, but the stack carries across from one x-axis value to the next, as you can see with the code below. The offset of the dots accumulates across the x values: compare x = a, y = 1, where the 4 dots start on the 'a' line, with x = b, y = 1, where the dots start at the 5th dot position from the 'b' line.

When stackdir is set to 'center' there is a similar effect but now the direction of the offset between 'x' values is left or right depending on whether the 'x' value is left or right of the central 'x' value.

library("ggplot2")

n <- 200
x <- data.frame(x = sample(x = letters[1:3], size = n, replace = TRUE),
                y = rnorm(n = n, mean = 0, sd = 1),
                a = sample(x = letters[4:5], size = n, replace = TRUE))

ggplot(x, aes(x = x, y = y, fill = a)) +
  geom_dotplot(binaxis = "y", method = "histodot", stackgroups = TRUE)
#> `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.

library("ggplot2")

n <- 200
x <- data.frame(x = sample(x = letters[1:3], size = n, replace = TRUE),
                y = rnorm(n = n, mean = 0, sd = 1),
                a = sample(x = letters[4:5], size = n, replace = TRUE))

ggplot(x, aes(x = x, y = y, fill = a)) +
  geom_dotplot(binaxis = "y", stackdir = "center", method = "histodot", stackgroups = TRUE)
#> `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.

clauswilke commented 4 years ago

@James-G-Hill Could you run your various examples through the reprex package so we can see the output as well?

James-G-Hill commented 4 years ago

> @James-G-Hill Could you run your various examples through the reprex package so we can see the output as well?

Done! Sorry, this is my first time using reprex. In the charts you can see how the positional offset of the dots from the center of their 'x' category is carried across to the other categories.

yutannihilation commented 4 years ago

(Just for note)

To be honest, I feel the code below should simply work, but it actually doesn't: because uniquecols() requires that all fill values be the same within a group, the fill is dropped.

ggplot(x, aes(x = x, y = y, fill = a, group = x)) +
  geom_dotplot(binaxis = "y", stackdir = "center")

I mean, I'm not sure it's really a good idea to extend GeomDotplot's functionality to handle different groups. To me, it seems it should be resolved via proper grouping. But I might be wrong, and I don't have a strong opinion at the moment.
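
To illustrate why the fill gets dropped when `group = x` is set, here is a simplified base R sketch of a "keep only columns that are constant within the group" check. `constant_cols()` is a hypothetical helper written for this comment; it is not ggplot2's actual uniquecols() code, only an approximation of the behaviour described above.

```r
# Hypothetical helper: collapse one group's rows to a single row,
# keeping only the columns whose value is the same in every row.
constant_cols <- function(df) {
  is_constant <- vapply(df, function(col) length(unique(col)) == 1, logical(1))
  df[1, is_constant, drop = FALSE]
}

# One group (x = "a") that contains two different fill values.
grp <- data.frame(
  x     = c("a", "a", "a"),
  group = c(1, 1, 1),
  fill  = c("d", "e", "d")
)

names(constant_cols(grp))
```

Because 'fill' varies within the group, it does not survive the collapse, so there is no per-dot fill left to draw with.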

clauswilke commented 4 years ago

Is #1745 the same problem?

clauswilke commented 4 years ago

Based on https://github.com/tidyverse/ggplot2/issues/1745#issuecomment-341180271, I assume geom_dotplot() needs to implement its own draw_panel() function so it can perform calculations that span groups.

thomasp85 commented 3 years ago

Fixed by #4417