Open hughjonesd opened 1 year ago
Thanks for the report. geom_tile() and geom_rect() indeed aren't equivalent under scale transformations. The binned scale is equivalent to a scale transformation. I agree that the example is undesirable, and we've recently added this bit to the documentation to make the difference more clear:
Sure, but that doesn't quite cover it. The size of the tiles isn't being determined after transformation... it's being determined wrongly, and then the tiles aren't being displayed.
I think this is a real bug. Here's an example where the tiles are actually displayed in the wrong place:
library(ggplot2)
ggp <- ggplot(data.frame(x = 2:4 + 0.5, y = 2:4), aes(x, y)) +
  geom_tile(width = .8, height = .25)
ggp # These should bin to 2, 3 and 4...
# but in fact...
ggp + scale_x_binned(breaks = 2:4)
I'm sorry I don't quite understand. How are they displayed wrongly? I've rendered an example below.
library(ggplot2)
tiled <- ggplot(data.frame(x = 2:4 + 0.5, y = 2:4), aes(x, y)) +
  geom_tile(width = .8, height = .25)
tiled
tiled + scale_x_binned(breaks = 2:4)
To me, it seems that geom_rect() is doing the wrong thing with equivalent parametrisation:
rects <- ggplot(data.frame(xmin = 2:4 + 0.1, xmax = 2:4 + 0.9,
                           ymin = 2:4 - 0.125, ymax = 2:4 + 0.125)) +
  geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax))
rects
rects + scale_x_binned(breaks = 2:4)
Created on 2023-05-04 with reprex v2.0.2
So my thought was: "2.5 - 0.4 = 2.1, should bin to 2; 2.5 + 0.4 = 2.9, should bin to 3". Actually I think the bins ought to be 2.5, 3.5 etc., i.e. midpoints of the breaks. But neither of those things is happening. Indeed, x and y are scaled and then width is not.
Here's a more extreme example:
ggp <- ggplot(data = NULL, aes(x, y)) +
  geom_tile(data = data.frame(x = c(0, 5, 10), y = 1:3), width = 2, height = .25) +
  geom_point(data = data.frame(x = c(-1, 1, 4, 6, 9, 11), y = rep(1:3, each = 2)), color = "red")
ggp
ggp + scale_x_binned(breaks = c(0, 5, 10))
My expectation would be that the rectangle limits would be (-1, +1); (4,6) and (9, 11). The first and last ones have edges which are out of the limits of the binned scale, so maybe they are dropped, or maybe like the points they are just left alone. The second one would bin to (2.5, 7.5).
In fact: the first rectangle disappears. The second one goes to (-0.5, 7.5). The third one goes to (2.5, 10.5).
I don't think anyone would expect that - why would a rectangle (4,6) be mapped to (-0.5, 7.5) by binning to two bins from 0 to 5 and 5 to 10?
The real reason is that the first call to map_position has mapped x to c(1,2,3), representing the levels. Then the width gets calculated from this, creating xmin of c(0,1,2) and xmax of c(2,3,4). The second call to map_position then translates these back to their corresponding bin centres, creating xmin = c(NA, -0.5, 2.5) and xmax = c(2.5, 7.5, 10.5).
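The arithmetic in this explanation can be reproduced outside ggplot2. The following is a base-R sketch, not ggplot2's internal code: it assumes the binned scale has outer limits of -1 and 11 (the range of the red points) and right-closed bins, and the helper name bin_centre() is made up for illustration.

```r
# Bin edges: assumed limits (-1, 11) plus the requested breaks 0, 5, 10.
edges <- c(-1, 0, 5, 10, 11)
centres <- (head(edges, -1) + tail(edges, -1)) / 2  # one centre per bin

# Hypothetical helper: turn a bin index back into its bin centre,
# giving NA for indices that do not correspond to any bin.
bin_centre <- function(i) {
  i[i < 1 | i > length(centres)] <- NA
  centres[i]
}

x <- c(0, 5, 10)
width <- 2

# Step 1: x is replaced by its bin index (right-closed bins).
x_bin <- cut(x, edges, labels = FALSE, right = TRUE, include.lowest = TRUE)

# Step 2: the width, still in data units, is added to the bin index.
xmin_bin <- x_bin - width / 2
xmax_bin <- x_bin + width / 2

# Step 3: the "indices" are mapped back to bin centres.
xmin <- bin_centre(xmin_bin)
xmax <- bin_centre(xmax_bin)
```

Under these assumptions, xmin comes out as NA, -0.5, 2.5 and xmax as 2.5, 7.5, 10.5, which matches the values described above.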
I don't think anyone who hasn't read the source code will understand this, or be able to use it for any practical purpose.
So yeah, the disclaimer in the documentation is better than nothing, but I think it would be simpler to just put "geom_tile doesn't work with binned scales".
Similar concerns apply with a logged scale:
ggp <- ggplot(data = NULL, aes(x, y)) + ylim(0, 2) +
  geom_tile(data = data.frame(x = 10, y = 1), width = 2, height = .25) +
  geom_point(data = data.frame(x = c(9, 11), y = 1), color = "green")
ggp
ggp + scale_x_log10()
This makes it look as if 9 is 1 and 11 is 100. Again, you can say that it is working according to the documentation, but the point is, how is it meant to represent data?
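To make the numbers concrete, here is a small base-R sketch of the arithmetic, assuming (as described elsewhere in this thread) that the centre is transformed and the width is then applied in transformed units. It does not call ggplot2, and the variable names are illustrative only.

```r
x <- 10
width <- 2

# The centre is transformed first; the width is then applied in log10 units.
x_log <- log10(x)
xmin_log <- x_log - width / 2
xmax_log <- x_log + width / 2

# Back on the original data scale, the tile edges are:
edges_data <- 10^c(xmin_log, xmax_log)

# What a user would probably expect: edges computed in data units.
edges_expected <- c(x - width / 2, x + width / 2)
```

Here edges_data is 1 and 100, while edges_expected is 9 and 11.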
My expectation as a user would be that I can use geom_tile to represent some data. Then if I choose to put that data on a log scale, or bin it or whatever, geom_tile keeps displaying the same answers using the new scale.
Does this also mean that geom_tile does not work with discrete scales (scale_x_discrete)? I'm struggling to get my plotted data into the correct categories on the X axis.
> So my thought was: "2.5 - 0.4 = 2.1, should bin to 2; 2.5 + 0.4 = 2.9, should bin to 3". Actually I think the bins ought to be 2.5, 3.5 etc. i.e. midpoints of the breaks.
I think that binning works slightly differently than you're expecting here. It is more of a findInterval() situation than 'snap to nearest break'.
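The difference between the two readings can be seen with base R alone. This is only an illustration of interval lookup versus nearest-break snapping, using the tile edges 2.1 and 2.9 from the example; it is not a claim about how the binned scale is implemented internally.

```r
breaks <- 2:4
edges <- c(2.5 - 0.4, 2.5 + 0.4)  # 2.1 and 2.9

# Interval lookup: which interval [breaks[i], breaks[i + 1]) holds each value?
interval_index <- findInterval(edges, breaks)

# "Snap to nearest break", for comparison.
nearest_break <- sapply(edges, function(v) breaks[which.min(abs(v - breaks))])
```

With interval lookup, both edges fall in the same interval (index 1, between 2 and 3), whereas snapping would send them to 2 and 3 respectively.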
The underlying reason that geom_tile() doesn't behave like geom_rect() is that the width and height are not position aesthetics, and thus aren't transformed by scales. So a width = 2 on a log10 scale spans 2 orders of magnitude. While admittedly not great for scale transforms, this parametrisation does allow it to work with many stats seamlessly.
@jfmusso It works for discrete scales because you can combine continuous values on a discrete scale (but not the other way around). Discrete position scales are essentially seq_along(limits), so there is 1 axis unit between each level and a width = 2 spans 2 levels' worth of axis.
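As a rough illustration of that point, the sketch below mimics discrete positions with seq_along() in base R. The level names "a", "b", "c" are invented for the example, and nothing here calls ggplot2.

```r
limits <- c("a", "b", "c")
positions <- setNames(seq_along(limits), limits)  # a = 1, b = 2, c = 3

# A tile centred on level "b" with width = 2, in axis units.
width <- 2
centre <- positions[["b"]]
tile_range <- c(centre - width / 2, centre + width / 2)
```

The resulting range runs from 1 to 3, i.e. from the position of "a" to the position of "c".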
Perhaps one issue is that there are different potential users for GeomTile? I get that it might be useful for developers who want to e.g. place something at x,y with a "real" onscreen width. But this makes it hard to understand for end users, who have to think in terms of two different sets of coordinates.
Perhaps it might be helpful to separate the two functionalities, and provide a public-facing version of geom_tile that indeed works in data coordinates.
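Until something like that exists, one possible workaround is to compute the rectangle corners in data coordinates yourself and hand them to geom_rect(). The helper below, tile_to_rect(), is hypothetical (it is not part of ggplot2) and is only a sketch of the idea, using the tile data from the earlier example.

```r
# Hypothetical helper: convert tile centres plus sizes into rectangle corners,
# all in data coordinates.
tile_to_rect <- function(df, width, height) {
  data.frame(
    xmin = df$x - width / 2,
    xmax = df$x + width / 2,
    ymin = df$y - height / 2,
    ymax = df$y + height / 2
  )
}

tiles <- data.frame(x = c(0, 5, 10), y = 1:3)
rects <- tile_to_rect(tiles, width = 2, height = 0.25)

# The result could then be drawn with geom_rect(), e.g.
# ggplot(rects) + geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax))
```

Because xmin and xmax are then ordinary position aesthetics, any scale applied afterwards sees the edges in data units.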
The following code works, producing output with xmin and xmax aligned to bins:

But this code, which ought to do the same thing, produces an empty plot:

The underlying reason is in the second call to layout$map_position() in ggplot_build(). This call maps x variables back from a factor to (the binned version of) their original values. For GeomRect, which has xmax and xmin from the start, this works.

GeomTile calculates xmin and xmax from x and width. By the time it gets to layer$compute_geom_1, x has been transformed to a "factor"-style numeric of bins. The geom doesn't realise this and happily adds the original width to the bin. layout$map_position() takes this wonky data and turns it back, typically to NA. xmin and xmax are removed.

In other words, GeomTile$setup_data() is being called after the first map_position(), but in this case at least, it needs to be called before it.

This bug exists in ggplot2 3.4.2, and also on github main as of today.
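To illustrate what the other ordering would give, here is a base-R sketch that applies the width in data units first and bins afterwards, using the tiles at 0, 5 and 10 with width 2 discussed in this thread. It is an imitation under assumptions, not ggplot2 code: the outer limits of -1 and 11 and the right-closed bins are assumed, and to_bin_centre() is a made-up helper.

```r
edges <- c(-1, 0, 5, 10, 11)  # assumed limits plus breaks 0, 5, 10
centres <- (head(edges, -1) + tail(edges, -1)) / 2

# Hypothetical helper: bin a data value and return the centre of its bin.
to_bin_centre <- function(v) {
  centres[cut(v, edges, labels = FALSE, right = TRUE, include.lowest = TRUE)]
}

x <- c(0, 5, 10)
width <- 2

# Width applied in data units first, binning second.
xmin <- to_bin_centre(x - width / 2)  # data values -1, 4, 9
xmax <- to_bin_centre(x + width / 2)  # data values 1, 6, 11
```

With this ordering no edge becomes NA, and the middle tile spans 2.5 to 7.5.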