wurli closed this 2 years ago.
Thanks for the suggestion, but I'm not convinced. Isn't the problem in this example the width of each bar, rather than the alignment? (I'm not sure whether your example is meant to show daily values or monthly values.)
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)
df <- tibble(
  month = as_date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  value = 1:3
)
ggplot(df, aes(month, value)) +
  geom_col(width = 1) +
  scale_x_date(date_labels = "%b %d")
Created on 2022-07-23 by the reprex package (v2.0.1)
Apologies, my initial example was a bit rushed and possibly didn't show my issue clearly enough. Perhaps this edit will help clarify. Here, the columns indicate monthly totals of value, while the points show the more granular figures. In this example, each bar should overlap with three points, but clearly this isn't what happens by default, although an align argument would make it really easy.
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)
df <- tibble(
  month = seq.Date(as_date("2020-01-01"), as_date("2020-03-31"), length.out = 9),
  value = 1:9
)
ggplot(df, aes(month, value)) +
  # Columns show totals for each month. Here the values of `month` are always
  # the first day of the month. The obvious solution is 'don't do this', but I'd
  # argue that it's such common practice that ggplot2 should facilitate this
  # sort of approach
  geom_col(
    data = ~ .x |>
      group_by(month = floor_date(month, "month")) |>
      summarise(value = sum(value))
  ) +
  # Points show the more granular values
  geom_point()
Created on 2022-07-25 by the reprex package (v2.0.1)
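To check the claim that each monthly bar should cover three points, the sketch below assigns each of the nine dates in the reprex to its calendar month and sums value per month. It floors the dates and formats them as "%Y-%m", which is a base R stand-in for lubridate::floor_date(month, "month") so that the snippet needs no packages.

```r
# Nine dates spread evenly from 2020-01-01 to 2020-03-31, as in the reprex
dates <- seq(as.Date("2020-01-01"), as.Date("2020-03-31"), length.out = 9)
value <- 1:9

# Drop the fractional part of each date, then label it by calendar month
# (a base R stand-in for lubridate::floor_date(month, "month"))
day <- as.Date(floor(as.numeric(dates)), origin = "1970-01-01")
month_key <- format(day, "%Y-%m")

# How many points fall in each month, and the monthly totals the columns show
points_per_month <- table(month_key)
monthly_total <- tapply(value, month_key, sum)

points_per_month
monthly_total
```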
I guess the broader point is that the current behaviour is fine when using a discrete axis, which is probably the case for 90% of bar charts. For the remaining 10%, which use a continuous axis, it's not (as) obvious how the bars should be aligned, so I'd argue a bit more control is warranted. For this sort of thing, nudging with position_nudge() doesn't quite hit the spot in my opinion.
Ah, sorry, I didn't get your point. So, is this the plot you want to draw?
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)
df <- tibble(
  month = seq.Date(as_date("2020-01-01"), as_date("2020-03-31"), length.out = 9),
  value = 1:9
)
# Dates are numeric days, so this width is roughly 90% of a month
width <- 0.9 * 30
ggplot(df, aes(month, value)) +
  geom_col(
    data = ~ .x |>
      group_by(month = floor_date(month, "month")) |>
      summarise(value = sum(value)),
    # Shift each column right by half its width so its left edge sits on the 1st
    position = position_nudge(x = width / 2),
    width = width
  ) +
  geom_point()
Created on 2022-07-25 by the reprex package (v2.0.1)
I think it's very close. Correct me if I'm wrong, but I think the width of the columns would usually be 0.9 * 29, not 0.9 * 30. I only know this from looking at the geom_col() source code; I think it would be calculated roughly as follows:
res <- df$month |>
  floor_date("month") |>
  unique() |>
  as.numeric() |>
  resolution(zero = FALSE)
res
#> [1] 29
res * 0.9
#> [1] 26.1
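The 29 comes from the data, not from a fixed month length. Dates are stored as day counts, and the smallest gap between the month starts is February 2020, which has 29 days. The sketch below uses a hypothetical approx_resolution() helper that approximates resolution(zero = FALSE) by taking the smallest positive gap between unique values. It is not ggplot2's implementation, which also handles special cases such as integer input.

```r
# Hypothetical helper approximating ggplot2::resolution(x, zero = FALSE):
# the smallest strictly positive gap between the sorted unique values
approx_resolution <- function(x) {
  gaps <- diff(sort(unique(as.numeric(x))))
  min(gaps[gaps > sqrt(.Machine$double.eps)])
}

month_starts <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-01"))

# Dates are day counts, so the gaps are the month lengths:
# 31 days for January and 29 for February (2020 is a leap year)
diff(as.numeric(month_starts))
approx_resolution(month_starts)
approx_resolution(month_starts) * 0.9
```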
I'm also not sure the left border of the column should line up exactly with the first of each month. With the default behaviour, some padding is added to the left and right of the column, and it feels like this should possibly be the case with align = "left" too. That would mean width = 0.9 * 29 and position = position_nudge(x = 29 * 0.5), which shifts the column by half of its 29-day slot and leaves 29 * 0.05 days of padding on each side. Possibly having it 'flush' makes more sense, though.
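The padding arithmetic can be checked with a hypothetical col_edges() helper. It returns a column's left and right edges, measured in days from the axis break, for a given width and nudge. This sketch assumes res = 29 and the default 0.9 width, and it does not call ggplot2.

```r
# Hypothetical helper: edges of a column centred at `x` with the given
# `width`, after shifting it right by `nudge` (as position_nudge() would)
col_edges <- function(x, width, nudge = 0) {
  centre <- x + nudge
  c(left = centre - width / 2, right = centre + width / 2)
}

res <- 29       # resolution of the month starts, in days
break_at <- 0   # axis break (first of the month), as a day offset

# Default: centred on the break, inside the slot [-res / 2, res / 2]
col_edges(break_at, 0.9 * res)

# Left-aligned with equal padding: shift by half of the slot width
col_edges(break_at, 0.9 * res, nudge = 0.5 * res)

# Shifting by half the slot plus the padding instead
col_edges(break_at, 0.9 * res, nudge = 0.55 * res)
```

With a nudge of 0.5 * res, the column spans 1.45 to 27.55 days, which leaves 0.05 * res on both sides of its 29-day slot. A nudge of 0.55 * res moves it to 2.9 to 29, which doubles the left padding and removes the right padding.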
Anyway, I think this somewhat demonstrates what I'm trying to say. To achieve this, a user has to know some fairly obscure details:
- month is treated as numeric, with individual days as consecutive integers (hence why width is naïvely 30, roughly the number of days in a month)
- the column width is derived from resolution() with zero = FALSE, which gives 29 here (incidentally, I think this should probably also be made clearer in the docs)
It seems much simpler to me to just add an align argument. Any thoughts? Thanks for bearing with me.
Thanks, I think your calculation is correct. I agree it might make sense.
Would you be happy to review a PR if I submitted one? To be honest I think it'd be quite simple to implement.
Yes, I'm happy to review. I too feel the implementation won't be very complicated.
One thing I'd like to discuss here is the interface. In my opinion, hjust is better than align. hjust is more general, and the horizontal position is not limited to only three values (centre, right, left). You can just plug hjust into xmin = x - width * (1 - hjust), xmax = x + width * hjust. But I agree align might be more intuitive to users.
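The formula above can be tried out with a hypothetical bar_extent() helper. This is a sketch of the suggested arithmetic, not ggplot2's code. It shows that any hjust in [0, 1] is valid. It also shows that under this formula, hjust = 0 puts the bar entirely to the left of x, which is the orientation questioned later in the thread.

```r
# Hypothetical helper implementing the suggested formula:
#   xmin = x - width * (1 - hjust), xmax = x + width * hjust
bar_extent <- function(x, width, hjust = 0.5) {
  data.frame(
    hjust = hjust,
    xmin = x - width * (1 - hjust),
    xmax = x + width * hjust
  )
}

# Any value in [0, 1] works, not just "left", "centre" and "right"
extents <- do.call(rbind, lapply(c(0, 0.25, 0.5, 1), function(h) bar_extent(10, 4, h)))
extents
```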
Great, I'll get working on something.
Good point about the interface. I think, for the sake of consistency, you're right that hjust is better, not least because vjust would be the obvious counterpart when using horizontal bars.
This may be out of scope for this discussion, but another gripe I occasionally have with bar geoms is that it's only possible to 'base' the bars at x = 0 or y = 0. As an analogy, geom_area() has a more fine-grained counterpart, geom_ribbon(), which allows you to adjust the position of the base, but geom_col() has no such counterpart. I'd tentatively suggest adding xmin/ymin arguments to geom_col() to give some control here. One possible use case would be creating plots like the following:
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
df1 <- tibble(
  x = c(1, 1, 2, 2),
  y = c(-2, 1, -1, 2),
  fill = c("a", "b", "c", "d")
)
df2 <- tibble(
  xmin = c(1, 2) - 0.45,
  xmax = c(1, 2) + 0.45,
  ymin = c(-2, -1),
  ymax = c(1, 2)
)
ggplot(df1) +
  geom_col(aes(x, y, fill = fill)) +
  # The only way to achieve a border around the columns is to simulate a column
  # geom using `geom_rect()`, which requires a lot of knowledge about how
  # width/resolution are calculated.
  geom_rect(
    aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
    colour = "black", fill = "transparent",
    data = df2
  )
Created on 2022-07-26 by the reprex package (v2.0.1)
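The hand-written df2 above is where that width/resolution knowledge comes in. Here is a base R sketch of deriving df2 from df1. It assumes the default width of 0.9 times the resolution. It also assumes position_stack()'s behaviour of stacking negative values downwards from 0, separately from positive values.

```r
df1 <- data.frame(
  x = c(1, 1, 2, 2),
  y = c(-2, 1, -1, 2),
  fill = c("a", "b", "c", "d")
)

# Spacing between x positions (1 here) and the default half-width of a column
res <- min(diff(sort(unique(df1$x))))
half_width <- 0.9 * res / 2

# Negative values stack downwards from 0, positive values upwards
bottom <- tapply(pmin(df1$y, 0), df1$x, sum)
top <- tapply(pmax(df1$y, 0), df1$x, sum)
xs <- as.numeric(names(top))

df2 <- data.frame(
  xmin = xs - half_width,
  xmax = xs + half_width,
  ymin = as.vector(bottom),
  ymax = as.vector(top)
)
df2
```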
This is again a fairly obscure use case, but in my opinion exposing xmin/ymin arguments (possibly only to geom_col()) would offer a bit of additional flexibility in a rather intuitive way. Another (more obvious) use case is when bars should simply begin at a different y value because that's what the data dictates. Currently, the way to approach such problems would probably be to adjust the y scale (e.g. using scale_y_continuous(labels = ...)). If I'm putting in a PR, this might be a good place to sneakily add such a feature 😄
Sorry to reopen this at the eleventh hour before release. While writing the blog post, I realised that the meaning of 0 and 1 feels backwards to me. In my head, the just argument defines the justification of the bar relative to the axis break (so just = 0 would place the left side of the bar at the axis break), but in actuality it is the reverse.
Any objections to me switching it around before release?
Personally, I'm a bit torn. My intuition is 0 = further left, 1 = further right, which is currently the case if your point of reference is the x-axis, but not if your point of reference is the bar itself. I find the former more intuitive, but I'm happy to go with what you think, as you'll be more familiar with the conventions in ggplot2.
Throughout ggplot, we're using justification values in two different contexts. Let's explain them for the case of hjust. (The same applies to vjust, just vertically.) The first is how an object is placed relative to a reference point. In this case, hjust = 0 means that the object is placed such that the reference point is at the left-most location of the object, its own internal x = 0, so to speak. Similarly, hjust = 1 means that the reference point is at the right-most location. Visually, this looks like hjust = 0 moves the object to the right, and hjust = 1 moves the object to the left.
The second is how an object is placed relative to a reference range. This is the case, for example, in the placement of the axis title relative to the horizontal extent of the plot. In this case, hjust = 0 means moving the object all the way to the left so that its left side is aligned with the left end of the reference range, and hjust = 1 does the opposite. Thus, hjust = 0 moves the object to the left, and hjust = 1 moves the object to the right.
To be consistent with the rest of ggplot here, I think we need to figure out whether we're operating in the first or the second context and apply the justification accordingly. I haven't looked into this too closely, but it sounds to me like we're operating under context 1, and therefore just = 0 should mean the bar sits to the right of the axis break, such that its left side is aligned with the break.
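The two contexts can be expressed as two small formulas. The helpers place_at_point() and place_in_range() below are hypothetical and only illustrate the arithmetic described above. They are not grid or ggplot2 functions.

```r
# Context 1: place an object of width `w` relative to a reference point `ref`.
# hjust is the fraction of the object that lies to the left of the point.
place_at_point <- function(ref, w, hjust) {
  left <- ref - hjust * w
  c(left = left, right = left + w)
}

# Context 2: place an object of width `w` inside a reference range [lo, hi].
# hjust is how far across the spare room the object is pushed.
place_in_range <- function(lo, hi, w, hjust) {
  left <- lo + hjust * (hi - lo - w)
  c(left = left, right = left + w)
}

# Context 1: hjust = 0 puts the object to the right of the point
place_at_point(ref = 5, w = 2, hjust = 0)            # left = 5, right = 7
place_at_point(ref = 5, w = 2, hjust = 1)            # left = 3, right = 5

# Context 2: hjust = 0 puts the object at the left end of the range
place_in_range(lo = 0, hi = 10, w = 2, hjust = 0)    # left = 0, right = 2
place_in_range(lo = 0, hi = 10, w = 2, hjust = 1)    # left = 8, right = 10
```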
I was curious, so I took a look at other geoms. To me, the behaviour doesn't seem that consistent, but maybe there's a rule I haven't spotted:
- element_text() fits context 2, and hjust = 0 moves text to the left of the plot
- geom_raster() fits context 1, and hjust = 0 moves rasters to the left of axis breaks
- geom_label() fits context 1 (?), and hjust = 0 moves labels to the right of axis breaks
- geom_col() fits context 1, and just = 0 currently moves columns to the left of axis breaks
It's possible geom_raster() was implemented with the opposite view in mind: the extent of the raster is the reference range, and the point on the plot is the thing positioned relative to that range.
Context 1 is used all over grid, in the way I've described. Also, legend justification follows context 1, if I remember correctly.
Thanks. When I reviewed the pull request, I didn't consider the semantics of *just carefully.
I'm not sure whether I'm for or against the suggestion at the moment, but at least the behavior of geom_col() is consistent with geom_raster().
library(ggplot2)
d <- expand.grid(x = 1, y = 1:2)
ggplot(d, aes(x, y)) +
  geom_raster(hjust = 1, fill = "red", alpha = 0.5) +
  geom_raster(hjust = 0, fill = "blue", alpha = 0.5) +
  coord_equal()
ggplot(d[1, ], aes(x, y)) +
  geom_text(size = 20, hjust = 1, label = "hjust = 1", colour = "red", alpha = 0.5) +
  geom_text(size = 20, hjust = 0, label = "hjust = 0", colour = "blue", alpha = 0.5) +
  coord_equal()
ggplot(d[1, ], aes(x, y)) +
  geom_col(width = 1, just = 1, fill = "red", alpha = 0.5) +
  geom_col(width = 1, just = 0, fill = "blue", alpha = 0.5) +
  coord_equal()
Created on 2022-10-28 with reprex v2.0.2
Currently, columns are always centred on their x position, which may not always be desired. For example, in the following case the values of date always give the first of the month but are used to indicate the whole month (as is fairly common practice):
In this case, an align argument to geom_col() would be really useful for aligning the columns with the first of each month. align could accept the values "centre" (the default), "right" and "left", the last of which would be the option used here. The current alternatives are to use position = position_nudge(), which is fairly esoteric for such a simple task (and wouldn't always work well, because months differ in length, e.g. February has only 28 or 29 days), or to use geom_rect() instead, which again seems much too complex for such a simple task.
If you agree that this sounds like a useful feature, I'd be happy to submit a PR.
As always, thanks for the hard work on this beautiful package!
(N.B. This example is a bit contrived due to the use of scale_x_date(), but it's the simplest example I could think of.)
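The geom_rect() route mentioned in the request is awkward because months differ in length, so each rectangle needs its own start and end date. Here is a base R sketch using a hypothetical next_month_start() helper. It keeps the same 5% padding per side that the default 0.9 width gives.

```r
# Hypothetical helper: the first day of the month after each month start
next_month_start <- function(starts) {
  do.call(c, lapply(starts, function(d) seq(d, by = "month", length.out = 2)[2]))
}

month_starts <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-01"))
month_ends <- next_month_start(month_starts)

# Month lengths in days: 31, 29 and 31 (2020 is a leap year)
days <- as.numeric(month_ends - month_starts)

# Pad each side by 5% of that month's length, mirroring the 0.9 default width
rects <- data.frame(
  xmin = month_starts + 0.05 * days,
  xmax = month_ends - 0.05 * days
)
rects
```

The resulting xmin and xmax columns could then be combined with the monthly totals and passed to geom_rect(aes(xmin = xmin, xmax = xmax, ymin = 0, ymax = value)).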