tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org
Other
6.52k stars 2.03k forks source link

Should `stroke` in `geom_point()` become `linewidth`? #5090

Closed teunbrand closed 1 year ago

teunbrand commented 1 year ago

This is more of a discussion point rather than a feature request or a true issue.

In the latest CRAN version, many size aesthetics were divorced, wherein size lives on to roughly mean area, whereas linewidth has been introduced to control the width of lines.

In geom_point(), the stroke aesthetic has always served the function of the new linewidth aesthetic, separate from size. Now that linewidth is the norm in many geoms, the stroke aesthetic has become the odd-one-out.

From a consistency point of view, it would make sense to rename stroke to linewidth. However, I do understand the pain of reworking yet another geom and appreciate that quite a body of existing code depends on the stroke aesthetic.

thomasp85 commented 1 year ago

It was considered and I decided to kick it down the road... I thing my main gripe is that I don't think you'd often want to scale the stroke around points together with other (more regular) linewidths...

I agree that conceptually they should be the same but the plan is to wait for at least the next larger release...

banbh commented 1 year ago

A perhaps related question is what the correspondence is between size and stroke in geom_point. For example:

ggplot(mapping = aes(x = 0, y = 0)) + 
  geom_point(shape = 'a', size = 100, alpha = .5, color = 'red') +
  geom_point(shape = 'a', stroke = 100, alpha = .5, color = 'blue') # stroke=150 seems to be same as size=100 in ggplot2_3.3.6

In ggplot2_3.3.6 the two "a"s have different sizes (although they become the same if stroke is increased to 150).

clauswilke commented 1 year ago

I would say let's not rename stroke just for the sake of renaming. There was a fundamental reason to rename size for line-based geoms, to specifically untangle line thickness from point size. I don't think stroke has this dual meaning and therefore there is no gain in functionality from the renaming, it's just style.

@banbh I think this is simply a reflection of the fact that both size and stroke enter the font-size choice: https://github.com/tidyverse/ggplot2/blob/08832e790953be19eb1f77e7c0b4f9b3f4dc2a0e/R/geom-point.r#L136

If you feel there's something wrong with this code please file a separate issue, though honestly this has been used for such a long time that I don't think there's room (or even a good argument) for change.

teunbrand commented 1 year ago

Yeah I can see your points. How about making linewidth optional then? Have the defaults be stroke = 0.5 and linewidth = NULL, and then use data$stroke <- data$linewidth %||% data$stroke. It might be somewhat confusing as it is not a common pattern in vanilla ggplot2 AFAIK. It does exist in extensions tough, for example ggtext::geom_richtext() does this with text.colour and colour. In geom_point()'s case, it preserves backward compatability, make it more consistent, and scaling is on opt-in basis.

clauswilke commented 1 year ago

@teunbrand I have considered this, but this will change the appearance of points if you have points and lines in your graph and have set the linewidth for the lines. I think this may create more issues than it solves.

teunbrand commented 1 year ago

I'm not sure how that would happen between layers, but for geom_pointrange() and perhaps geom_sf() that is probably true. I'll have to think about it and if I find a less invasive way to make it optional, I'll draft a PR for consideration.

teunbrand commented 1 year ago

So I've gotten this to work with no averse side-effects by doing the opposite of what I proposed earlier, i.e. have the defaults be stroke = NULL and linewidth = 0.5 and then using data$stroke <- data$stroke %||% data$linewidth.

However, while figuring this out, I've reconsidered my position on the issue. Adding a linewidth aesthetic to geom_point() would improve the consistency of setting line widths, but also would make it more inconsistent with other geoms that contain points, such geom_sf() and geom_pointrange(). I don't think that the trade-off would be worth it, so I'll close this issue, but I can prep a PR if someone else feels strongly about this.