tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org
Other
6.5k stars 2.02k forks source link

Should limits be enforced on the calculated geoms? #2887

Closed yutannihilation closed 5 years ago

yutannihilation commented 6 years ago

(This is originally reported on #2879, and I asked on RStudio Community. But, I couldn't find any reasonable answer so far...)

When the x axis is continuous and has limits, geom_bar() and geom_col() won't automatically consider the width of bars, which means the bars at both ends disappear even if the values are within the range of the limits. I feel this behaviour is counter-intuitive, whereas this won't happen with discrete scales.

library(ggplot2)

d <- data.frame(x = 1:3)

# bars at both ends disappear
ggplot(d) +
  geom_bar(aes(x)) +
  scale_x_continuous(limit = c(1, 3))
#> Warning: Removed 2 rows containing missing values (geom_bar).

# no bars disappear
ggplot(d) +
  geom_bar(aes(factor(x))) +
  scale_x_discrete(limit = c("1", "2", "3"))

Created on 2018-09-10 by the reprex package (v0.2.0).

Though I don't come up with how to control whether or not to ignore limits, IIUC about the semantics, limits should not be enforced on the computed geoms, but on the values before calculation.

After some investigation, I'm wondering if it's possible to reset limits as long as range of the scale after computing the widths of the geoms (compute_geom_1()) and computing positions (compute_position()). More preciesely, reset_scales() here:

https://github.com/tidyverse/ggplot2/blob/f636f34597e77e9a2a99783aff6233b838490ff3/R/plot-build.r#L68-L78

Though I don't come up with how to implement this yet, does my explanation make sense?

hadley commented 6 years ago

I'm pretty sure it worked that way in the past, and it was a deliberate choice to change it. Regardless, I don't think it's possible to make this sort of very large change to how ggplot2 computation works with out much thought and study, and I doubt the payoff is worth it for this one case.

yutannihilation commented 6 years ago

Thanks, I didn't know the history. I see. I don't want to brave a danger to change the current behaviour, but just want to see if it's possible to add some optional processing (though it seems very difficult). Sorry for filing this issue before enough study...

But, one thing, I moderately disagree that this is only for this one case. While I agree that this is not so common issue, I suspect it might be not too rare when a geom or position calculation goes beyond the limits.

I'm going to close this soon, but let me keep this open for a while until I find the history...

batpigandme commented 6 years ago

I suspect it might be not too rare when a geom or position calculation goes beyond the limits.

I think that's the reason for the warning.

yutannihilation commented 6 years ago

Hmm, but, I don't think it's enough because the user don't know how not to let the bars get removed. I don't complain about that there's no hint in the current warning message, but about that there's no general way to know priorly how much margins to add to ensure the geoms are included...

batpigandme commented 6 years ago

there's no general way to know priorly how much margins to add to ensure the geoms are included...

Other than not specifying the limits manually, you mean?

library(ggplot2)

d <- data.frame(x = 1:3)

ggplot(d) +
  geom_bar(aes(x)) +
  scale_x_continuous()

Created on 2018-09-10 by the reprex package (v0.2.0.9000).

yutannihilation commented 6 years ago

Thanks, in this case, I can just remove limits argument since it actually doesn't make sense to set the limit c(1, 3) whereas the data's range is c(1, 3). But, it's not possible when limits actually filters the data. I have to dplyr::filter() beforehand to limit the data. (this is what I got suggested in RStudio Community)

clauswilke commented 6 years ago

If we set aside for a moment whether it is feasible or advisable to fix this, I think we should agree that there is an issue. The behavior is rather counter-intuitive. See examples below.

library(ggplot2)

d <- data.frame(x = 1:3)

# expanding scale doesn't help
ggplot(d) +
  geom_bar(aes(x)) +
  scale_x_continuous(limits = c(1, 3), expand = c(0, 1))
#> Warning: Removed 2 rows containing missing values (geom_bar).


# making the limits somewhat bigger doesn't help
ggplot(d) +
  geom_bar(aes(x)) +
  scale_x_continuous(limits = c(0.6, 3.4))
#> Warning: Removed 2 rows containing missing values (geom_bar).


# it only works if I make the limits wide enough so the
# unexpanded limits contain the entire bar widths
ggplot(d) +
  geom_bar(aes(x)) +
  scale_x_continuous(limits = c(0.5, 3.5))

Created on 2018-09-10 by the reprex package (v0.2.0).

batpigandme commented 6 years ago

The behavior is rather counter-intuitive.

For sure! I've been following the community thread and the previous issue. The units aren't absolute, though, correct? I ask because, in lieu of actually fixing it (which doesn't seem near-term), we could try to better document the behaviour and/or give a more helpful error message.

hadley commented 6 years ago

My gut feeling is that it is counter-intutive for this example, but if you change the behaviour it will just be counter-intuitve for a different case, so the net change would not be uniformly positive. My very vague recollection is this is something to do with ylim() correctly truncating the standard error ribbon of geom_smooth().

I think Chesteron's fence applies here; until we know why the code works this way, we shouldn't consider changing it.

clauswilke commented 6 years ago

@hadley Sure. Let's just document behavior for now. Another example follows, using geom_smooth().

In terms of potential fixes, one possibility could be via opt-in: Geoms could be given the option to declare aesthetics for which scale limits should not be enforced, so that for those specific aesthetics scale limits would behave similarly to coord limits. Then, at least the interaction of calculated aesthetics with scale expansion would appear to be more predictable.

library(ggplot2)

# complete confidence band is shown
ggplot(mtcars, aes(hp, mpg)) + 
  geom_smooth(level = 0.99)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'


# part of the confidence band is missing, even though
# there is sufficient space to show it
ggplot(mtcars, aes(hp, mpg)) + 
  geom_smooth(level = 0.99) +
  scale_y_continuous(limits = c(10.4, 33.9), expand = c(0, 10))
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Created on 2018-09-10 by the reprex package (v0.2.0).

clauswilke commented 6 years ago

Here is a reason for why the code may currently work the way it does: Internally modifying the limits that have been set explicitly by the user would be a bad design choice. It would inevitably lead to unpredictable behavior where it's entirely unclear how the scale range was obtained. If I provide explicit limits (and explicit expansion values), I should be able to rely on getting a scale with a defined range.

So any fix has to be about not removing computed aesthetics that still would fit into the final (expanded) scale range, not about changing the internal limits.

yutannihilation commented 6 years ago

So any fix has to be about not removing computed aesthetics that still would fit into the final (expanded) scale range, not about changing the internal limits.

Thanks, agreed. Apparently, my proposed fix was too naive and thoughtless, sorry... 🙄

thomasp85 commented 5 years ago

Can we agree that a fix for this will be a better documentation of the limits argument with a clear statement that setting limits on positional scales can remove data, sometimes unpredictably, and if the purpose is to zoom, one should use the limit argument in the coordinate system?

hadley commented 5 years ago

Yes, agreed. This would be a good TDD activity.

thomasp85 commented 5 years ago

Great - I’ll remove it from the milestone then

yutannihilation commented 5 years ago

I don't agree it's the "fix" (at least before knowing the technical details why this is wontfix), but I do agree we can close this issue by making documentation better because I feel I can never come up with the real fix to this problem... Thanks for the suggestion!

paleolimbot commented 5 years ago

I've solved this problem from the user end a few times by setting the oob on the scale:

library(ggplot2)

d <- data.frame(x = 1:3)
ggplot(d) +
  geom_bar(aes(x)) +
  scale_x_continuous(limit = c(1, 3), oob = function(x, limits) x)

Created on 2019-05-27 by the reprex package (v0.2.1)

lock[bot] commented 4 years ago

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/