tidyverts / tsibble

Tidy Temporal Data Frames and Tools
https://tsibble.tidyverts.org
GNU General Public License v3.0

scale_x_yearquarter odd behavior #219

Closed: TylerGrantSmith closed this issue 3 years ago

TylerGrantSmith commented 3 years ago

In the example below, it seems odd that starting the series one quarter later (2000 Q2 instead of 2000 Q1) causes the scale_x_yearquarter breaks to shift and include a 1999 label. I would expect it to behave like the scale_x_date version that follows.

library(ggplot2)
library(tsibble)

#### With yearquarter ----

# Series starting in 2000 Q1
set.seed(42)
tibble(date = seq(yearquarter("2000Q1"), 
                  yearquarter("2020Q4"), by = 1),
       y = rnorm(length(date))) %>% 
  ggplot() + 
  aes(date, y) +
  geom_point() +
  scale_x_yearquarter(date_breaks = "5 years", 
                      date_minor_breaks = "1 year")


# Same series, but starting one quarter later (2000 Q2)
set.seed(42)
tibble(date = seq(yearquarter("2000Q2"), 
                  yearquarter("2020Q4"), by = 1),
       y = rnorm(length(date))) %>% 
  ggplot() + 
  aes(date, y) +
  geom_point() +
  scale_x_yearquarter(date_breaks = "5 years", 
                      date_minor_breaks = "1 year")


#### With dates ----
# Date version, starting 2000-01-01 (Q1)
set.seed(42)
tibble(date = seq(as.Date("2000-01-01"), 
                  as.Date("2020-10-01"), by = "quarter"),
       y = rnorm(length(date))) %>% 
  ggplot() + 
  aes(date, y) +
  geom_point() +
  scale_x_date(date_breaks = "5 years",
               date_minor_breaks = "1 year")


# Date version, starting 2000-04-01 (Q2)
set.seed(42)
tibble(date = seq(as.Date("2000-04-01"), 
                  as.Date("2020-10-01"), by = "quarter"),
       y = rnorm(length(date))) %>% 
  ggplot() + 
  aes(date, y) +
  geom_point() +
  scale_x_date(date_breaks = "5 years",
               date_minor_breaks = "1 year")

Created on 2020-09-24 by the reprex package (v0.3.0)

A second question: is there a way to use date_breaks = "5 years" but have the breaks start on a quarter other than Q1?

Thanks

earowang commented 3 years ago

This is a duplicate of #195. The issue is indeed odd, because it doesn't happen with every yearquarter series, only with a small fraction of them. It may depend on the data size and range, but I haven't been able to pin it down. The same thing also happens with the zoo::yearqtr ggplot2 scales.

I suspect it's a {ggplot2} issue, but I haven't debugged through the code.

Re 2), you can construct the breaks manually with the breaks argument, for example breaks every 2 years starting with Q2.

library(ggplot2)
library(tsibble)

#### With yearquarter ----

set.seed(42)
tibble(date = seq(yearquarter("2000Q1"), 
                  yearquarter("2020Q4"), by = 1),
       y = rnorm(length(date))) %>% 
  ggplot() + 
  aes(date, y) +
  geom_point() +
  # Breaks every 8 quarters (2 years), starting at 2000 Q2
  scale_x_yearquarter(breaks = yearquarter("2000Q2") + seq(0, 82, by = 4 * 2))

Created on 2020-09-26 by the reprex package (v0.3.0)
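The same manual approach carries over to Date data. Below is a minimal base-R sketch, assuming quarterly Dates like the ones in the original reprex, that builds 5-year breaks anchored on Q2 (April 1) rather than Q1. For yearquarter data, the reply's pattern with by = 4 * 5 (that is, 20 quarters) would give the equivalent 5-year spacing.

```r
# Breaks every 5 years, anchored on Q2 (April 1) instead of Q1
q2_breaks <- seq(as.Date("2000-04-01"), as.Date("2020-10-01"), by = "5 years")
q2_breaks
# These can then be passed to the date scale, e.g.
#   scale_x_date(breaks = q2_breaks)
```

Because the breaks are listed explicitly, the scale no longer decides where the first break falls.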

TylerGrantSmith commented 3 years ago

@earowang I think I have located the issue. Here is the traceback of where the problem occurs:

     x
  1. +-(function (x, ...) ...
  2. \-ggplot2:::print.ggplot(x)
  3.   +-ggplot2::ggplot_build(x)
  4.   \-ggplot2:::ggplot_build.ggplot(x)
  5.     \-layout$setup_panel_params()
  6.       \-ggplot2:::f(..., self = self)
  7.         \-base::Map(setup_panel_params, scales_x, scales_y)
  8.           \-base::mapply(FUN = f, ..., SIMPLIFY = FALSE)
  9.             \-(function (scale_x, scale_y) ...
 10.               \-self$coord$setup_panel_params(scale_x, scale_y, params = self$coord_params)
 11.                 \-ggplot2:::f(..., self = self)
 12.                   \-ggplot2:::view_scales_from_scale(scale_x, self$limits$x, self$expand)
 13.                     \-ggplot2:::view_scale_primary(scale, limits, continuous_range)
 14.                       \-scale$get_breaks(sort(continuous_range))
 15.                         \-ggplot2:::f(..., self = self)

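In this function, limits <- self$trans$inverse(limits) is called first, and at the end breaks <- censor(breaks, self$trans$transform(limits), only.finite = FALSE) is called. However, the yearquarter transformation isn't truly invertible: inverse() maps a value to the quarter that contains it, so transforming back returns the start of that quarter rather than the original value.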
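To make the round trip concrete, here is a base-R sketch. The functions quarter_inverse() and quarter_transform() are hypothetical stand-ins, not tsibble's actual transformation; they assume, as the console output further down suggests, that inverse() floors a day count to the first day of its quarter.

```r
# Hypothetical stand-in for the yearquarter inverse: day count -> first day
# of the quarter containing it (assumption based on the console output)
quarter_inverse <- function(x) {
  d <- as.POSIXlt(as.Date(floor(x), origin = "1970-01-01"))
  first_month <- as.integer((d$mon %/% 3) * 3 + 1)  # POSIXlt months are 0-based
  as.Date(sprintf("%d-%02d-01", d$year + 1900L, first_month))
}
# Hypothetical stand-in for the transform: Date -> day count
quarter_transform <- function(q) as.numeric(q)

x <- as.numeric(as.Date("2000-05-15"))  # a mid-quarter day count
round_trip <- quarter_transform(quarter_inverse(x))
round_trip  # 11048, i.e. 2000-04-01, not the original 11092
```

Any value that is not already a quarter start comes back earlier than it went in, which is what widens the lower limit.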

In my example, the scale expansion moves the lower limit from 2000-04-01 (11048) down to 10673.6 (1999-03-23).
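The 10673.6 figure can be reproduced by hand. This sketch assumes ggplot2's default expansion for continuous scales, which pads each side by 5% of the data range; dates are counted in days since 1970-01-01.

```r
lower <- as.numeric(as.Date("2000-04-01"))  # 11048
upper <- as.numeric(as.Date("2020-10-01"))  # 18536
# Assumed default continuous expansion: 5% of the range on each side
expanded <- c(lower, upper) + c(-1, 1) * 0.05 * (upper - lower)
expanded                                     # 10673.6 18910.4
as.Date(expanded[1], origin = "1970-01-01")  # "1999-03-23"
```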

> self$trans$inverse(10673.6)
<yearquarter[1]>
[1] "1999 Q1"
# Year starts on: January
> self$trans$transform(self$trans$inverse(10673.6))
[1] 10592

So censor() compares the breaks against these round-tripped limits, which are wider than the expanded ones, and 1999 Q1 is not censored. I think this can be fixed by supplying a custom get_breaks() that skips the inverse/transform round trip. To be honest, I am not sure why ggplot2 does it in the first place.
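A small base-R sketch of the censoring step shows why the 1999 Q1 break survives. censor_range() is a hypothetical helper that only mimics the in-range check of scales::censor(). The round-tripped limits are the expanded limits floored to quarter starts, computed by hand from the values above.

```r
# Hypothetical helper: out-of-range values become NA, like scales::censor()
censor_range <- function(x, range) ifelse(x < range[1] | x > range[2], NA, x)

expanded      <- c(10673.6, 18910.4)  # expanded limits (day counts)
round_tripped <- c(10592, 18901)      # 1999-01-01 and 2021-10-01
break_1999q1  <- 10592                # 1999 Q1 as a day count

censor_range(break_1999q1, expanded)       # NA: outside the expanded limits
censor_range(break_1999q1, round_tripped)  # 10592: kept, so 1999 Q1 gets a label
```

Under this reading, the proposed fix amounts to censoring against the first pair of limits instead of the second.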