tidyverts / tsibble

Tidy Temporal Data Frames and Tools
https://tsibble.tidyverts.org
GNU General Public License v3.0
530 stars 49 forks source link

Joining by yearweek with week_start != 1 produces incorrect results #299

Closed jrauser closed 8 months ago

jrauser commented 1 year ago

The code below makes two tibbles with a yearweek that runs from 2023 W23 to W28. But when joined, the yearweek runs from W22 to W27. If week_start is set to 1, the result is as expected.

library(tidyverse)
library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following object is masked from 'package:lubridate':
#> 
#>     interval
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, union
x<-tibble(wk=make_yearweek(2023, 23:28, week_start=7), x=23:28)
y<-tibble(wk=make_yearweek(2023, 23:28, week_start=7), y=123:128)
x
#> # A tibble: 6 × 2
#>         wk     x
#>     <week> <int>
#> 1 2023 W23    23
#> 2 2023 W24    24
#> 3 2023 W25    25
#> 4 2023 W26    26
#> 5 2023 W27    27
#> 6 2023 W28    28
y
#> # A tibble: 6 × 2
#>         wk     y
#>     <week> <int>
#> 1 2023 W23   123
#> 2 2023 W24   124
#> 3 2023 W25   125
#> 4 2023 W26   126
#> 5 2023 W27   127
#> 6 2023 W28   128
inner_join(x, y, by="wk")
#> # A tibble: 6 × 3
#>         wk     x     y
#>     <week> <int> <int>
#> 1 2023 W22    23   123
#> 2 2023 W23    24   124
#> 3 2023 W24    25   125
#> 4 2023 W25    26   126
#> 5 2023 W26    27   127
#> 6 2023 W27    28   128
jrauser commented 1 year ago

Possibly related? https://github.com/tidyverts/fable/issues/397

jrauser commented 1 year ago

I think the problem is at L228 of yearweek.R. When we call new_yearweek() we need to pass week_start=week_start(x).

jrauser commented 1 year ago

Here's a workaround:

library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, union
library(tidyverse)

x<-tibble(wk=make_yearweek(2023, 23:28, week_start=7), x=23:28)
y<-tibble(wk=make_yearweek(2023, 23:28, week_start=7), y=123:128)
j1 <- inner_join(x, y, by="wk")
j1
#> # A tibble: 6 × 3
#>         wk     x     y
#>     <week> <int> <int>
#> 1 2023 W22    23   123
#> 2 2023 W23    24   124
#> 3 2023 W24    25   125
#> 4 2023 W25    26   126
#> 5 2023 W26    27   127
#> 6 2023 W27    28   128
attr(j1$wk, "week_start")
#> [1] 1

vec_ptype2.yearweek.yearweek <- function(x, y, ...) {
  if (attr(x, "week_start") != attr(y, "week_start")) {
    abort("Can't combine <yearweek> with different `week_start`.")
  }
  yearweek(NULL, week_start=attr(x, "week_start"))
}

j2 <- inner_join(x, y, by="wk")
j2
#> # A tibble: 6 × 3
#>         wk     x     y
#>     <week> <int> <int>
#> 1 2023 W23    23   123
#> 2 2023 W24    24   124
#> 3 2023 W25    25   125
#> 4 2023 W26    26   126
#> 5 2023 W27    27   127
#> 6 2023 W28    28   128
attr(j2$wk, "week_start")
#> [1] 7

Created on 2023-09-11 with reprex v2.0.2

jrauser commented 1 year ago

This same problem exists with fiscal_start in yearquater.

earowang commented 8 months ago

thanks for reporting with a reproducible example. fixed now.