tidyverts / tsibble

Tidy Temporal Data Frames and Tools
https://tsibble.tidyverts.org
GNU General Public License v3.0

FR: optional .interval argument to `fill_gaps` #302

Open warnes opened 1 year ago

warnes commented 1 year ago

I am joining multiple time series values collected on different intervals, ranging from months to years. Consequently, I need to harmonize the intervals to perform the join.

At the moment, I don't see a documented method for setting the desired interval, either directly, or when calling fill_gaps.

StackOverflow shows a mechanism for overriding the interval by explicitly changing the object attribute (see https://stackoverflow.com/a/75981369), but I prefer to use documented interfaces whenever possible.

For my current code, it would be very helpful to have an additional optional .interval argument to fill_gaps that performs this step.

Perhaps something like these:

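# Overwrite the interval stored on a tsibble.
# `...` is passed on to tsibble::new_interval(), e.g. quarter = 1.
set_interval <- function(object, ...)
{
  attr(object, 'interval') <- tsibble::new_interval(...)
  object
}

# Wrapper around tsibble::fill_gaps() that accepts an optional `.interval`,
# given as a named vector such as c(quarter = 1).
fill_gaps_interval <- function(.data, ..., .full = FALSE, .start = NULL, .end = NULL, .interval = NULL)
{
  if (!is.null(.interval))
  {
    # Turn the named vector into named arguments for set_interval()
    .interval <- as.list(.interval)
    .interval$object <- .data
    .data <- do.call(set_interval, .interval)
  }

  # Forward the original call to fill_gaps(), minus the `.interval` argument
  call <- match.call()
  call$.data <- .data
  call$.interval <- NULL
  call[[1L]] <- quote(tsibble::fill_gaps)
  eval(call, parent.frame())
}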

Reproducible Example:

> library(tidyverse)
> library(tsibble)

> df1 <- tsibble(quarter = yearquarter(as_date(c('2020-1-1','2021-1-1','2022-3-1'))),
+                   amount = c(5, 2, 1))
Using `quarter` as index variable.

> df2 <- tsibble(quarter = yearquarter(as_date(c('2022-1-1','2022-4-1','2022-7-1'))),
+                   amount = c(5, 2, 1))
Using `quarter` as index variable.

> ###
> # Existing functionality
> ###
> 
> interval(df1)
<interval[1]>
[1] 4Q

> # --> Fills 4Q interval
> df1 %>% fill_gaps(.start=yearquarter('2020-01-01'), .end=yearquarter('2023-01-01'))
# A tsibble: 4 x 2 [4Q]
  quarter amount
    <qtr>  <dbl>
1 2020 Q1      5
2 2021 Q1      2
3 2022 Q1      1
4 2023 Q1     NA

> interval(df2)
<interval[1]>
[1] 1Q

> # --> Fills 1Q interval
> df2 %>% fill_gaps(.start=yearquarter('2020-01-01'), .end=yearquarter('2023-01-01'))
# A tsibble: 13 x 2 [1Q]
   quarter amount
     <qtr>  <dbl>
 1 2020 Q1     NA
 2 2020 Q2     NA
 3 2020 Q3     NA
 4 2020 Q4     NA
 5 2021 Q1     NA
 6 2021 Q2     NA
 7 2021 Q3     NA
 8 2021 Q4     NA
 9 2022 Q1      5
10 2022 Q2      2
11 2022 Q3      1
12 2022 Q4     NA
13 2023 Q1     NA

> ###
> # Desired functionality: Fill to individual quarter 
> ##
> df1 %>% fill_gaps_interval(.start=yearquarter('2020-01-01'), .end=yearquarter('2023-01-01'), .interval=c(quarter=1))
# A tsibble: 13 x 2 [1Q]
   quarter amount
     <qtr>  <dbl>
 1 2020 Q1      5
 2 2020 Q2     NA
 3 2020 Q3     NA
 4 2020 Q4     NA
 5 2021 Q1      2
 6 2021 Q2     NA
 7 2021 Q3     NA
 8 2021 Q4     NA
 9 2022 Q1      1
10 2022 Q2     NA
11 2022 Q3     NA
12 2022 Q4     NA
13 2023 Q1     NA

> df2 %>% fill_gaps_interval(.start=yearquarter('2020-01-01'), .end=yearquarter('2023-01-01'), .interval=c(quarter=1))
# A tsibble: 13 x 2 [1Q]
   quarter amount
     <qtr>  <dbl>
 1 2020 Q1     NA
 2 2020 Q2     NA
 3 2020 Q3     NA
 4 2020 Q4     NA
 5 2021 Q1     NA
 6 2021 Q2     NA
 7 2021 Q3     NA
 8 2021 Q4     NA
 9 2022 Q1      5
10 2022 Q2      2
11 2022 Q3      1
12 2022 Q4     NA
13 2023 Q1     NA
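
For comparison, here is a minimal sketch of a workaround that stays within documented interfaces and does not touch the `interval` attribute. It assumes a single-key-free tsibble like `df1` above and builds the full quarterly index by hand; the names `full_index` and `filled` are illustrative only, and this is not a proposal for how `fill_gaps` itself should be implemented. Because the rebuilt table contains every consecutive quarter, `as_tsibble()` should infer a one-quarter interval on its own.

```r
library(tsibble)
library(dplyr)

# Same shape as df1 in the example above: observations four quarters apart
df1 <- tsibble(
  quarter = yearquarter(c("2020 Q1", "2021 Q1", "2022 Q1")),
  amount  = c(5, 2, 1),
  index   = quarter
)

# Every quarter from 2020 Q1 to 2023 Q1 (adding integers steps by quarter)
full_index <- tibble(quarter = yearquarter("2020 Q1") + 0:12)

# Drop the tsibble class, join onto the full index, then rebuild the tsibble
# so that the interval is re-inferred from the now-complete index
filled <- df1 %>%
  as_tibble() %>%
  right_join(full_index, by = "quarter") %>%
  arrange(quarter) %>%
  as_tsibble(index = quarter)

interval(filled)
```

This costs a round trip through a plain tibble and requires spelling out the index range, which is the boilerplate an `.interval` argument would avoid.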