tidyverts / tsibble

Tidy Temporal Data Frames and Tools
https://tsibble.tidyverts.org
GNU General Public License v3.0

Allow specifying start and end for *_gaps #259

Closed Fuco1 closed 2 years ago

Fuco1 commented 3 years ago

Sometimes we need explicit zeroes where there are no observations. fill_gaps works fine, except that it only fills the gaps between the min and max of the series. For example, if I have sales data covering 1 January to 31 December but the first sale of the year was in February, fill_gaps will not produce any entries for January.
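A minimal sketch of the behaviour described above, using made-up data (the `sales` tsibble and its `date` and `amount` columns are hypothetical, not from the original report). With default arguments, `fill_gaps()` pads only between the first and last observed dates:

```r
library(tsibble)

# Hypothetical sales data: the first sale of the year is on 1 February.
sales <- tsibble(
  date = as.Date(c("2021-02-01", "2021-02-03", "2021-02-04")),
  amount = c(10, 5, 7),
  index = date
)

# Turn the implicit gap into an explicit zero.
filled <- fill_gaps(sales, amount = 0)

# The filled series still starts at the first observed date,
# so no rows are created for January.
range(filled$date)
```

Only the missing day inside the observed span is added; the days before the first sale remain absent.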

I would like to be able to specify an explicit range and have the tsibble extended in both directions so that it contains every point in that range.

There is a package for tibbles/data frames, padr (https://edwinth.github.io/padr/reference/pad.html), which includes such a function. For now this is the only reason I depend on it, but I think this feature would fit tsibble perfectly.

earowang commented 3 years ago

Yea, I wasn't sure whether the feature would be practically useful. If it is, the start/end arguments should take user-specified input like as.Date("2021-01-01"). I will implement it for the next release.

arnaud-feldmann commented 2 years ago

@earowang Hi, and thanks for your work. I posted something related on SO (but with the opposite need, because the point is to have NAs, not zeros): https://stackoverflow.com/questions/69193535/tsibble-adding-na-into-the-incomplete-low-frequency-values-using-index-by-and

library(tsibble)
library(dplyr)
example <- as_tsibble(ts(rep(1,10),frequency = 12,start=2010))

example %>%
  index_by(quarter = ~ yearquarter(.)) %>%
  summarize(value=sum(value))

# A tsibble: 4 x 2 [1Q]
#  quarter value
#    <qtr> <dbl>
#1 2010 Q1     3
#2 2010 Q2     3
#3 2010 Q3     3
#4 2010 Q4     1

I think there should be a strict way to aggregate, one that yields NA for incomplete periods when the aggregation function doesn't remove NAs. Maybe some kind of fill_gaps argument, I don't know.

If there were a way for fill_gaps to floor/ceiling the start and end based on a compatible lower frequency, @Fuco1 would have a nice way to work without having to specify the start and end manually.
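The floor/ceiling idea can be sketched by hand. This is only an illustration, and it assumes a tsibble release in which `fill_gaps()` accepts the `.start` and `.end` arguments introduced later in this thread; the helper variables `first_q`, `last_q`, `start_month` and `end_month` are made up for the example:

```r
library(tsibble)
library(dplyr)

example <- as_tsibble(ts(rep(1, 10), frequency = 12, start = 2010))

# Quarters containing the first and last monthly observations.
first_q <- yearquarter(min(example$index))
last_q <- yearquarter(max(example$index))

# "Floor": first month of the first quarter.
start_month <- yearmonth(as.Date(first_q))
# "Ceiling": first month of the following quarter, minus one month.
end_month <- yearmonth(as.Date(last_q + 1)) - 1

# Pad to whole quarters, then aggregate; sum() keeps the NA
# for the incomplete last quarter.
quarterly <- example %>%
  fill_gaps(.start = start_month, .end = end_month) %>%
  index_by(quarter = ~ yearquarter(.)) %>%
  summarize(value = sum(value))

quarterly
```

Computing the bounds from the data avoids hard-coding them, at the cost of a few extra lines before the pipeline.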

arnaud-feldmann commented 2 years ago

@earowang Maybe I'm wrong, but the simplest way I can think of to do it safely is something like this:

# Besides tsibble and dplyr, this needs tidyr (complete) and
# lubridate (month, quarter, year).
from_month_to_quarter <- function(x,f=sum) {

  f <- match.fun(f)
  index <- index(x)
  key <- key(x)

  x %>%
    index_by(quarter = ~ yearquarter(.)) %>%
    as_tibble() %>%
    mutate(mod=factor((month(!! index) - 1L) %% 3L+1L,
                      as.character(1:3))) %>%
    complete(quarter,mod,!!! key) %>%
    mutate(m=3L *(quarter(quarter)-1L)+as.integer(mod),
           y=year(quarter)) %>%
    mutate(index=yearmonth(as.Date(paste0(y,"-",m,"-01")))) %>%
    select(!c(m,y,mod,!!index)) %>%
    as_tsibble(index=index,key=c(!!! key)) %>%
    index_by(quarter = ~ yearquarter(.)) %>%
    group_by_key() %>%
    summarize(across(! index,f))

}

# A tsibble: 4 x 2 [1Q]
#  quarter value
#    <qtr> <dbl>
#1 2010 Q1     3
#2 2010 Q2     3
#3 2010 Q3     3
#4 2010 Q4    NA

That isn't very straightforward for something like this.

earowang commented 2 years ago

@Fuco1 I've added .start and .end to *_gaps(). Can you please give it a try and let me know if there's any issue?

@arnaud-feldmann You should be able to do the following now:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, union
example <- as_tsibble(ts(rep(1,10),frequency = 12,start=2010))
example %>% 
  fill_gaps(.end = yearmonth("2010-Dec")) %>% 
  index_by(quarter = ~ yearquarter(.)) %>%
  summarize(value=sum(value))
#> # A tsibble: 4 x 2 [1Q]
#>   quarter value
#>     <qtr> <dbl>
#> 1 2010 Q1     3
#> 2 2010 Q2     3
#> 3 2010 Q3     3
#> 4 2010 Q4    NA

Created on 2021-09-27 by the reprex package (v2.0.1)
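For the original sales use case, the same arguments can be combined with a fill value. This is a minimal sketch on made-up data (the `sales` tsibble and its columns are hypothetical), assuming a tsibble release that includes the `.start` and `.end` arguments mentioned above:

```r
library(tsibble)

# Hypothetical sales data: the first sale of the year is on 1 February.
sales <- tsibble(
  date = as.Date(c("2021-02-01", "2021-02-03", "2021-02-04")),
  amount = c(10, 5, 7),
  index = date
)

# Extend the series to the whole calendar year and
# record days without sales as explicit zeroes.
full_year <- fill_gaps(
  sales,
  amount = 0,
  .start = as.Date("2021-01-01"),
  .end = as.Date("2021-12-31")
)

range(full_year$date)
```

The values passed to `.start` and `.end` should be of the same class as the index (here `Date`, and `yearmonth` in the example above).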

Fuco1 commented 2 years ago

Doing a bit of a cleanup of my GitHub subscribed issues, I can confirm this works. Thanks for the feature and the latest release, super cool! Apologies for a late confirmation.