tidyverts / fabletools

General fable features useful for extension packages
http://fabletools.tidyverts.org/

Add support for temporal hierarchies #59

Open mitchelloharawild opened 5 years ago

mitchelloharawild commented 5 years ago

Added in bca4bcd1a72174ffa75de152b5c46d09ddcc7eef

mitchelloharawild commented 4 years ago

Re-opening as this needs more work.

Temporal aggregations should keep a flat data structure with a new mixed time index class. This index class should extend the interval that tsibble prints, so that it displays every interval in the aggregation structure.
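As a rough illustration of what "flat" means here, the base-R sketch below (not fabletools code) aggregates one year of quarterly values per group to semi-annual and annual totals and stacks every level into a single data frame. The data, the helper `aggregate_quarters()`, and the `interval`/`period` columns are hypothetical stand-ins for the proposed mixed time index class.

```r
# Hypothetical quarterly data: two groups, four quarters of one year
quarterly <- data.frame(
  grp   = rep(c("A", "B"), each = 4),
  qtr   = rep(1:4, 2),
  value = c(10, 12, 9, 14, 5, 7, 6, 8)
)

# Hypothetical helper: sum blocks of `k` consecutive quarters per group
aggregate_quarters <- function(df, k, interval) {
  df$period <- (df$qtr - 1) %/% k + 1   # period number within the year
  out <- aggregate(value ~ grp + period, data = df, FUN = sum)
  out$interval <- interval              # label standing in for a mixed index
  out
}

# Stack all levels of the temporal hierarchy into one flat table
flat <- rbind(
  aggregate_quarters(quarterly, 4, "year"),
  aggregate_quarters(quarterly, 2, "semester"),
  aggregate_quarters(quarterly, 1, "quarter")
)
flat <- flat[, c("grp", "interval", "period", "value")]
print(flat, row.names = FALSE)
```

Each group contributes 1 + 2 + 4 = 7 rows, so the interval label, rather than a separate table per granularity, is what distinguishes the levels.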

mitchelloharawild commented 4 years ago
library(tsibble)
library(fabletools)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
tsibble(
  time = 2020,
  grp = rep(c("A", "B"), each = 4),
  year = "2020",
  semi = rep(c("1", "2"), 2, each = 2),
  qtr = rep(c("1", "2", "3", "4"), 2),
  index = time,
  key = c(grp, year, semi, qtr)
) %>% 
  aggregate_key(grp * (semi/qtr)) -> dt
print(dt, n = Inf)
#> # A tsibble: 21 x 4 [?]
#> # Key:       grp, semi, qtr [21]
#>     time grp          semi         qtr         
#>    <dbl> <chr>        <chr>        <chr>       
#>  1  2020 <aggregated> <aggregated> <aggregated>
#>  2  2020 A            <aggregated> <aggregated>
#>  3  2020 B            <aggregated> <aggregated>
#>  4  2020 <aggregated> 1            <aggregated>
#>  5  2020 <aggregated> 2            <aggregated>
#>  6  2020 <aggregated> 1            1           
#>  7  2020 <aggregated> 1            2           
#>  8  2020 <aggregated> 2            3           
#>  9  2020 <aggregated> 2            4           
#> 10  2020 A            1            <aggregated>
#> 11  2020 A            2            <aggregated>
#> 12  2020 B            1            <aggregated>
#> 13  2020 B            2            <aggregated>
#> 14  2020 A            1            1           
#> 15  2020 A            1            2           
#> 16  2020 A            2            3           
#> 17  2020 A            2            4           
#> 18  2020 B            1            1           
#> 19  2020 B            1            2           
#> 20  2020 B            2            3           
#> 21  2020 B            2            4

smat <- dt %>% 
  key_data() %>% 
  fabletools:::build_smat_rows()
print(smat)
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#>  [1,]    1    1    1    1    1    1    1    1
#>  [2,]    1    1    1    1    0    0    0    0
#>  [3,]    0    0    0    0    1    1    1    1
#>  [4,]    1    1    0    0    1    1    0    0
#>  [5,]    0    0    1    1    0    0    1    1
#>  [6,]    1    0    0    0    1    0    0    0
#>  [7,]    0    1    0    0    0    1    0    0
#>  [8,]    0    0    1    0    0    0    1    0
#>  [9,]    0    0    0    1    0    0    0    1
#> [10,]    1    1    0    0    0    0    0    0
#> [11,]    0    0    1    1    0    0    0    0
#> [12,]    0    0    0    0    1    1    0    0
#> [13,]    0    0    0    0    0    0    1    1
#> [14,]    1    0    0    0    0    0    0    0
#> [15,]    0    1    0    0    0    0    0    0
#> [16,]    0    0    1    0    0    0    0    0
#> [17,]    0    0    0    1    0    0    0    0
#> [18,]    0    0    0    0    1    0    0    0
#> [19,]    0    0    0    0    0    1    0    0
#> [20,]    0    0    0    0    0    0    1    0
#> [21,]    0    0    0    0    0    0    0    1

# Display smat matrix using sparse matrix and image()
library(Matrix)
image(Matrix(smat))

Created on 2020-08-07 by the reprex package (v0.3.0)
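The matrix above is the summing matrix of a crossed structure: the grp hierarchy (total, A, B) crossed with the semi/qtr hierarchy (total, two semesters, four quarters). As a base-R sketch that does not use fabletools internals, the same 21 x 8 matrix can be rebuilt, up to row order, as the Kronecker product of the two individual summing matrices. The names `S_grp`, `S_time` and `S_cross` are illustrative only, and the row order differs from that of `build_smat_rows()`.

```r
# Summing matrix for grp: rows total, A, B; columns A, B
S_grp <- rbind(c(1, 1), diag(2))

# Summing matrix for semi/qtr: rows total, semester 1-2, quarter 1-4; columns Q1..Q4
S_time <- rbind(
  rep(1, 4),
  kronecker(diag(2), t(rep(1, 2))),  # each semester sums two quarters
  diag(4)
)

# Cross the two structures: 3 * 7 = 21 rows, 2 * 4 = 8 columns.
# Columns are ordered A-Q1..A-Q4, B-Q1..B-Q4, as in the printed smat.
S_cross <- kronecker(S_grp, S_time)
dim(S_cross)
# [1] 21  8
```

For example, row 2 of `S_cross` (all groups, semester 1) is `1 1 0 0 1 1 0 0`, which is row 4 of the printed matrix.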

mitchelloharawild commented 4 years ago

Ideally, the interface would look something like this:

tsibble(
  time = rep(yearquarter("2020 Q1") + 0:3, 2),
  grp = rep(c("A", "B"), each = 4),
  index = time,
  key = grp
) %>% 
  aggregate_key(grp) %>% 
  # year_function() and semi_function() are placeholders in this proposed interface
  aggregate_index(list(year = year_function(), semi = semi_function()))

The resulting dataset would then have an index of mixed temporal granularity. This almost certainly requires a separate package for representing time, in the way {distributional} provides a vector class for distributions.

mitchelloharawild commented 3 years ago

The {moment} package (name pending) has been created to represent time vectors with mixed granularity. As such, the named list of aggregation functions proposed above is likely more than is needed.

Something like this might be enough:

data %>%
  aggregate_index(tu_month(1,2,3,4,6,12))

Here, tu_month specifies the "time unit" in months. Time units could also be mixed, if preferred:

data %>%
  aggregate_index(c(tu_month(1,2,4,6), tu_quarter(1), tu_year(1)))
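For monthly data, tu_month(1,2,3,4,6,12) describes the usual temporal hierarchy in which every aggregation order divides 12. As a sketch of what such a specification would need to produce, the summing matrix can be built directly from the list of time units. The helper `temporal_smat()` is hypothetical and is not part of fabletools or {moment}.

```r
# Hypothetical helper: summing matrix for one cycle of `m` base periods,
# aggregated into non-overlapping blocks of each size in `units`
temporal_smat <- function(units, m = 12) {
  stopifnot(all(m %% units == 0))          # every unit must divide the cycle
  units <- sort(units, decreasing = TRUE)  # coarsest level (annual total) first
  blocks <- lapply(units, function(k) {
    # m / k blocks, each summing k consecutive base periods
    S_k <- kronecker(diag(m / k), t(rep(1, k)))
    rownames(S_k) <- paste0("k", k, "_", seq_len(m / k))
    S_k
  })
  do.call(rbind, blocks)
}

S_month <- temporal_smat(c(1, 2, 3, 4, 6, 12))
dim(S_month)  # 12 + 6 + 4 + 3 + 2 + 1 = 28 rows, 12 columns
```

Under this sketch, the mixed specification c(tu_month(1,2,4,6), tu_quarter(1), tu_year(1)) would correspond to `temporal_smat(c(1, 2, 4, 6, 3, 12))` once quarters and years are expressed in months.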