mitchelloharawild opened this issue
Reopening, as this needs more work.
Temporal aggregations should keep a flat data structure, using a new index class for time of mixed granularity. This new index class should extend tsibble's interval output so that it displays every interval in the aggregation structure.
library(tsibble)
library(fabletools)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
tsibble(
time = 2020,
grp = rep(c("A", "B"), each = 4),
year = "2020",
semi = rep(c("1", "2"), 2, each = 2),
qtr = rep(c("1", "2", "3", "4"), 2),
index = time,
key = c(grp, year, semi, qtr)
) %>%
aggregate_key(grp * (semi/qtr)) -> dt
print(dt, n = Inf)
#> # A tsibble: 21 x 4 [?]
#> # Key: grp, semi, qtr [21]
#> time grp semi qtr
#> <dbl> <chr> <chr> <chr>
#> 1 2020 <aggregated> <aggregated> <aggregated>
#> 2 2020 A <aggregated> <aggregated>
#> 3 2020 B <aggregated> <aggregated>
#> 4 2020 <aggregated> 1 <aggregated>
#> 5 2020 <aggregated> 2 <aggregated>
#> 6 2020 <aggregated> 1 1
#> 7 2020 <aggregated> 1 2
#> 8 2020 <aggregated> 2 3
#> 9 2020 <aggregated> 2 4
#> 10 2020 A 1 <aggregated>
#> 11 2020 A 2 <aggregated>
#> 12 2020 B 1 <aggregated>
#> 13 2020 B 2 <aggregated>
#> 14 2020 A 1 1
#> 15 2020 A 1 2
#> 16 2020 A 2 3
#> 17 2020 A 2 4
#> 18 2020 B 1 1
#> 19 2020 B 1 2
#> 20 2020 B 2 3
#> 21 2020 B 2 4
smat <- dt %>%
key_data() %>%
fabletools:::build_smat_rows()
print(smat)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1 1 1 1 1 1 1 1
#> [2,] 1 1 1 1 0 0 0 0
#> [3,] 0 0 0 0 1 1 1 1
#> [4,] 1 1 0 0 1 1 0 0
#> [5,] 0 0 1 1 0 0 1 1
#> [6,] 1 0 0 0 1 0 0 0
#> [7,] 0 1 0 0 0 1 0 0
#> [8,] 0 0 1 0 0 0 1 0
#> [9,] 0 0 0 1 0 0 0 1
#> [10,] 1 1 0 0 0 0 0 0
#> [11,] 0 0 1 1 0 0 0 0
#> [12,] 0 0 0 0 1 1 0 0
#> [13,] 0 0 0 0 0 0 1 1
#> [14,] 1 0 0 0 0 0 0 0
#> [15,] 0 1 0 0 0 0 0 0
#> [16,] 0 0 1 0 0 0 0 0
#> [17,] 0 0 0 1 0 0 0 0
#> [18,] 0 0 0 0 1 0 0 0
#> [19,] 0 0 0 0 0 1 0 0
#> [20,] 0 0 0 0 0 0 1 0
#> [21,] 0 0 0 0 0 0 0 1
# Visualise the summation matrix as a sparse Matrix using image()
library(Matrix)
image(Matrix(smat))
Created on 2020-08-07 by the reprex package (v0.3.0)
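The summation matrix above has a regular structure: each block of rows is a Kronecker product of identity matrices (for the variables kept) and rows of ones (for the variables summed over). Below is a minimal base-R sketch that rebuilds it, assuming the bottom-level columns are ordered by grp, then semi, then qtr within semi, as rows 14 to 21 of the output suggest. It is only an illustration and makes no claim about how fabletools' internal build_smat_rows() works; the names I2, ones2, S and b are illustrative, and b holds made-up values.

```r
# Building blocks: a 2x2 identity and a 1x2 row of ones
I2 <- diag(2)
ones2 <- matrix(1, nrow = 1, ncol = 2)

# Assumed bottom-level column order: grp (outer), semi, qtr within semi (inner)
S <- rbind(
  matrix(1, nrow = 1, ncol = 8),          # total
  kronecker(I2, t(rep(1, 4))),            # grp
  kronecker(ones2, kronecker(I2, ones2)), # semi
  kronecker(ones2, diag(4)),              # semi/qtr
  kronecker(diag(4), ones2),              # grp x semi
  diag(8)                                 # grp x semi x qtr (bottom level)
)
dim(S)  # 21 rows (all series) by 8 columns (bottom-level series)

# Every series in the hierarchy is a linear combination of the bottom level
b <- c(10, 20, 30, 40, 1, 2, 3, 4)  # hypothetical bottom-level values
y <- S %*% b
y[1:3]  # total, A and B: 110, 100, 10
```

If smat from the reprex is in scope, all(S == smat) can be used to compare the two matrices.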
Ideally, the interface would look something like this:
tsibble(
time = rep(yearquarter("2020 Q1") + 0:3, 2),
grp = rep(c("A", "B"), each = 4),
index = time,
key = grp
) %>%
aggregate_key(grp) %>%
aggregate_index(list(year = year_function(), semi = semi_function())) # proposed API; year_function() and semi_function() are placeholders
The resulting dataset would then have an index of mixed temporal granularity. This almost certainly requires another package for representing time, analogous to how {distributional} represents distributions.
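To make the flat, mixed-granularity structure concrete, here is a small base-R sketch of what the quarterly example could look like after aggregating to semesters and years. It uses neither tsibble nor {moment}: the mixed index is only a character label (e.g. "2020", "2020 S1", "2020 Q1"), and the data values and object names are hypothetical.

```r
# Hypothetical quarterly observations for two groups in 2020
qtr_data <- data.frame(
  grp   = rep(c("A", "B"), each = 4),
  qtr   = rep(1:4, times = 2),
  value = c(10, 20, 30, 40, 1, 2, 3, 4)
)
qtr_data$semi <- ceiling(qtr_data$qtr / 2)  # Q1-Q2 -> S1, Q3-Q4 -> S2

# Quarterly level: the original observations with a quarter label
quarters <- data.frame(index = paste0("2020 Q", qtr_data$qtr),
                       grp = qtr_data$grp, value = qtr_data$value)

# Semi-annual level: sum the quarters within each semester
semi_sums <- aggregate(value ~ grp + semi, data = qtr_data, FUN = sum)
semesters <- data.frame(index = paste0("2020 S", semi_sums$semi),
                        grp = semi_sums$grp, value = semi_sums$value)

# Annual level: sum all quarters
year_sums <- aggregate(value ~ grp, data = qtr_data, FUN = sum)
years <- data.frame(index = "2020", grp = year_sums$grp, value = year_sums$value)

# One flat table whose index mixes annual, semi-annual and quarterly labels
flat <- rbind(years, semesters, quarters)
print(flat)
```

A dedicated index class would replace these strings with time values that carry their own granularity.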
The {moment} package (name pending) has been created to represent time vectors with mixed granularity, so the named list of aggregation functions is likely more than is needed.
Something like this might be enough:
data %>%
aggregate_index(tu_month(1,2,3,4,6,12))
Here tu_month() specifies the "time unit" in months. You could also mix time units if you prefer:
data %>%
aggregate_index(c(tu_month(1,2,4,6), tu_quarter(1), tu_year(1)))
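Each time unit in tu_month(1,2,3,4,6,12) splits a 12-month year into non-overlapping blocks, so together they define a temporal hierarchy with its own summation matrix, analogous to the cross-sectional smat in the reprex. The following is a minimal sketch assuming monthly bottom-level data covering whole years; temporal_smat() is a hypothetical helper, not part of fabletools or {moment}. Mixing units as in the second example gives the same set of blocks if a quarter is taken as 3 months and a year as 12.

```r
# Hypothetical helper (not part of fabletools or {moment}): summation matrix
# mapping m bottom-level periods onto non-overlapping blocks of each size in k
temporal_smat <- function(k, m = 12) {
  stopifnot(all(m %% k == 0))              # each unit must divide the cycle evenly
  k <- sort(unique(k), decreasing = TRUE)  # coarsest level first, like a hierarchy
  blocks <- lapply(k, function(size) {
    # one row per block, with ones over the `size` consecutive periods it covers
    kronecker(diag(m / size), matrix(1, nrow = 1, ncol = size))
  })
  S <- do.call(rbind, blocks)
  rownames(S) <- unlist(lapply(k, function(size) paste0("k", size, "_", seq_len(m / size))))
  S
}

S_month <- temporal_smat(c(1, 2, 3, 4, 6, 12))
dim(S_month)  # 1 + 2 + 3 + 4 + 6 + 12 = 28 rows, 12 columns

monthly <- 1:12                 # hypothetical monthly values for one year
agg <- S_month %*% monthly
agg["k12_1", ]                  # annual total
agg[c("k6_1", "k6_2"), ]        # semi-annual totals
```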
Added in bca4bcd1a72174ffa75de152b5c46d09ddcc7eef