r-lib / slider

Sliding Window Functions
https://slider.r-lib.org

Equivalent of `slide_tsibble` and retaining partial windows #170

Closed cgoo4 closed 2 years ago

cgoo4 commented 2 years ago

I've used slider in a number of situations previously and love the package. I'm currently trying to see if it can do the equivalent of the slide_tsibble example below, ideally also retaining partial windows. I couldn't see a way to generate the overlapping windows each with an id. As slide_tsibble is being superseded, is it possible?

library(tidyverse)
library(clock)
library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, union

df <- tibble(group = c(rep("A", 8), rep("B", 7)),
             date = c(date_seq(as.Date("2022-01-01"), by = duration_days(1), total_size = 8),
                      date_seq(as.Date("2022-01-01"), by = duration_days(1), total_size = 7)),
             value = c(1:8, 1:7))

df |> 
  as_tsibble(index = date, key = group) |>
  slide_tsibble(.size = 4, .step = 2) |> # ideally also retaining partials
  arrange(group, .id, date)
#> # A tsibble: 20 x 4 [1D]
#> # Key:       .id, group [5]
#>    group date       value   .id
#>    <chr> <date>     <int> <int>
#>  1 A     2022-01-01     1     1
#>  2 A     2022-01-02     2     1
#>  3 A     2022-01-03     3     1
#>  4 A     2022-01-04     4     1
#>  5 A     2022-01-03     3     2
#>  6 A     2022-01-04     4     2
#>  7 A     2022-01-05     5     2
#>  8 A     2022-01-06     6     2
#>  9 A     2022-01-05     5     3
#> 10 A     2022-01-06     6     3
#> 11 A     2022-01-07     7     3
#> 12 A     2022-01-08     8     3
#> 13 B     2022-01-01     1     1
#> 14 B     2022-01-02     2     1
#> 15 B     2022-01-03     3     1
#> 16 B     2022-01-04     4     1
#> 17 B     2022-01-03     3     2
#> 18 B     2022-01-04     4     2
#> 19 B     2022-01-05     5     2
#> 20 B     2022-01-06     6     2

Created on 2022-05-17 by the reprex package (v2.0.1)

DavisVaughan commented 2 years ago

Here is one way. Rather than using slide_index() I decided to use slide() here since:

You can use .complete to either mimic slide_tsibble() or keep partial windows.
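
To see how these arguments pick windows, here is a minimal sketch on a plain integer vector. The vector `1:8` and the helper name `evaluated` are illustrative assumptions, not part of the reprex below. With `.before = 3`, each window holds the current element plus the three before it, which corresponds to `.size = 4`; `.step = 2` evaluates every second window; and `.complete` controls whether windows with fewer than four elements are evaluated at all.

```r
library(slider)

# Illustrative input: eight elements, like group "A" in the question.
x <- 1:8

# Helper (hypothetical name): positions where the function was actually called.
# Skipped positions come back as NULL in the list returned by slide().
evaluated <- function(out) which(!vapply(out, is.null, logical(1)))

# Complete windows only: mirrors slide_tsibble(.size = 4, .step = 2).
complete <- slide(x, identity, .before = 3, .step = 2, .complete = TRUE)
evaluated(complete)
complete[evaluated(complete)]

# Partial windows kept: evaluation starts at the first element instead.
partial <- slide(x, identity, .before = 3, .step = 2, .complete = FALSE)
evaluated(partial)
partial[evaluated(partial)]
```

In the complete case the windows are 1:4, 3:6 and 5:8, matching the three `.id` values for group A. In the partial case they are 1, 1:3, 2:5 and 4:7. Skipped positions are `NULL` in the returned list.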

All in all I think this should get you pretty close.

library(tidyverse)
library(clock)
library(tsibble)
library(slider)

df <- tibble(group = c(rep("A", 8), rep("B", 7)),
             date = c(date_seq(as.Date("2022-01-01"), by = duration_days(1), total_size = 8),
                      date_seq(as.Date("2022-01-01"), by = duration_days(1), total_size = 7)),
             value = c(1:8, 1:7))

slide_like_tsibble <- function(data, complete) {
  out <- slide(
    .x = data,
    .f = identity, 
    .before = 3, 
    .step = 2,
    .complete = complete
  )

  bind_rows(!!!out, .id = ".id")
}

df |> 
  as_tsibble(index = date, key = group) |>
  slide_tsibble(.size = 4, .step = 2) |> # ideally also retaining partials
  arrange(group, .id, date)
#> # A tsibble: 20 x 4 [1D]
#> # Key:       .id, group [5]
#>    group date       value   .id
#>    <chr> <date>     <int> <int>
#>  1 A     2022-01-01     1     1
#>  2 A     2022-01-02     2     1
#>  3 A     2022-01-03     3     1
#>  4 A     2022-01-04     4     1
#>  5 A     2022-01-03     3     2
#>  6 A     2022-01-04     4     2
#>  7 A     2022-01-05     5     2
#>  8 A     2022-01-06     6     2
#>  9 A     2022-01-05     5     3
#> 10 A     2022-01-06     6     3
#> 11 A     2022-01-07     7     3
#> 12 A     2022-01-08     8     3
#> 13 B     2022-01-01     1     1
#> 14 B     2022-01-02     2     1
#> 15 B     2022-01-03     3     1
#> 16 B     2022-01-04     4     1
#> 17 B     2022-01-03     3     2
#> 18 B     2022-01-04     4     2
#> 19 B     2022-01-05     5     2
#> 20 B     2022-01-06     6     2

# Mimic tsibble
df %>%
  group_by(group) %>%
  summarise(
    slide_like_tsibble(cur_data(), complete = TRUE), 
    .groups = "drop"
  )
#> # A tibble: 20 × 4
#>    group .id   date       value
#>    <chr> <chr> <date>     <int>
#>  1 A     1     2022-01-01     1
#>  2 A     1     2022-01-02     2
#>  3 A     1     2022-01-03     3
#>  4 A     1     2022-01-04     4
#>  5 A     2     2022-01-03     3
#>  6 A     2     2022-01-04     4
#>  7 A     2     2022-01-05     5
#>  8 A     2     2022-01-06     6
#>  9 A     3     2022-01-05     5
#> 10 A     3     2022-01-06     6
#> 11 A     3     2022-01-07     7
#> 12 A     3     2022-01-08     8
#> 13 B     1     2022-01-01     1
#> 14 B     1     2022-01-02     2
#> 15 B     1     2022-01-03     3
#> 16 B     1     2022-01-04     4
#> 17 B     2     2022-01-03     3
#> 18 B     2     2022-01-04     4
#> 19 B     2     2022-01-05     5
#> 20 B     2     2022-01-06     6

# With partial windows
df %>%
  group_by(group) %>%
  summarise(
    slide_like_tsibble(cur_data(), complete = FALSE), 
    .groups = "drop"
  )
#> # A tibble: 24 × 4
#>    group .id   date       value
#>    <chr> <chr> <date>     <int>
#>  1 A     1     2022-01-01     1
#>  2 A     2     2022-01-01     1
#>  3 A     2     2022-01-02     2
#>  4 A     2     2022-01-03     3
#>  5 A     3     2022-01-02     2
#>  6 A     3     2022-01-03     3
#>  7 A     3     2022-01-04     4
#>  8 A     3     2022-01-05     5
#>  9 A     4     2022-01-04     4
#> 10 A     4     2022-01-05     5
#> # … with 14 more rows

Created on 2022-05-18 by the reprex package (v2.0.1)
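
For readers on newer dplyr: as of dplyr 1.1.0, `cur_data()` is deprecated in favour of `pick()`, and returning more than one row per group from `summarise()` is deprecated in favour of `reframe()`. Below is a minimal sketch of the same approach under that assumption. The function name `slide_windows` and its `size`/`step` arguments are hypothetical, the explicit `NULL` filtering is a defensive choice rather than something the reprex above needs, and base `seq()` stands in for `clock::date_seq()` only to keep dependencies small.

```r
library(dplyr)
library(slider)

df <- tibble(
  group = c(rep("A", 8), rep("B", 7)),
  date = c(
    seq(as.Date("2022-01-01"), by = "day", length.out = 8),
    seq(as.Date("2022-01-01"), by = "day", length.out = 7)
  ),
  value = c(1:8, 1:7)
)

# Hypothetical helper: stack the windows of one group, tagged with an id.
slide_windows <- function(data, size = 4, step = 2, complete = TRUE) {
  out <- slide(
    .x = data,
    .f = identity,
    .before = size - 1,
    .step = step,
    .complete = complete
  )

  # Drop positions that were skipped by .step or .complete (they are NULL).
  out <- Filter(Negate(is.null), out)

  bind_rows(out, .id = ".id")
}

# Complete windows only, one set per group.
windows_complete <- df |>
  group_by(group) |>
  reframe(slide_windows(pick(everything()), complete = TRUE))

# Partial windows retained.
windows_partial <- df |>
  group_by(group) |>
  reframe(slide_windows(pick(everything()), complete = FALSE))
```

Under those assumptions the two results should contain the same rows as the outputs above, 20 and 24 rows respectively.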

cgoo4 commented 2 years ago

Really appreciate your time looking at this. You've got some very useful packages! My example was a little over-simplified and I will need to handle gaps for weekends/holidays (as well as many variables). I'll have a play with it and should get close enough. Thank you.
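
On the point about weekend and holiday gaps: `slide()` counts rows, so a window built with `.before = 3` spans four rows regardless of how many calendar days they cover. `slide_index()`, mentioned above, instead measures the window against an index such as the date. Here is a small sketch of the difference, using made-up dates that skip a weekend. Note that `slide_index()` has no `.step` argument, so any stepping would need to be handled separately.

```r
library(slider)

# Made-up business days: Thu, Fri, then Mon, Tue (the weekend is missing).
dates <- as.Date(c("2022-01-06", "2022-01-07", "2022-01-10", "2022-01-11"))
value <- 1:4

# Row-based: the last window always reaches back three rows.
by_row <- slide(value, identity, .before = 3)

# Index-based: the last window reaches back three calendar days,
# so it only contains rows dated 2022-01-08 through 2022-01-11.
by_date <- slide_index(value, dates, identity, .before = 3)

by_row[[4]]
by_date[[4]]
```

Here `by_row[[4]]` holds all four values, while `by_date[[4]]` holds only the two from Monday and Tuesday. Which behaviour is wanted depends on whether a window should mean four observations or four calendar days.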

DavisVaughan commented 2 years ago

Feel free to open a new issue with a more realistic example if you still have issues. Thanks for the nice words! I'll close this for now.