r-lib / slider

Sliding Window Functions
https://slider.r-lib.org

recovering the time index after or during reframe(slide_period) #206

Closed benzipperer closed 3 months ago

benzipperer commented 3 months ago

Hi, I love this package, thank you! This is not a bug but a question, and I wasn't sure where to ask.

How do we recover the time index when using slide_period() on a data frame inside reframe()?

Let's say you have data with multiple observations per time period where you want to calculate rolling means and medians: rolling over time, maybe even rolling over time within groups. The data might be

library(tidyverse)
library(slider)

company = tibble(
  day = rep(seq(1:10), each = 10),
  sales = sample(100, 100),
  n_calls = sales + sample(1000, 100)
) |> 
  mutate(group = if_else(row_number() <= 5, 1, 0), .by = day) |> 
  mutate(date = ymd(paste0("2024-01-", day)))

And you have some function that operates on a data frame and returns a data frame

my_stats = function(data, var) {
  data |> 
  summarize(mean = mean({{var}}), median = median({{var}})) |> 
  pivot_longer(everything())
}

In this context, is one good way forward to use slide_period() within a reframe(..., .by = group) call? Something like

company |> 
    arrange(day) |> 
    reframe(
        results = slide_period(
            pick(everything()), 
            date, 
            "day", 
            ~ my_stats(.x, sales), 
            .before = 2, 
            .complete = TRUE
        ),
        .by = group
    ) 

gets you most of the way there. But then how do you easily recover the day index in the output (say, with another column that holds the day)? Is that what the ... and related .names_to options in the slide functions are for, and if so, how do I use them?

Thanks again for your awesome package!

DavisVaughan commented 3 months ago

The index is in the data frame .x that you pass through to my_stats(), so you can do something like this if you want to know the range it covers:

library(tidyverse)
library(slider)

company = tibble(
  day = rep(seq(1:10), each = 10),
  sales = sample(100, 100),
  n_calls = sales + sample(1000, 100)
) |> 
  mutate(group = if_else(row_number() <= 5, 1, 0), .by = day) |> 
  mutate(date = ymd(paste0("2024-01-", day)))

my_stats = function(data, var) {
  data |> 
    summarize(
      min = min(date),
      max = max(date),
      mean = mean({{var}}), 
      median = median({{var}})
    )
}

company |> 
  arrange(group, day) |> 
  reframe(
      results = slide_period(
          pick(everything()), 
          date, 
          "day", 
          ~ my_stats(.x, sales), 
          .before = 2, 
          .complete = TRUE
      ),
      .by = group
  ) |>
  unnest(results)
#> # A tibble: 16 × 5
#>    group min        max         mean median
#>    <dbl> <date>     <date>     <dbl>  <int>
#>  1     0 2024-01-01 2024-01-03  36.5     29
#>  2     0 2024-01-02 2024-01-04  45.9     34
#>  3     0 2024-01-03 2024-01-05  52       53
#>  4     0 2024-01-04 2024-01-06  57.6     53
#>  5     0 2024-01-05 2024-01-07  56.7     53
#>  6     0 2024-01-06 2024-01-08  59.7     56
#>  7     0 2024-01-07 2024-01-09  53.1     56
#>  8     0 2024-01-08 2024-01-10  51       50
#>  9     1 2024-01-01 2024-01-03  57.5     62
#> 10     1 2024-01-02 2024-01-04  60.9     62
#> 11     1 2024-01-03 2024-01-05  51.1     48
#> 12     1 2024-01-04 2024-01-06  49.5     47
#> 13     1 2024-01-05 2024-01-07  51.5     55
#> 14     1 2024-01-06 2024-01-08  48.7     55
#> 15     1 2024-01-07 2024-01-09  46.7     51
#> 16     1 2024-01-08 2024-01-10  44.7     43
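
If a single index column is enough, a variation on the approach above is to keep only the last date of each window and let slide_period_dfr() row-bind the per-window results, so no unnest() step is needed. This is a minimal sketch, not something taken from the thread: the helper window_stats() and the column name window_end are made up for illustration, and the data is a small deterministic stand-in for the random example above. It assumes dplyr 1.1.0 or later for reframe(), pick() and .by.

```r
library(dplyr)
library(slider)

# Deterministic stand-in for the example data: 10 days, 10 rows per day,
# the first 5 rows of each day in group 1 and the last 5 in group 0.
company <- tibble(
  date  = rep(as.Date("2024-01-01") + 0:9, each = 10),
  sales = rep(1:10, times = 10),
  group = rep(rep(c(1, 0), each = 5), times = 10)
)

# Hypothetical helper: one summary row per window, tagged with the
# last date that the window covers.
window_stats <- function(data) {
  summarize(
    data,
    window_end = max(date),
    mean = mean(sales),
    median = median(sales)
  )
}

rolled <- company |>
  arrange(group, date) |>
  reframe(
    # slide_period_dfr() row-binds the per-window data frames; because the
    # expression is unnamed, reframe() spreads its columns into the result.
    slide_period_dfr(
      pick(everything()),
      date,
      "day",
      window_stats,
      .before = 2,
      .complete = TRUE
    ),
    .by = group
  )

rolled
```

Each row of rolled then describes the three-day window ending on window_end. Swapping max(date) for min(date) would label each window by its first day instead.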