r-lib / slider

Sliding Window Functions
https://slider.r-lib.org

Unexpected behavior in `slide_period` / `slide_index` with `.complete=TRUE` #205

Open dshemetov opened 2 weeks ago

dshemetov commented 2 weeks ago

The documentation says this about the .complete argument:

.complete [logical(1)]

Should the function be evaluated on complete windows only? If FALSE, the default, then partial computations will be allowed.

However, in practice, partial computations are disallowed only at the boundaries of the data, not at gaps within it:

library(tibble)

df <- tibble(
  time_value = c(
    seq.Date(as.Date("2020-01-01"), by = "day", length.out = 4),
    seq.Date(as.Date("2020-01-06"), by = "day", length.out = 16)
  ),
  x = 1:20
)
tibble(
  df$time_value,
  df$x,
  slide = slider::slide_period_int(
    df$x, df$time_value, "day", ~ sum(.x),
    .before = 1, .complete = TRUE
  )
)
# A tibble: 20 × 3
   `df$time_value` `df$x` slide
   <date>           <int> <int>
 1 2020-01-01           1    NA
 2 2020-01-02           2     3
 3 2020-01-03           3     5
 4 2020-01-04           4     7
 5 2020-01-06           5     5      <- I expect this to be NA
 6 2020-01-07           6    11
 7 2020-01-08           7    13
 8 2020-01-09           8    15
 9 2020-01-10           9    17
10 2020-01-11          10    19
11 2020-01-12          11    21
12 2020-01-13          12    23
13 2020-01-14          13    25
14 2020-01-15          14    27
15 2020-01-16          15    29
16 2020-01-17          16    31
17 2020-01-18          17    33
18 2020-01-19          18    35
19 2020-01-20          19    37
20 2020-01-21          20    39

tibble(
  df$time_value,
  df$x,
  slide = slider::slide_index_int(
    df$x, df$time_value, ~ sum(.x),
    .before = 1, .complete = TRUE
  )
)
# A tibble: 20 × 3
   `df$time_value` `df$x` slide
   <date>           <int> <int>
 1 2020-01-01           1    NA
 2 2020-01-02           2     3
 3 2020-01-03           3     5
 4 2020-01-04           4     7
 5 2020-01-06           5     5      <- I expect this to be NA
 6 2020-01-07           6    11
 7 2020-01-08           7    13
 8 2020-01-09           8    15
 9 2020-01-10           9    17
10 2020-01-11          10    19
11 2020-01-12          11    21
12 2020-01-13          12    23
13 2020-01-14          13    25
14 2020-01-15          14    27
15 2020-01-16          15    29
16 2020-01-17          16    31
17 2020-01-18          17    33
18 2020-01-19          18    35
19 2020-01-20          19    37
20 2020-01-21          20    39

This applies to the whole family of typed slide_period and slide_index functions.
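
For anyone who needs the behavior I expected in the meantime, a possible workaround is to leave .complete at its default and check the window size inside the function. This is only a sketch: sum_if_full and n_expected are hypothetical names, not part of slider, and it assumes the index has at most one row per day, so a full window with .before = 1 holds exactly 2 values.

```r
library(slider)

time_value <- c(
  seq.Date(as.Date("2020-01-01"), by = "day", length.out = 4),
  seq.Date(as.Date("2020-01-06"), by = "day", length.out = 16)
)
x <- 1:20

# Hypothetical helper: return NA when the window holds fewer values
# than a gap-free window would (assumes one row per day).
sum_if_full <- function(w, n_expected) {
  if (length(w) < n_expected) NA_integer_ else sum(w)
}

# .complete is left at FALSE; the helper rejects short windows,
# both at the start of the data and right after the gap.
res <- slide_index_int(
  x, time_value, sum_if_full,
  n_expected = 2L,
  .before = 1
)
```

With this, rows 1 and 5 (2020-01-01 and 2020-01-06) come back as NA, and all other rows match the output above. It would not work as written for irregular data with several rows per day.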

dshemetov commented 2 weeks ago

Related to #202.