r-lib / slider

Sliding Window Functions
https://slider.r-lib.org

implement equivalent to zoo's `fill = "extend"`? #160

Closed japhir closed 2 years ago

japhir commented 2 years ago

Currently, we can do the same as zoo's fill = NA like so:

zoo::rollmean(1:10, k = 9, align = "center", fill = NA)
#>  [1] NA NA NA NA  5  6 NA NA NA NA
slider::slide_dbl(1:10, mean, .before = 4, .after = 4, .complete = TRUE)
#>  [1] NA NA NA NA  5  6 NA NA NA NA

Is there a way to implement interpolation/extension for the positions that do not have enough values for a complete computation?

# if it's at either end, just repeat the final value
zoo::rollmean(1:10, k = 9, align = "center", fill = "extend")
#>  [1] 5 5 5 5 5 6 6 6 6 6
# if it's in the middle, do linear interpolation
zoo::rollmean(c(1:6, NA, 8:15), k = 5, align = "center", fill = "extend")
#> [1]  3  3  3  4  5  6  7  8  9 10 11 12 13 13 13
# in comparison to what's currently possible in slider:
slider::slide_dbl(c(1:6, NA, 8:15), mean, .before = 2, .after = 2, .complete = TRUE)
#> [1] NA NA  3  4 NA NA NA NA NA 10 11 12 13 NA NA
# or without complete
slider::slide_dbl(c(1:6, NA, 8:15), mean, .before = 2, .after = 2, .complete = FALSE)
#> [1]  2.0  2.5  3.0  4.0   NA   NA   NA   NA   NA 10.0 11.0 12.0 13.0 13.5 14.0

This would allow me to remove the zoo dependency from my workflow ;-). Kind regards, Ilja

DavisVaughan commented 2 years ago

Rather than adding more to slider, I'd encourage you to use approxfun() to do the interpolation + vctrs::vec_fill_missing() to repeat the values on the ends.

I think that fill = "extend" won't apply in a lot of scenarios in slider because you don't always return a numeric vector that can be interpolated, and interpolation often isn't the right thing to do. So I feel somewhat strongly that I won't be adding it to slider. I appreciate the suggestion though!
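To illustrate the point about results that can't be interpolated, here is a minimal sketch. The input is made up for illustration and is not from the original discussion:

```r
library(slider)

# slide() returns a list, and each element here is a character vector,
# so there is nothing numeric to interpolate between.
res <- slide(letters[1:4], ~ .x, .before = 1, .complete = TRUE)
str(res)
```

With output like this, an "extend"-style fill has no obvious meaning, which is why the interpolation step is better left to the caller.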

You could wrap the following up into a function:

library(slider)
library(vctrs)

x <- c(1:6, NA, 8:15)
index <- seq_along(x)

out <- slider::slide_dbl(x, mean, .before = 2, .after = 2, .complete = TRUE)
out
#>  [1] NA NA  3  4 NA NA NA NA NA 10 11 12 13 NA NA

missing <- is.na(out)

# This will leave the incomplete values on the ends as NA
# because they are seen as extrapolation, not interpolation
fn_interp <- approxfun(
  x = index,
  y = out,
  yleft = NA_real_,
  yright = NA_real_
)

out_interp <- fn_interp(index[missing])
out_interp
#> [1] NA NA  5  6  7  8  9 NA NA

out[missing] <- out_interp
out
#>  [1] NA NA  3  4  5  6  7  8  9 10 11 12 13 NA NA

out <- vec_fill_missing(out, direction = "downup")
out
#>  [1]  3  3  3  4  5  6  7  8  9 10 11 12 13 13 13

Created on 2021-09-07 by the reprex package (v2.0.0.9000)
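Wrapped up into a function, the steps above might look like the following sketch. The name `slide_mean_extend()` and its `before`/`after` arguments are hypothetical and not part of slider. The early return is an added assumption, there because `approxfun()` needs at least two non-missing points to interpolate.

```r
library(slider)
library(vctrs)

# Hypothetical helper (not part of slider): a centered rolling mean that
# mimics zoo's fill = "extend" by interpolating interior gaps and repeating
# the nearest complete value at the ends.
slide_mean_extend <- function(x, before = 0L, after = 0L) {
  out <- slide_dbl(x, mean, .before = before, .after = after, .complete = TRUE)
  missing <- is.na(out)

  # Assumption: with fewer than two complete windows there is nothing to
  # interpolate between, so only fill down/up and return.
  if (sum(!missing) < 2L) {
    return(vec_fill_missing(out, direction = "downup"))
  }

  index <- seq_along(out)

  # Interior gaps are interpolated linearly; the ends stay NA for now
  # because they would require extrapolation.
  fn_interp <- approxfun(
    x = index,
    y = out,
    yleft = NA_real_,
    yright = NA_real_
  )
  out[missing] <- fn_interp(index[missing])

  # Repeat the first/last available value on the ends.
  vec_fill_missing(out, direction = "downup")
}

slide_mean_extend(c(1:6, NA, 8:15), before = 2, after = 2)
#>  [1]  3  3  3  4  5  6  7  8  9 10 11 12 13 13 13
```

Calling `slide_mean_extend(1:10, before = 4, after = 4)` should likewise match the `zoo::rollmean(1:10, k = 9, align = "center", fill = "extend")` result from the original question.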

japhir commented 2 years ago

Thank you so much for the elaborate response with an example of how to do the workaround! I can see how you want to keep the API as simple and clean as possible :)