Implementation:
library(vctrs)
library(rlang)
ilag_ilead_impl <- function(x, order_by, n, default, fn) {
  # Validate inputs: `x` and `order_by` are vectors, `n` is a scalar integer
  vec_assert(x)
  vec_assert(order_by)
  vec_assert(n, size = 1L)
  n <- vec_cast(n, integer(), x_arg = "n")
  x_size <- vec_size(x)
  order_by_size <- vec_size(order_by)
  if (x_size != order_by_size) {
    abort("`x` and `order_by` must have the same size.")
  }
  # Would prefer vec_any_na() here, see vctrs#544
  if (any(vec_equal_na(order_by))) {
    abort("`order_by` cannot have `NA` values.")
  }
  if (x_size == 0L) {
    return(x)
  }
  # Shift the index by `n`, then look up where each shifted value
  # occurs in the original index (`NA` when it doesn't exist)
  order_by_shift <- fn(order_by, n)
  loc <- vec_match(order_by_shift, order_by)
  out <- vec_slice(x, loc)
  # Fill unmatched positions with `default`, if one was supplied
  if (!is.null(default)) {
    na_loc <- vec_equal_na(loc)
    default <- vec_cast(default, x, x_arg = "default", to_arg = "x")
    vec_slice(out, na_loc) <- default
  }
  out
}
ilag <- function(x, order_by, n = 1L, default = NULL) {
  ilag_ilead_impl(x, order_by, n, default, `-`)
}
ilead <- function(x, order_by, n = 1L, default = NULL) {
  ilag_ilead_impl(x, order_by, n, default, `+`)
}
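The core of the implementation is the shift-then-match step. As a rough illustration only, here is the same lookup in base R, assuming `match()` as a stand-in for `vec_match()` (both give `NA` when a value isn't found). It uses the same data as the usage example and skips all of the input checking.

```r
# Base R sketch of the lookup; match() is an assumed stand-in for vec_match()
x <- c(5, 6, 7, 8)
i <- as.Date("2019-01-01") + c(0, 1, 3, 4)

# Shift the index back by one day, then find each shifted date in the
# original index. Dates with no predecessor in `i` give NA.
loc <- match(i - 1L, i)
loc
#> [1] NA  1 NA  3

# Slicing with NA locations gives NA values
x_ilag <- x[loc]
x_ilag
#> [1] NA  5 NA  7
```

Because the lookup is by value rather than by position, the row order of the input doesn't matter, which is why `ilag()` gives the same answer on the reversed data frame.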
Usage:
library(dplyr)
df <- tibble(
  x = c(5, 6, 7, 8),
  i = as.Date("2019-01-01") + c(0, 1, 3, 4)
)
# Notice how the temporal spacing is respected
# We get an `NA` at 2019-01-04 because 2019-01-03 doesn't exist
df %>%
  mutate(
    x_lag = lag(x),
    x_ilag = ilag(x, i)
  )
#> # A tibble: 4 x 4
#>       x i          x_lag x_ilag
#>   <dbl> <date>     <dbl>  <dbl>
#> 1     5 2019-01-01    NA     NA
#> 2     6 2019-01-02     5      5
#> 3     7 2019-01-04     6     NA
#> 4     8 2019-01-05     7      7
# - lag()'s default doesn't respect ordering of any variable
# - lag(order_by) respects ordering but not spacing
# - ilag(order_by) respects ordering and spacing
df_rev <- arrange(df, desc(i))
df_rev %>%
  mutate(
    x_lag = lag(x),
    x_lag_ob = lag(x, order_by = i),
    x_ilag = ilag(x, i)
  )
#> # A tibble: 4 x 5
#>       x i          x_lag x_lag_ob x_ilag
#>   <dbl> <date>     <dbl>    <dbl>  <dbl>
#> 1     8 2019-01-05    NA        7      7
#> 2     7 2019-01-04     8        6     NA
#> 3     6 2019-01-02     7        5      5
#> 4     5 2019-01-01     6       NA     NA
One thought was to let lag() have a respect_spacing parameter, rather than creating a new function. But I think it needs to be a new function, because ilag() requires its order_by to be integerish under the hood (it has to support adding or subtracting n), which is not a restriction on lag(). Practically, if we had a respect_spacing parameter, a problem would show up with character order_by variables. It would be strange for respect_spacing to stop this from working:
lag(1:3, order_by = c("a", "b", "c"))
# [1] NA 1 2
lag(1:3, order_by = c("a", "b", "c"), respect_spacing = TRUE)
# Error in order_by - n : non-numeric argument to binary operator
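The restriction comes from the shift step itself, not from anything specific to lag(). A small base R check, just as a sketch: arithmetic works on a Date index but not on a character one. The exact error text assumes an English locale.

```r
# Subtraction is defined for Dates, so the index can be shifted
shifted <- as.Date("2019-01-02") - 1L
shifted
#> [1] "2019-01-01"

# It is not defined for character vectors, so the shift itself fails
res <- tryCatch(
  c("a", "b", "c") - 1L,
  error = function(e) conditionMessage(e)
)
res
```

Any order_by type with a meaningful `-` and `+` would pass this step. Character vectors have an ordering but no notion of spacing.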
CC @earowang for the original inspiration of the functions. I think you could keep keyed_lag(), which could call this internally. I was excited by your implementation, and thought that it could be useful outside of the tsibble / time series context as well.
Related to #34
These are variations on lead() and lag() that require an order_by argument, but also respect the "spacing" between order_by observations. This is very useful for time series, and is a neat feature in Stata. See slides 10-13 https://www.princeton.edu/~otorres/TS101.pdf
Also think about idiff()
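idiff() isn't defined anywhere in this issue, so the following is only a guess at what it could look like: each value minus its spacing-aware lag. The name `idiff_sketch()`, its signature, and the use of base `match()` in place of `vec_match()` are all assumptions for illustration.

```r
# Hypothetical sketch, not a proposed API: assumes idiff() means
# x - ilag(x, order_by, n), with match() standing in for vec_match()
idiff_sketch <- function(x, order_by, n = 1L) {
  loc <- match(order_by - n, order_by)
  x - x[loc]
}

d <- idiff_sketch(
  c(5, 6, 7, 8),
  as.Date("2019-01-01") + c(0, 1, 3, 4)
)
d
#> [1] NA  1 NA  1
```

With this definition a difference is only reported when the previous index value actually exists, so gaps in the series show up as `NA` rather than as a difference across the gap.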