Open ValdarT opened 4 years ago
Is this really that widely implemented in SQL though? MS SQL, SQLite, and MariaDB all don't implement it. Snowflake does: https://docs.snowflake.com/en/sql-reference/functions/lead.html
But I feel like you can accomplish this pretty easily by first filling the column with missing values up/down depending on whether you are doing lead/lag. Which you probably want to do anyways if you are going to end up comparing the column with its lagged version, right?
library(dplyr)
library(vctrs)
df <- tibble(x = c(5, 4, NA, 2, NA, NA, 6))
df
#> # A tibble: 7 × 1
#> x
#> <dbl>
#> 1 5
#> 2 4
#> 3 NA
#> 4 2
#> 5 NA
#> 6 NA
#> 7 6
df %>%
mutate(x_fill = vec_fill_missing(x, "up")) %>%
mutate(x_lag = lead(x_fill, 1))
#> # A tibble: 7 × 3
#> x x_fill x_lag
#> <dbl> <dbl> <dbl>
#> 1 5 5 4
#> 2 4 4 2
#> 3 NA 2 2
#> 4 2 2 6
#> 5 NA 6 6
#> 6 NA 6 6
#> 7 6 6 NA
Created on 2022-07-16 by the reprex package (v2.0.1)
It sometimes happens that one wants the next/previous non-missing value. It would be nice if this was supported in lead/lag functions, e.g., through
na.rm
argument. This is also specified in SQL standard withIGNORE NULLS
option and could thus also be implemented fordbplyr
(for dialects that support it) andsparklyr
(though Spark currently does not support it directly, only through first/last with an appropriate range clause, afaik).