tidyverse / funs

Collection of low-level functions for working with vctrs
Other
34 stars 7 forks source link

na.rm argument in lead and lag? #53

Open ValdarT opened 4 years ago

ValdarT commented 4 years ago

It sometimes happens that one wants the next/previous non-missing value. It would be nice if this was supported in lead/lag functions, e.g., through na.rm argument. This is also specified in SQL standard with IGNORE NULLS option and could thus also be implemented for dbplyr (for dialects that support it) and sparklyr (though Spark currently does not support it directly, only through first/last with an appropriate range clause, afaik).

DavisVaughan commented 2 years ago

Is this really that widely implemented in SQL though? MS SQL, SQLite, and MariaDB all don't implement it. Snowflake does: https://docs.snowflake.com/en/sql-reference/functions/lead.html

But I feel like you can accomplish this pretty easily by first filling the column with missing values up/down depending on whether you are doing lead/lag. Which you probably want to do anyways if you are going to end up comparing the column with its lagged version, right?

library(dplyr)
library(vctrs)

df <- tibble(x = c(5, 4, NA, 2, NA, NA, 6))
df
#> # A tibble: 7 × 1
#>       x
#>   <dbl>
#> 1     5
#> 2     4
#> 3    NA
#> 4     2
#> 5    NA
#> 6    NA
#> 7     6

df %>%
  mutate(x_fill = vec_fill_missing(x, "up")) %>%
  mutate(x_lag = lead(x_fill, 1))
#> # A tibble: 7 × 3
#>       x x_fill x_lag
#>   <dbl>  <dbl> <dbl>
#> 1     5      5     4
#> 2     4      4     2
#> 3    NA      2     2
#> 4     2      2     6
#> 5    NA      6     6
#> 6    NA      6     6
#> 7     6      6    NA

Created on 2022-07-16 by the reprex package (v2.0.1)