Option to coalesce by column with data frames?

tidyverse / funs

Collection of low-level functions for working with vctrs


vctrs treats a data frame row as missing only when every column in that row is missing, so coalesce() on data frames might not do what you expect. In the example below, only the row where all values are missing is updated. It might be nice to have a way to update each column separately.
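To make the row-wise rule concrete, here is a minimal base R sketch that emulates it. It is not the funs or vctrs implementation, and the names `row_missing` and `out` are only illustrative: a row counts as missing when all of its cells are `NA`, and only those rows are taken from the second data frame.

```r
df1 <- data.frame(x = c(NA, 1, NA), y = c(1, NA, NA))
df2 <- data.frame(x = c(2, 2, 2), y = c(2, 2, 2))

# A row is "missing" only if every column in that row is NA.
# This emulates the row-wise rule in base R; it is an assumption-level
# sketch, not the code that funs::coalesce() actually runs.
row_missing <- unname(rowSums(is.na(df1)) == ncol(df1))
row_missing
#> [1] FALSE FALSE  TRUE

# Replace only the fully missing rows with the rows from df2.
out <- df1
out[row_missing, ] <- df2[row_missing, ]
out
#>    x  y
#> 1 NA  1
#> 2  1 NA
#> 3  2  2
```

Rows 1 and 2 each have one observed value, so they are not considered missing and their `NA` cells are left alone.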

You could map2() over the data frames, but that would require that you'd already cast them to the same data frame type, and I don't think it generalizes that nicely to more than two data frames.
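For illustration, one way the column-wise version could extend to any number of data frames is to fold a two-frame step over the inputs. This is a rough base R sketch under stated assumptions: `coalesce_cols()` is a hypothetical helper (not part of funs, vctrs, or dplyr), and it assumes all inputs already share column names, row counts, and compatible column types, so it sidesteps the casting question raised above rather than solving it.

```r
# Hypothetical helper, not part of funs, vctrs, or dplyr.
# Assumes every data frame has the same column names and row count,
# and compatible column types (no common-type casting is attempted).
coalesce_cols <- function(...) {
  dfs <- list(...)
  stopifnot(length(dfs) >= 1)

  # Fill the NA entries of one vector from another of equal length.
  fill_one <- function(x, y) {
    missing <- is.na(x)
    x[missing] <- y[missing]
    x
  }

  # Coalesce two data frames column by column, matching columns by name.
  fill_df <- function(a, b) {
    a[] <- Map(fill_one, a, b[names(a)])
    a
  }

  # Fold left to right, so earlier data frames take priority.
  Reduce(fill_df, dfs)
}

df1 <- data.frame(x = c(NA, 1, NA), y = c(1, NA, NA))
df2 <- data.frame(x = c(2, 2, 2), y = c(2, 2, 2))

coalesce_cols(df1, df2)
#>   x y
#> 1 2 1
#> 2 1 2
#> 3 2 2
```

Because the fold runs left to right, a third data frame only fills cells that are still `NA` after the second one has been applied.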

We may need separate notions of vec_coalesce() and df_coalesce() to cover this new case.

# devtools::install_github("r-lib/funs")

library(funs)

df1 <- data.frame(x = c(NA, 1, NA), y = c(1, NA, NA))
df2 <- data.frame(x = c(2, 2, 2), y = c(2, 2, 2))

df1
#>    x  y
#> 1 NA  1
#> 2  1 NA
#> 3 NA NA

coalesce(df1, df2)
#>    x  y
#> 1 NA  1
#> 2  1 NA
#> 3  2  2

Created on 2020-04-24 by the reprex package (v0.3.0)

Inspired by https://github.com/tidyverse/dplyr/pull/5142/files#diff-3680f0191de36a0e61d4b24cdb1ab150R149

rows_patch.data.frame <- function(x, y, by = NULL, ..., copy = FALSE, inplace = NULL) {
  y <- auto_copy(x, y, copy = copy)
  y_key <- df_key(y, by)
  x_key <- df_key(x, names(y_key))
  df_inplace(inplace)

  idx <- vctrs::vec_match(y[y_key], x[x_key])
  # FIXME: Check key in x? https://github.com/r-lib/vctrs/issues/1032

  # FIXME: Do we need vec_coalesce()
  new_data <- map2(x[idx, names(y)], y, coalesce)

  x[idx, names(y)] <- new_data
  x
}

df1 <- data.frame(x = c(NA, 1, NA), y = c(1, NA, NA))
df2 <- data.frame(x = c(2, 2, 2), y = c(2, 2, 2))

dplyr::coalesce(df1, df2)
#>   x y
#> 1 2 1
#> 2 1 2
#> 3 2 2

funs::coalesce(df1, df2)
#>    x  y
#> 1 NA  1
#> 2  1 NA
#> 3  2  2

df1 <- data.frame(x = c(NA, 1, NA), y = c(NA, NA, NA))
df2 <- data.frame(x = c(2, 2, 2), y = c(2, 2, 2))
df3 <- data.frame(x = c(NA, 3, 3), y = c(3, 3, 3))

# Only fully missing rows are coalesced
funs::coalesce(df1, df2)
#>   x  y
#> 1 2  2
#> 2 1 NA
#> 3 2  2

# But we allow partially missing coalescence
funs::coalesce(df1, df3)
#>    x  y
#> 1 NA  3
#> 2  1 NA
#> 3  3  3

# Once partially filled out, no more coalescence is possible
funs::coalesce(df1, df3, df2)
#>    x  y
#> 1 NA  3
#> 2  1 NA
#> 3  3  3

Option to coalesce by column with data frames? #48