DavisVaughan closed this issue 2 years ago.
Also tackled in https://github.com/tidyverse/dplyr/pull/5334
df1 <- data.frame(x = c(NA, 1, NA), y = c(1, NA, NA))
df2 <- data.frame(x = c(2, 2, 2), y = c(2, 2, 2))
dplyr::coalesce(df1, df2)
#> x y
#> 1 2 1
#> 2 1 2
#> 3 2 2
funs::coalesce(df1, df2)
#> x y
#> 1 NA 1
#> 2 1 NA
#> 3 2 2
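The two outputs differ in what counts as a missing value in a data frame. Below is a rough base R sketch of the funs/vctrs behaviour, for illustration only. coalesce_rows() is a hypothetical helper, not a function in dplyr, funs or vctrs. A target row is filled only when every one of its values is NA, and the whole source row is copied even if that row is itself only partially filled.

```r
# Hypothetical helper, for illustration only (not part of dplyr, funs or
# vctrs). Assumes all inputs have the same number of rows and columns.
coalesce_rows <- function(...) {
  dfs <- list(...)
  out <- dfs[[1]]
  for (src in dfs[-1]) {
    # vctrs-style "missing row": every column of the row is NA
    missing_row <- rowSums(!is.na(out)) == 0
    if (any(missing_row)) {
      # Copy the whole source row, even if it is only partially filled
      out[missing_row, ] <- src[missing_row, ]
    }
  }
  out
}

df1 <- data.frame(x = c(NA, 1, NA), y = c(1, NA, NA))
df2 <- data.frame(x = c(2, 2, 2), y = c(2, 2, 2))
coalesce_rows(df1, df2)
```

On df1 and df2 this fills only the third row, matching the funs::coalesce() result above.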
I'm tempted to offer a general direction argument whenever the semantics are useful both across rows and across columns. In this case, though, a potentially better approach is the "complete-cases" viewpoint, which might be more consistent. Currently, row coalescence is a bit inconsistent: the target row must be completely missing, but the source row need not be complete:
df1 <- data.frame(x = c(NA, 1, NA), y = c(NA, NA, NA))
df2 <- data.frame(x = c(2, 2, 2), y = c(2, 2, 2))
df3 <- data.frame(x = c(NA, 3, 3), y = c(3, 3, 3))
# Only fully missing rows are coalesced
funs::coalesce(df1, df2)
#> x y
#> 1 2 2
#> 2 1 NA
#> 3 2 2
# But we allow partially missing coalescence
funs::coalesce(df1, df3)
#> x y
#> 1 NA 3
#> 2 1 NA
#> 3 3 3
# Once partially filled out, no more coalescence is possible
funs::coalesce(df1, df3, df2)
#> x y
#> 1 NA 3
#> 2 1 NA
#> 3 3 3
Davis will add a complete-cases predicate to vctrs, but how do we slice-coalesce the values? Maybe we need a binary vec_coalesce() operation?
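One way to picture a binary slice-coalesce under the complete-cases viewpoint is the base R sketch below. It rests on an assumption, not on anything decided here: a target row is filled only when all of its values are NA, and only from a source row that is complete. coalesce_complete() is a hypothetical name, and base R's complete.cases() stands in for the planned vctrs predicate.

```r
# Hypothetical helper, not part of vctrs or funs. It encodes one possible
# reading of the "complete-cases" idea: a target row is filled only when all
# of its values are NA, and only from a source row that is complete (has no
# NA at all). Assumes both data frames have the same shape and column order.
coalesce_complete <- function(target, source) {
  stopifnot(nrow(target) == nrow(source))
  target_missing <- rowSums(!is.na(target)) == 0
  fill <- target_missing & complete.cases(source)
  if (any(fill)) {
    target[fill, ] <- source[fill, ]
  }
  target
}

df1 <- data.frame(x = c(NA, 1, NA), y = c(NA, NA, NA))
df2 <- data.frame(x = c(2, 2, 2), y = c(2, 2, 2))
df3 <- data.frame(x = c(NA, 3, 3), y = c(3, 3, 3))

# A binary operation extends to any number of inputs with Reduce()
Reduce(coalesce_complete, list(df1, df3, df2))
```

Under this reading, the partially missing first row of df3 is skipped, so df2 can still fill row 1 afterwards. The result is rows (2, 2), (1, NA) and (3, 3), unlike the funs::coalesce(df1, df3, df2) output above.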
Using the vctrs definition of a "missing row" for data frames, where a row counts as missing only when all of its values are missing, coalesce() might not do what you expect. Here, only the row with all missing values is updated. It might be nice to have a way to update each column separately.
You could map2() over the data frames, but that would require that you had already cast them to the same data frame type, and I don't think it generalizes nicely to more than two data frames.
It is possible that we need the idea of a vec_coalesce() and a df_coalesce() for this new case.
Created on 2020-04-24 by the reprex package (v0.3.0)
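A rough base R sketch of the per-column idea follows, with Map() in place of purrr::map2() and Reduce() to accept more than two inputs. It sidesteps the casting problem only by assuming it away: every input must already share column names and compatible types. coalesce_cols() is a hypothetical helper, not an existing dplyr, funs or vctrs function.

```r
# Hypothetical helper, not an existing dplyr/funs/vctrs function. It fills
# each column separately, like dplyr::coalesce() does above. It assumes every
# input already has the same column names and compatible column types; it
# does not perform the casting step.
coalesce_cols <- function(...) {
  Reduce(function(target, source) {
    # Map() is base R's counterpart of purrr::map2(): it walks the matching
    # columns of both data frames in parallel.
    target[] <- Map(function(t, s) {
      miss <- is.na(t)
      t[miss] <- s[miss]
      t
    }, target, source[names(target)])
    target
  }, list(...))
}

df1 <- data.frame(x = c(NA, 1, NA), y = c(1, NA, NA))
df2 <- data.frame(x = c(2, 2, 2), y = c(2, 2, 2))
coalesce_cols(df1, df2)
```

On df1 and df2 this reproduces the column-wise dplyr::coalesce() result above. Any type mismatch between inputs is left to R's ordinary coercion rules, which is exactly the part a real df_coalesce() would need to handle properly.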
Inspired by https://github.com/tidyverse/dplyr/pull/5142/files#diff-3680f0191de36a0e61d4b24cdb1ab150R149