moodymudskipper / safejoin

Wrappers around dplyr functions to join safely using various checks
GNU General Public License v3.0

conflicts edge case #42

Closed moodymudskipper closed 2 years ago

moodymudskipper commented 4 years ago
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(safejoin)

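# x: band_instruments with `plays` overwritten (columns: name, plays)
x <- band_instruments %>% mutate(plays = c(NA, "bass", "guitar"))
# y: band_members plus a `plays` column and an extra row (columns: name, band, plays)
y <- band_members %>% mutate(plays = c(NA, "GUITAR", "BASS")) %>%
  add_row(name = "Keith", plays = NA)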

## These work

safe_left_join(x, y, by = c("name" = "plays"), check = "")
#> # A tibble: 3 x 4
#>   name  plays  name.y band 
#>   <chr> <chr>  <chr>  <chr>
#> 1 John  <NA>   <NA>   <NA> 
#> 2 Paul  bass   <NA>   <NA> 
#> 3 Keith guitar <NA>   <NA>
safe_left_join(x, y, by = c("name" = "plays"), conflict = ~.x)
#> # A tibble: 3 x 3
#>   name  plays  band 
#>   <chr> <chr>  <chr>
#> 1 John  <NA>   <NA> 
#> 2 Paul  bass   <NA> 
#> 3 Keith guitar <NA>

## These don't

safe_left_join(x, y, by = c("name" = "plays"), conflict = "patch")
#> Error: Can't subset columns that don't exist.
#> x Column `...plays_conflicted...` doesn't exist.
safe_left_join(x, y, by = c("name" = "plays"), conflict = coalesce)
#> Error: Input must be a vector, not NULL.

This seems to happen because "plays" is treated as a conflicted column when it shouldn't be: the "plays" column from y was merged into "name" from x as the join key, so the only remaining "plays" column unambiguously comes from x.
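
To make the reasoning above concrete, here is a minimal base R sketch of the difference between the two ways of detecting conflicts. Both `naive_conflicts()` and `conflicted_cols()` are hypothetical helpers written for illustration; they are not safejoin functions, and this is only a guess at the logic, not the package's actual implementation.

```r
# Hypothetical helpers, not part of safejoin.

# Naive detection: any column name shared by x and y is a conflict.
naive_conflicts <- function(x_names, y_names) {
  intersect(x_names, y_names)
}

# Assumed fix: drop y's join keys first, since they are merged into
# x's key columns and no longer exist as separate columns after the join.
conflicted_cols <- function(x_names, y_names, by) {
  by_y <- unname(by)  # for a named `by`, the values are y's key columns
  intersect(x_names, setdiff(y_names, by_y))
}

x_names <- c("name", "plays")          # columns of x in the reprex
y_names <- c("name", "band", "plays")  # columns of y in the reprex
by <- c("name" = "plays")

naive_conflicts(x_names, y_names)
#> [1] "name"  "plays"
conflicted_cols(x_names, y_names, by)
#> [1] "name"
```

Under this assumption only "name" remains conflicted (x's key against y's non-key "name"), which is consistent with the `conflict = ~.x` output above, where "name.y" is resolved away and "plays" is left untouched.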

moodymudskipper commented 2 years ago

The fun now happens at https://github.com/moodymudskipper/powerjoin