markfairbanks / tidytable

Tidy interface to 'data.table'
https://markfairbanks.github.io/tidytable/

left_join: error when a matching column in y has the same name as a non-matching column in x #625

Closed jfdesomzee closed 2 years ago

jfdesomzee commented 2 years ago

Hello,

I have an issue with the left_join function: in some cases it gets confused. This happens when a matching column in y has the same name as a non-matching column in x. dplyr works fine in that case, while tidytable returns an error.
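
To make the triggering condition concrete, here is a minimal base R sketch that checks whether a `by` specification has this kind of name collision. The helper variables (`x_names`, `x_keys`, `y_keys`, `collisions`) are hypothetical names chosen for illustration; this is not how tidytable or dplyr detect the case internally.

```r
# Column names of x after `mutate(Desc = Description)`, as in the failing call
x_names <- c("Product", "Description", "Val", "Desc")

# Join spec: names are the x key columns, values are the y key columns
by <- c("Product" = "Product", "Desc" = "Description")
x_keys <- names(by)
y_keys <- unname(by)

# y key names that also exist in x but are NOT used as x keys
collisions <- intersect(y_keys, setdiff(x_names, x_keys))
print(collisions)
```

Here `collisions` contains "Description": y is matched on `Description`, while x still carries its own `Description` column that is not part of the join.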

library(tidytable, warn.conflicts = FALSE)
#> As of tidytable v0.9.0 dotless versions of functions are exported.
#> You can now use `arrange()`/`mutate()`/etc. directly.

x <- data.frame(Product=c("A","B","C"),
                Description=c("aaa","bbb","ccc"),
                Val=c(1,2,3))
y <- data.frame(Product=c("A","B","C"),
                Description=c("aaa","bbb","ccc"),
                Val_y=c(10,20,30))

left_join(x,y,by="Description")
#> # A tidytable: 3 × 5
#>   Product.x Description   Val Product.y Val_y
#>   <chr>     <chr>       <dbl> <chr>     <dbl>
#> 1 A         aaa             1 A            10
#> 2 B         bbb             2 B            20
#> 3 C         ccc             3 C            30

left_join(x,y,by=c("Product","Description"))
#> # A tidytable: 3 × 4
#>   Product Description   Val Val_y
#>   <chr>   <chr>       <dbl> <dbl>
#> 1 A       aaa             1    10
#> 2 B       bbb             2    20
#> 3 C       ccc             3    30

left_join(x %>% rename(Desc=Description),y,by=c("Desc"="Description"))
#> # A tidytable: 3 × 5
#>   Product.x Desc    Val Product.y Val_y
#>   <chr>     <chr> <dbl> <chr>     <dbl>
#> 1 A         aaa       1 A            10
#> 2 B         bbb       2 B            20
#> 3 C         ccc       3 C            30

left_join(x %>% rename(Desc=Description),y,by=c("Product"="Product","Desc"="Description"))
#> # A tidytable: 3 × 4
#>   Product Desc    Val Val_y
#>   <chr>   <chr> <dbl> <dbl>
#> 1 A       aaa       1    10
#> 2 B       bbb       2    20
#> 3 C       ccc       3    30

left_join(x %>% mutate(Desc=Description),y,by=c("Product"="Product","Desc"="Description"))
#> Error in colnamesInt(x, neworder, check_dups = FALSE): argument specifying columns specify non existing column(s): cols[2]='Description'

dplyr::left_join(x %>% mutate(Desc=Description),y,by=c("Product"="Product","Desc"="Description"))
#> # A tidytable: 3 × 5
#>   Product Description   Val Desc  Val_y
#>   <chr>   <chr>       <dbl> <chr> <dbl>
#> 1 A       aaa             1 aaa      10
#> 2 B       bbb             2 bbb      20
#> 3 C       ccc             3 ccc      30

markfairbanks commented 2 years ago

Thanks for catching this, I'll take a look.

Also, FYI, I edited your comment to fix the reprex. I'm not sure what went wrong when you pasted it, but since the backticks weren't there it didn't render as code.

markfairbanks commented 2 years ago

All set. Thanks again for reporting these 😄

library(tidytable, warn.conflicts = FALSE)
#> As of tidytable v0.9.0 dotless versions of functions are exported.
#> You can now use `arrange()`/`mutate()`/etc. directly.

x <- data.frame(Product=c("A","B","C"),
                Description=c("aaa","bbb","ccc"),
                Val=c(1,2,3))
y <- data.frame(Product=c("A","B","C"),
                Description=c("aaa","bbb","ccc"),
                Val_y=c(10,20,30))

x %>%
  mutate(Desc = Description) %>%
  left_join(y, by = c("Product"="Product", "Desc"="Description"))
#> # A tidytable: 3 × 5
#>   Product Description   Val Desc  Val_y
#>   <chr>   <chr>       <dbl> <chr> <dbl>
#> 1 A       aaa             1 aaa      10
#> 2 B       bbb             2 bbb      20
#> 3 C       ccc             3 ccc      30
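
As an independent cross-check of the expected result, the same left join can be sketched with base R's `merge()`, which needs neither tidytable nor dplyr. The names `x2` and `res` are introduced here for illustration only, and this is a sanity check rather than a description of how either package implements the join.

```r
x <- data.frame(Product = c("A", "B", "C"),
                Description = c("aaa", "bbb", "ccc"),
                Val = c(1, 2, 3))
y <- data.frame(Product = c("A", "B", "C"),
                Description = c("aaa", "bbb", "ccc"),
                Val_y = c(10, 20, 30))

# Base R equivalent of mutate(Desc = Description)
x2 <- transform(x, Desc = Description)

# Left join: x keys (Product, Desc) matched against y keys (Product, Description)
res <- merge(x2, y,
             by.x = c("Product", "Desc"),
             by.y = c("Product", "Description"),
             all.x = TRUE)
print(res)
```

`merge()` places the key columns first, so the column order differs from the output above, but the result holds the same five columns and the same three rows, with x's own `Description` column kept intact.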