tidyverse / dbplyr

Database (DBI) backend for dplyr
https://dbplyr.tidyverse.org
Other
474 stars 173 forks source link

na_matches="na" coerces inequality and overlap joins to equality joins #1505

Open jeprose opened 4 months ago

jeprose commented 4 months ago

There seems to be an issue with dbplyr converting inequality and overlap joins to equality joins when na_matches = "na" is enabled. Given how base dplyr treats na's with these kinds of joins, it might just make sense to throw an error on the combination of these two options or add documentation explaining that na_matches only works with equality joins.

valid_bounds = memdb_frame(lower = c(5,100,300,800,NA), upper = c(30, 200, 500,2000, NA))
people = memdb_frame(id = c(1,2,3), value = c(70, 150,NA))

inner_join(people, valid_bounds, by = join_by(between(value, lower, upper))) %>%
  show_query() %>%
  collect()
#> <SQL>
#> SELECT `dbplyr_002`.*, `dbplyr_001`.*
#> FROM `dbplyr_002`
#> INNER JOIN `dbplyr_001`
#>   ON (
#>     `dbplyr_002`.`value` >= `dbplyr_001`.`lower` AND
#>     `dbplyr_002`.`value` <= `dbplyr_001`.`upper`
#>   )
#> # A tibble: 1 × 4
#>      id value lower upper
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     2   150   100   200
inner_join(people, valid_bounds, by = join_by(between(value, lower, upper)),na_matches = "na")  %>%
  show_query() %>%
  collect()
#> <SQL>
#> SELECT `dbplyr_002`.*, `dbplyr_001`.*
#> FROM `dbplyr_002`
#> INNER JOIN `dbplyr_001`
#>   ON (
#>     `dbplyr_002`.`value` IS `dbplyr_001`.`lower` AND
#>     `dbplyr_002`.`value` IS `dbplyr_001`.`upper`
#>   )
#> # A tibble: 1 × 4
#>      id value lower upper
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     3    NA    NA    NA
inner_join(people, valid_bounds, by = join_by(value >= lower),na_matches = "na")  %>%
  show_query() %>%
  collect()
#> <SQL>
#> SELECT `dbplyr_002`.*, `dbplyr_001`.*
#> FROM `dbplyr_002`
#> INNER JOIN `dbplyr_001`
#>   ON (`dbplyr_002`.`value` IS `dbplyr_001`.`lower`)
#> # A tibble: 1 × 4
#>      id value lower upper
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     3    NA    NA    NA

Created on 2024-05-16 with reprex v2.0.2