tidyverse / dbplyr

Database (DBI) backend for dplyr
https://dbplyr.tidyverse.org
Other
474 stars 174 forks source link

`expand()` only works if expression contains column of database #720

Open mgirlich opened 2 years ago

mgirlich commented 2 years ago
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

# the generated SQL is obviously wrong
memdb_frame(x = 1:2, y = 1:2) %>% 
  tidyr::expand(x = 1:3, y) %>% 
  show_query()
#> <SQL>
#> SELECT `x`, `y`
#> FROM (SELECT DISTINCT (1, 2, 3) AS `x`
#> FROM `dbplyr_001`) AS `LHS`
#> LEFT JOIN (SELECT DISTINCT `y`
#> FROM `dbplyr_001`) AS `RHS`

tibble(x = 1:2, y = 1:2) %>% 
  tidyr::expand(x = 1:3, y)
#> # A tibble: 6 × 2
#>       x     y
#>   <int> <int>
#> 1     1     1
#> 2     1     2
#> 3     2     1
#> 4     2     2
#> 5     3     1
#> 6     3     2

Created on 2021-11-03 by the reprex package (v2.0.1)

This should provide a more helpful error message, e.g. an example of how to make this work:

memdb_frame(x = 1:2, y = 1:2) %>% 
  tidyr::expand(x = 1:3, y)

is equivalent to

mf1 <- memdb_frame(x = 1:2, y = 1:2)
mf2 <- memdb_frame(x = 1:3)

mf_prep <- mf1 %>% 
  select(-x) %>% 
  cross_join(mf2)

mf_prep %>% 
  tidyr::expand(x, y)

It's not quite that easy as we also support expressions, e.g. tidyr::expand(df, round(x / 2)).

hadley commented 2 years ago

I think it would also be fine to scope this more narrowly than for data frames, and error when we encounter a non-symbol.