tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/
Other
4.79k stars 2.12k forks source link

translation of setdiff for Oracle should be MINUS #3493

Closed copernican closed 6 years ago

copernican commented 6 years ago

It appears that dbplyr is translating dplyr::setdiff() as EXCEPT. For Oracle Database, the corresponding operator is MINUS. The translation is accordingly not working. As best I can tell, this is a dbplyr issue.

The variables `dbq`, `uid`, and `pwd` store the TNS service name, username, and password, respectively. The table `DUAL` should exist in the `SYS` schema for all versions of Oracle Database. I would expect `setdiff(tbl, tbl)` to return the empty set, but instead it throws an error. The final query modifies the auto-generated SQL from **`dbplyr`** to replace `EXCEPT` by `MINUS`, and returns the expected result. ```r con <- DBI::dbConnect(drv = odbc::odbc(), Driver = "Oracle in instantclient_12_2", Dbq = dbq, uid = uid, pwd = pwd) tbl <- dplyr::tbl(con, "DUAL") dplyr::setdiff(tbl, tbl) #> Error: 'SELECT * FROM (SELECT * #> FROM ((SELECT * #> FROM "DUAL") #> EXCEPT #> (SELECT * #> FROM "DUAL")) "ebzdgbnsmd") "zzz2" WHERE ROWNUM <= 10' #> nanodbc/nanodbc.cpp:1587: HY000: [Oracle][ODBC][Ora]ORA-00907: missing right parenthesis #> qry_except <- "SELECT * FROM ( SELECT * FROM ( (SELECT * FROM \"DUAL\") EXCEPT (SELECT * FROM \"DUAL\") ) \"xqcikplkqs\" ) \"zzz13\" WHERE ROWNUM <= 10" DBI::dbGetQuery(con, qry_except) #> Error: 'SELECT * #> FROM ( #> SELECT * #> FROM ( #> (SELECT * FROM "DUAL") #> EXCEPT #> (SELECT * FROM "DUAL") #> ) "xqcikplkqs" #> ) "zzz13" #> WHERE ROWNUM <= 10' #> nanodbc/nanodbc.cpp:1587: HY000: [Oracle][ODBC][Ora]ORA-00907: missing right parenthesis #> qry_minus <- "SELECT * FROM ( SELECT * FROM ( (SELECT * FROM \"DUAL\") MINUS (SELECT * FROM \"DUAL\") ) \"xqcikplkqs\" ) \"zzz13\" WHERE ROWNUM <= 10" DBI::dbGetQuery(con, qry_minus) #> [1] DUMMY #> <0 rows> (or 0-length row.names) DBI::dbDisconnect(con) ``` I am using Oracle Instant Client version 12.2.0.1.0 and the matching ODBC driver. Here are the relevant lines from `session_info()`: ``` DBI * 0.8 2018-03-21 Github (jimhester/DBI@17f7e8f) dbplyr * 1.2.1 2018-02-19 CRAN (R 3.4.3) dplyr * 0.7.4.9000 2018-01-31 Github (tidyverse/dplyr@7541b1b) odbc * 1.1.5 2018-03-21 Github (r-dbi/odbc@dc1bbf6) ```
hadley commented 6 years ago

@edgararuiz do you want to take this one?

Minimal reprex:

library(dplyr, warn.conflicts = FALSE)

lf1 <- dbplyr::lazy_frame(x = 1:4, src = dbplyr::simulate_oracle())
lf2 <- dbplyr::lazy_frame(x = 1:4, src = dbplyr::simulate_oracle())

lf1 %>%                                    
  setdiff(lf2) %>%
  show_query()
#> <SQL> (SELECT *
#> FROM `df`)
#> EXCEPT
#> (SELECT *
#> FROM `df`)

Created on 2018-05-20 by the reprex package (v0.2.0).

edgararuiz-zz commented 6 years ago

Yes, I'll be glad to take it.

lock[bot] commented 6 years ago

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/