r-dbi / odbc

Connect to ODBC databases (using the DBI interface)
https://odbc.r-dbi.org/

include synonyms in `odbcListObjects()` output #773

Open simonpcouch opened 8 months ago

simonpcouch commented 8 months ago

Closes #221.

This PR modifies/adds SQL Server methods for dbListTables(), dbExistsTable(), and odbcListObjects() that include synonyms in output. This also means that synonyms will show up in the Connections pane.

Note that this PR makes no changes to dbListTables(), as noted in the original issue.

[EDITs: update PR scope]
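For context on where synonyms come from on SQL Server: they are exposed through the `sys.synonyms` catalog view, so they can be enumerated with an ordinary query through DBI. A minimal sketch along those lines, assuming a live SQL Server connection (the DSN and credentials mirror the examples later in this thread; the exact query the PR uses may differ):

``` r
library(DBI)
library(odbc)

# Assumes a reachable SQL Server DSN; credentials are illustrative.
con <- dbConnect(
  odbc(),
  dsn = "MicrosoftSQLServer",
  uid = "SA",
  pwd = Sys.getenv("sqlServerPass")
)

# sys.synonyms lists each synonym alongside the object it points at.
synonyms <- dbGetQuery(con, "
  SELECT
    name AS synonym_name,
    base_object_name
  FROM sys.synonyms
")

# Merging these names into the regular table listing is what makes
# synonyms visible to listing functions (and the Connections pane).
all_names <- union(dbListTables(con), synonyms$synonym_name)
```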

ThomasSoeiro commented 7 months ago

@simonpcouch Hi, I have the same issue with an Oracle database. Should I open a new issue or can it be taken care of here? Thanks!

simonpcouch commented 7 months ago

@ThomasSoeiro A separate issue is good! :)

hadley commented 7 months ago

Hmmm, I think I'd be inclined to include these in dbListTables(), using our principle that if you can do SELECT foo then dbExistsTable(con, "foo") should be true, and "foo" should be included in the output of dbListTables().

simonpcouch commented 7 months ago

Synonyms are now supported with dbListTables() and dbExistsTable().

With `main`:

``` r
library(DBI)
library(odbc)

con <- dbConnect(
  odbc(),
  dsn = "MicrosoftSQLServer",
  uid = "SA",
  pwd = Sys.getenv("sqlServerPass")
)

dbExecute(con, "create schema odbc")
#> [1] 0
dbExecute(con, "create table odbc.test (x int)")
#> [1] 0

# confirm that we can find the table:
odbcListObjects(con, catalog = "master", schema = "odbc")
#>   name  type
#> 1 test table
dbListTables(con, catalog = "master", schema = "odbc")
#> [1] "test"

# make a synonym and show that it can't be found:
dbExecute(con, "create synonym odbc.test2 for odbc.test")
#> [1] 0
odbcListObjects(con, catalog = "master", schema = "odbc")
#>   name  type
#> 1 test table
dbListTables(con, catalog = "master", schema = "odbc")
#> [1] "test"
```
With this PR:

``` r
library(DBI)
library(odbc)

con <- dbConnect(
  odbc(),
  dsn = "MicrosoftSQLServer",
  uid = "SA",
  pwd = Sys.getenv("sqlServerPass")
)

dbExecute(con, "create schema odbc")
#> [1] 0
dbExecute(con, "create table odbc.test (x int)")
#> [1] 0

# confirm that we can find the table:
odbcListObjects(con, catalog = "master", schema = "odbc")
#>   name  type
#> 1 test table
dbListTables(con, catalog = "master", schema = "odbc")
#> [1] "test"

# make a synonym and show that it CAN be found:
dbExecute(con, "create synonym odbc.test2 for odbc.test")
#> [1] 0
odbcListObjects(con, catalog = "master", schema = "odbc")
#>    name  type
#> 1  test table
#> 2 test2 table
dbListTables(con, catalog = "master", schema = "odbc")
#> [1] "test" "test2"
```

My biggest concern at this point is performance, esp. in cases when databases have many synonyms. Should this feature be gated behind an argument?
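If gating were the route taken, one shape it could take is an opt-in argument so that callers who don't need synonyms never pay the extra catalog query. A rough sketch (the `include_synonyms` argument is hypothetical, not part of this PR or the odbc API; it is shown only to make the trade-off concrete):

``` r
# Hypothetical wrapper; `include_synonyms` is not an actual odbc argument.
list_objects_with_synonyms <- function(con, catalog = NULL, schema = NULL,
                                       include_synonyms = FALSE) {
  objects <- odbcListObjects(con, catalog = catalog, schema = schema)
  if (!include_synonyms) {
    return(objects)
  }
  # Only pay the extra round trip when the caller opts in.
  synonyms <- DBI::dbGetQuery(con, "SELECT name FROM sys.synonyms")
  rbind(objects, data.frame(name = synonyms$name, type = "table"))
}
```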

simonpcouch commented 6 months ago

Here are those benchmarks with the PR as-is (EDIT: updated timings below):

[Plot: elapsed times of the 3 affected functions with 0 to 10,000 synonyms present. One column of plots shows CRAN timings and the other dev, showing a drastic increase in timings for the dev package.]

…where the values to the left of x = 1 are x = 0 (shifted slightly to accommodate the log scale). "Default" objects means what's available in a relatively fresh SQL Server Docker instance, where length(dbListTables(con)) is 618. Timings for dbExistsTable() are for a table that doesn't exist, so the synonyms_query kicks in. Generally, doesn't look great. 🏄

Will look into your most recent comments.

Source:

``` r
library(tidyverse)

# run the following, once with CRAN odbc and once with dev ----------------
library(DBI)
library(odbc)

con <- dbConnect(
  odbc(),
  dsn = "MicrosoftSQLServer",
  uid = "SA",
  pwd = Sys.getenv("sqlServerPass")
)

dbExecute(con, "create schema odbc")
dbExecute(con, "create table odbc.test (x int)")

n_synonyms <- c(0, 10^c(0:4))
timings <- data.frame(
  n = numeric(0),
  odbcList = numeric(0),
  dbList = numeric(0),
  dbExists = numeric(0)
)

for (n_synonym in n_synonyms) {
  for (n in seq_len(n_synonym)) {
    dbExecute(
      con,
      SQL(paste0("create synonym odbc.test", n, " for odbc.test"))
    )
  }

  for (i in 1:3) {
    timings <- rbind(
      timings,
      data.frame(
        n = n_synonym,
        odbcList = system.time(odbcListObjects(con))[["elapsed"]],
        dbList = system.time(dbListTables(con))[["elapsed"]],
        dbExists = system.time(dbExistsTable(con, "test0"))[["elapsed"]],
        version = ifelse(packageVersion("odbc") == "1.4.2", "cran", "dev")
      )
    )
  }

  for (n in seq_len(n_synonym)) {
    dbExecute(
      con,
      SQL(paste0("drop synonym odbc.test", n))
    )
  }
}

# write to the respective .csv
# write_csv(timings, file = "dev_timings.csv")
# write_csv(timings, file = "cran_timings.csv")

# then, reading in and plotting -------------------------------------------
dev_timings <- read_csv("dev_timings.csv")
cran_timings <- read_csv("cran_timings.csv")

bind_rows(dev_timings, cran_timings) %>%
  pivot_longer(cols = c(odbcList, dbList, dbExists), names_to = "fn") %>%
  mutate(n = n + .1) %>%
  ggplot() +
  geom_point(aes(x = n, y = value)) +
  facet_grid(rows = vars(fn), cols = vars(version)) +
  scale_x_log10() +
  labs(
    x = "Number of Synonyms",
    y = "Time",
    caption = "Only 'default' objects + 1 0-row table, 3 calls per (n, fn) pair"
  )
```
simonpcouch commented 6 months ago

Updated benchmarks:

[Screenshot, 2024-04-11: updated benchmark plots]

Definitely better than before but not negligible. I haven't seen enough real-world DBs to know how close to "worst-case" this is.

simonpcouch commented 5 months ago

Let's table this PR until at least after the upcoming release. Will mark as draft for now.