r-dbi / odbc

Connect to ODBC databases (using the DBI interface)
https://odbc.r-dbi.org/
Other
392 stars 107 forks source link

Cannot overwrite or append in schemas #140

Closed etiennebr closed 6 years ago

etiennebr commented 6 years ago

Description

It seems that specifying a schema using DBI::SQL() prevents the use of overwrite and append. Here's an example using the public schema, but obviously it is when using other schemas that it is hard to work around.

Database

PostgreSQL 9.3.20 64-bit

Example

library(DBI)                                                                       
con <- dbConnect(odbc::odbc(), driver = "PostgreSQL Unicode", database = "odbc")

dbWriteTable(con, "a", data.frame(a=1, b=2))                                       
dbWriteTable(con, "a", data.frame(a=1, b=2))                                       
#> Error: Table a exists in database, and both overwrite and append are FALSE
dbWriteTable(con, "a", data.frame(a=1, b=2), overwrite = TRUE)                     
dbWriteTable(con, "a", data.frame(a=1, b=2), append = TRUE)                        

dbWriteTable(con, DBI::SQL("a"), data.frame(a=1, b=2))                             
#> Error: Table a exists in database, and both overwrite and append are FALSE
dbWriteTable(con, DBI::SQL("a"), data.frame(a=1, b=2), overwrite = TRUE)           
#> Note: method with signature 'DBIConnection#SQL' chosen for function 'dbQuoteIdentifier',
#>  target signature 'PostgreSQL#SQL'.
#>  "OdbcConnection#character" would also be valid
dbWriteTable(con, DBI::SQL("a"), data.frame(a=1, b=2), append = TRUE)              

dbWriteTable(con, DBI::SQL("public.a"), data.frame(a=1, b=2))                      
#> Error: <SQL> 'CREATE TABLE public.a (
#>   "a" DOUBLE PRECISION,
#>   "b" DOUBLE PRECISION
#> )
#> '
#>   nanodbc/nanodbc.cpp:1587: 42P07: ERROR: relation "a" already exists;
#> Error while executing the query
dbWriteTable(con, DBI::SQL("public.a"), data.frame(a=1, b=2), overwrite = TRUE)    
#> Error: <SQL> 'CREATE TABLE public.a (
#>   "a" DOUBLE PRECISION,
#>   "b" DOUBLE PRECISION
#> )
#> '
#>   nanodbc/nanodbc.cpp:1587: 42P07: ERROR: relation "a" already exists;
#> Error while executing the query
dbWriteTable(con, DBI::SQL("public.a"), data.frame(a=1, b=2), append = TRUE)       
#> Error: <SQL> 'CREATE TABLE public.a (
#>   "a" DOUBLE PRECISION,
#>   "b" DOUBLE PRECISION
#> )
#> '
#>   nanodbc/nanodbc.cpp:1587: 42P07: ERROR: relation "a" already exists;
#> Error while executing the query
colearendt commented 6 years ago

Working with schemas is tricky in DBI, in general, but work is thankfully outlined on that front, at least (https://github.com/r-dbi/DBI/issues/24). The easiest way to workaround this upstream limitation is by defining the schema server-side.

Maybe your point is that this workaround shouldn't be necessary, but I think (hope) some of the schema finicky behavior will be taken care of when DBI has taken schemas into account.

library(odbc)
library(DBI)

con <- dbConnect(odbc::odbc(), ...)

dbWriteTable(con, "a", iris)  ## ok

dbWriteTable(con, "a", iris)  ## err
#> Error: Table a exists in database, and both overwrite and append are FALSE

dbWriteTable(con, "a", iris, overwrite = TRUE)  ## ok

dbWriteTable(con, DBI::SQL("public.a"), iris, overwrite = TRUE)  ## err
#> Note: method with signature 'DBIConnection#SQL' chosen for function 'dbQuoteIdentifier',
#>  target signature 'PostgreSQL#SQL'.
#>  "OdbcConnection#character" would also be valid
#> Error: <SQL> 'CREATE TABLE public.a (
#>   "Sepal.Length" DOUBLE PRECISION,
#>   "Sepal.Width" DOUBLE PRECISION,
#>   "Petal.Length" DOUBLE PRECISION,
#>   "Petal.Width" DOUBLE PRECISION,
#>   "Species" TEXT
#> )
#> '
#>   nanodbc/nanodbc.cpp:1587: 42P07: [RStudio][PostgreSQL] (30) Error occurred while trying to execute a query: [SQLState 42P07] ERROR:  relation "a" already exists
#> 

## working in a different schema

dbWriteTable(con, DBI::SQL("datawarehouse.a"), iris)  ## ok

dbWriteTable(con, DBI::SQL("datawarehouse.a"), iris, overwrite = TRUE)  ## err
#> Error: <SQL> 'CREATE TABLE datawarehouse.a (
#>   "Sepal.Length" DOUBLE PRECISION,
#>   "Sepal.Width" DOUBLE PRECISION,
#>   "Petal.Length" DOUBLE PRECISION,
#>   "Petal.Width" DOUBLE PRECISION,
#>   "Species" TEXT
#> )
#> '
#>   nanodbc/nanodbc.cpp:1587: 42P07: [RStudio][PostgreSQL] (30) Error occurred while trying to execute a query: [SQLState 42P07] ERROR:  relation "a" already exists
#> 

dbExecute(con, "SET search_path = datawarehouse")
#> [1] 0

dbWriteTable(con, "a", iris, overwrite = TRUE) ## ok

Wondering if maybe the dispatch of dbWriteTable on a SQL object doesn't include a DROP TABLE if overwrite=TRUE...

The only place that this really gets painful is cross-schema joins... but dbplyr comes to the rescue there!!

library(dbplyr)
library(dplyr)
library(odbc)
library(DBI)

con <- dbConnect(odbc::odbc(), ...)

tbl_a <- tbl(con, in_schema("public", "a"))
tbl_b <- tbl(con, in_schema("datawarehouse", "a"))

tbl_a %>% left_join(tbl_b, by = c("Species")) %>% sql_render()
#> <SQL> SELECT "TBL_LEFT"."Sepal.Length" AS "Sepal.Length.x", "TBL_LEFT"."Sepal.Width" AS "Sepal.Width.x", "TBL_LEFT"."Petal.Length" AS "Petal.Length.x", "TBL_LEFT"."Petal.Width" AS "Petal.Width.x", "TBL_LEFT"."Species" AS "Species", "TBL_RIGHT"."Sepal.Length" AS "Sepal.Length.y", "TBL_RIGHT"."Sepal.Width" AS "Sepal.Width.y", "TBL_RIGHT"."Petal.Length" AS "Petal.Length.y", "TBL_RIGHT"."Petal.Width" AS "Petal.Width.y"
#>   FROM public.a AS "TBL_LEFT"
#>   LEFT JOIN datawarehouse.a AS "TBL_RIGHT"
#>   ON ("TBL_LEFT"."Species" = "TBL_RIGHT"."Species")
colearendt commented 6 years ago

So extending what I said, I think what you have uncovered @etiennebr is an inconsistency in the way that SQL interpretation happens wrt schemas. Again, something that will hopefully be taken into account and tested for as DBI adds schema support.

Also, I reiterate my suggestion to do the following until that is the case.

  1. Use dbplyr for in_schema() references
  2. Alter the search path
  3. Use database valid names... double quoting can get nasty
  4. Use verbatim SQL with dbGetQuery or dbSendQuery

Quick semantics - dbWriteTable uses dbExistsTable as a test internally when overwrite=TRUE to see whether it should drop the existing table. The problem you are running into comes from dbExistsTable and dbWriteTable understanding schemas differently.

Check out this sequence:

library(odbc)
library(DBI)

con <- dbConnect(odbc::odbc(), ...)

# I got this problematic sequence from the original example above
dbWriteTable(con, DBI::SQL("a"), data.frame(a = 1, b = 2))
dbWriteTable(con, DBI::SQL("public.a"), data.frame(a = 1, b = 2))
#> Error: <SQL> 'CREATE TABLE public.a (
#>   "a" DOUBLE PRECISION,
#>   "b" DOUBLE PRECISION
#> )
#> '
#>   nanodbc/nanodbc.cpp:1587: 42P07: [RStudio][PostgreSQL] (30) Error occurred while trying to execute a query: [SQLState 42P07] ERROR:  relation "a" already exists
#> 
dbWriteTable(con, DBI::SQL("public.a"), data.frame(a = 1, b = 2), overwrite = TRUE)
#> Error: <SQL> 'CREATE TABLE public.a (
#>   "a" DOUBLE PRECISION,
#>   "b" DOUBLE PRECISION
#> )
#> '
#>   nanodbc/nanodbc.cpp:1587: 42P07: [RStudio][PostgreSQL] (30) Error occurred while trying to execute a query: [SQLState 42P07] ERROR:  relation "a" already exists
#> 

# this is why that happens
dbExistsTable(con, DBI::SQL("public.a"))
#> [1] FALSE
dbWriteTable(con, "public.a", data.frame(a = 1, b = 2))
dbExistsTable(con, DBI::SQL("public.a"))
#> [1] TRUE

And our schema looks like...

image

Essentially, when you say overwrite=TRUE, dbExistsTable comes along and says "no worries - this table is not defined yet." So "DROP TABLE" never happens. Then the "Create Table" fails because the table already exists. They're referencing different tables!

colearendt commented 6 years ago

Oops. This is a duplicate of #91 . Same issue, I think.

etiennebr commented 6 years ago

Thanks @colearendt, you're absolutely right and your workaround using dbplyr is ingenious! I thought the issue was specific to odbc's options overwrite and append, but indeed it is generalized to DBI and I know there are plans to fix schema management, hopefully soon.