ropensci / arkdb

Archive and unarchive databases as flat text files
https://docs.ropensci.org/arkdb
Other
78 stars 6 forks source link

Handling for postgres schemas? #12

Open khondula opened 6 years ago

khondula commented 6 years ago

Would it be possible to support Postgres databases that have tables in schemas? I'm running into problems I think because table names have to be prefixed by the schema name in the sql.

# try archiving just tables in specified schema
odm2_tables <- dbGetQuery(db, 
  "SELECT * FROM information_schema.tables 
  WHERE table_schema = 'odm2'")

ark(db, dir, lines = 50000, tables = odm2_tables$table_name)

Error:

Exporting actionannotations in 50000 line chunks:
Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : ERROR:  relation "actionannotations" does not exist
LINE 1: SELECT * FROM actionannotations LIMIT 0
                      ^
)
cboettig commented 6 years ago

Good question! I actually don't have much experience with postgres using schemas. Here's the query that we use to select the "table" to be archived:

https://github.com/cboettig/arkdb/blob/295f0bc354d932aa8d75a1164e58a910744c6cbe/R/ark.R#L130-L132

Looks like we'll want to support some mechanism for a user to specify the schema, rather than just the table name, to indicate how the table should be archived? What should the corresponding dbGetQuery query look like for this case?

khondula commented 6 years ago

That would be great! The table name has to be be referred to as a "qualified" name as schema.tablename. So perhaps something like this?

query <- paste0("SELECT table_schema 
                 FROM information_schema.tables 
                 WHERE table_name = '", table, "'")
schema <- DBI::dbGetQuery(db, query)
query <- paste0("SELECT * FROM ", schema, ".", table, " LIMIT 0")
DBI::dbGetQuery(db, query)

I'm not sure what happens when a table is not in a schema... the last line of the docs does not fill me with hope!

cboettig commented 6 years ago

hmm... can you just manually construct the appropriate <schemaname>.<tablename> in the existing tables argument?

khondula commented 6 years ago

Oh, good call. this worked:

schema_tables <- dbGetQuery(db, sqlInterpolate(db,
"SELECT table_name FROM information_schema.tables 
WHERE table_schema = ?schema", schema = "schema_name"))

ark(db, dir, tables = paste0("schema_name",".", schema_tables$table_name))
cboettig commented 6 years ago

Nice, thanks. I'll add that example into the documentation. Does that sound like a sufficient strategy here?

ilarischeinin commented 4 years ago

This doesn't work for me. Maybe an update to DBI, RPostgres/RPostgreSQL, or arkdb has broken it at some point.

Below is a reprex (with a PostgreSQL server run inside docker). Before that, I do want to say that this schema specification is, or has been, a bit of a mess. Depending on the case (SQL/DBI/RPostgres/RPostgreSQL/dbplyr), it is one of these options that you need:

Reprex

Run the server:

docker pull postgres:12.0
docker run \
  --detach \
  --env POSTGRES_PASSWORD=reprex \
  --name reprex \
  --publish 5432:5432 \
  --rm \
  postgres:12.0

The R reprex:

# open connection
con <-
  DBI::dbConnect(
    RPostgres::Postgres(),
    host = "127.0.0.1",
    dbname = "postgres",
    user = "postgres",
    password = "reprex"
  )

# generate simple dummy data
x <- data.frame(i = 1:10)

# insert that into the database
DBI::dbExecute(con, "CREATE SCHEMA foobar;")
DBI::dbWriteTable(con, DBI::Id(schema = "foobar", table = "x"), x)

# check that it is there and we can read it
DBI::dbReadTable(con, DBI::Id(schema = "foobar", table = "x"))
DBI::dbGetQuery(con, "SELECT * FROM foobar.x")

# but none of these work
arkdb::ark(con, dir = tempdir(), tables = "x")
arkdb::ark(con, dir = tempdir(), tables = "foobar.x")
arkdb::ark(con, dir = tempdir(), tables = c("foobar", "x"))
arkdb::ark(con, dir = tempdir(), tables = DBI::Id(schema = "foobar", table = "x"))
arkdb::ark(con, dir = tempdir(), tables = dbplyr::in_schema(schema = "foobar", table = "x"))

As a side note, even though they give errors, the commands above still generate output files. I think they shouldn't.

Clean up

docker stop reprex
docker rmi postgres:12.0

RPostgres vs. RPostgreSQL

If you're using RPostgreSQL instead of RPostgres as above, instead of

DBI::dbReadTable(con, DBI::Id(schema = "foobar", table = "x"))

you'd need to do

DBI::dbReadTable(con, c(schema = "foobar", table = "x"))

Session info

─ Session info ───────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.6.2 (2019-12-12)
 os       macOS Catalina 10.15.2      
 system   x86_64, darwin15.6.0        
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       Europe/Helsinki             
 date     2019-12-18                  

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date       lib source        
 arkdb         0.0.5   2018-10-31 [1] CRAN (R 3.6.0)
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.0)
 backports     1.1.5   2019-10-02 [1] CRAN (R 3.6.0)
 bit           1.1-14  2018-05-29 [1] CRAN (R 3.6.0)
 bit64         0.9-7   2017-05-08 [1] CRAN (R 3.6.0)
 blob          1.2.0   2019-07-09 [1] CRAN (R 3.6.0)
 callr         3.4.0   2019-12-09 [1] CRAN (R 3.6.0)
 cli           2.0.0   2019-12-09 [1] CRAN (R 3.6.0)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.0)
 DBI           1.1.0   2019-12-15 [1] CRAN (R 3.6.2)
 dbplyr        1.4.2   2019-06-17 [1] CRAN (R 3.6.0)
 desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.0)
 devtools      2.2.1   2019-09-24 [1] CRAN (R 3.6.0)
 digest        0.6.23  2019-11-23 [1] CRAN (R 3.6.0)
 dplyr         0.8.3   2019-07-04 [1] CRAN (R 3.6.0)
 ellipsis      0.3.0   2019-09-20 [1] CRAN (R 3.6.0)
 fansi         0.4.0   2018-10-05 [1] CRAN (R 3.6.0)
 fs            1.3.1   2019-05-06 [1] CRAN (R 3.6.0)
 glue          1.3.1   2019-03-12 [1] CRAN (R 3.6.0)
 hms           0.5.2   2019-10-30 [1] CRAN (R 3.6.1)
 magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.0)
 memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.0)
 pillar        1.4.2   2019-06-29 [1] CRAN (R 3.6.0)
 pkgbuild      1.0.6   2019-10-09 [1] CRAN (R 3.6.0)
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.6.0)
 pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.6.0)
 prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.6.0)
 processx      3.4.1   2019-07-18 [1] CRAN (R 3.6.0)
 progress      1.2.2   2019-05-16 [1] CRAN (R 3.6.0)
 ps            1.3.0   2018-12-21 [1] CRAN (R 3.6.0)
 purrr         0.3.3   2019-10-18 [1] CRAN (R 3.6.0)
 R6            2.4.1   2019-11-12 [1] CRAN (R 3.6.0)
 Rcpp          1.0.3   2019-11-08 [1] CRAN (R 3.6.0)
 remotes       2.1.0   2019-06-24 [1] CRAN (R 3.6.0)
 rlang         0.4.2   2019-11-23 [1] CRAN (R 3.6.0)
 RPostgres     1.1.3   2019-12-07 [1] CRAN (R 3.6.0)
 RPostgreSQL   0.6-2   2017-06-24 [1] CRAN (R 3.6.0)
 rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.0)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.0)
 testthat      2.3.1   2019-12-01 [1] CRAN (R 3.6.0)
 tibble        2.1.3   2019-06-06 [1] CRAN (R 3.6.0)
 tidyselect    0.2.5   2018-10-11 [1] CRAN (R 3.6.0)
 usethis       1.5.1   2019-07-04 [1] CRAN (R 3.6.0)
 vctrs         0.2.0   2019-07-05 [1] CRAN (R 3.6.0)
 withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.0)
 zeallot       0.1.0   2018-01-28 [1] CRAN (R 3.6.0)

[1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
cboettig commented 4 years ago

@ilarischeinin Thanks for the reprex. Yeah, as you can see, arkdb doesn't have any explicit logic for postgres schema at the moment, it is largely just counting on the DBI abstractions to be portable. I'd welcome a PR for this, though I imagine many postgres users may already be happy with using COPY to do a bulk export to .csv, which is probably faster than arkdb's chunk iteration. (Having arkdb attempt native methods is also on the to-do list, #24)

1beb commented 2 years ago

A note here, dbplyr has a built in method for accessing a schema:

con <- DBI::dbConnect(...)
my_table <- tbl(con, in_schema("schema", "table"))