Open khondula opened 6 years ago
Good question! I actually don't have much experience with postgres using schemas. Here's the query that we use to select the "table" to be archived:
https://github.com/cboettig/arkdb/blob/295f0bc354d932aa8d75a1164e58a910744c6cbe/R/ark.R#L130-L132
Looks like we'll want to support some mechanism for a user to specify the schema, rather than just the table name, to indicate how the table should be archived? What should the corresponding dbGetQuery query look like for this case?
That would be great! The table name has to be referred to as a "qualified" name, schema.tablename. So perhaps something like this?
# look up which schema contains the table
# (assumes the table name is unique across schemas)
query <- paste0("SELECT table_schema
                 FROM information_schema.tables
                 WHERE table_name = '", table, "'")
schema <- DBI::dbGetQuery(db, query)$table_schema[[1]]
# then query the table by its qualified name
query <- paste0("SELECT * FROM ", schema, ".", table, " LIMIT 0")
DBI::dbGetQuery(db, query)
I'm not sure what happens when a table is not in a schema... the last line of the docs does not fill me with hope!
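One hedged sketch that sidesteps both worries (quoting, and a table name appearing in several schemas) would be to resolve the schema via information_schema and then quote the qualified name with DBI::dbQuoteIdentifier. Untested against a live server; assumes `db` is a live DBI connection and `table` the bare table name:

```r
# Sketch: resolve the schema(s) containing `table`, then build a safely
# quoted qualified identifier. Assumes a live DBI connection `db`.
schemas <- DBI::dbGetQuery(db, DBI::sqlInterpolate(db,
  "SELECT table_schema FROM information_schema.tables
   WHERE table_name = ?tbl", tbl = table))$table_schema

if (length(schemas) == 0) stop("table not found: ", table)
if (length(schemas) > 1) warning("'", table, "' exists in multiple schemas; using the first")

qualified <- DBI::dbQuoteIdentifier(db,
  DBI::Id(schema = schemas[[1]], table = table))
DBI::dbGetQuery(db, paste0("SELECT * FROM ", qualified, " LIMIT 0"))
```

Using dbQuoteIdentifier also handles the default public schema and any names that need quoting.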
hmm... can you just manually construct the appropriate <schemaname>.<tablename> in the existing tables argument?
Oh, good call. this worked:
schema_tables <- DBI::dbGetQuery(db, DBI::sqlInterpolate(db,
  "SELECT table_name FROM information_schema.tables
   WHERE table_schema = ?schema", schema = "schema_name"))
ark(db, dir, tables = paste0("schema_name", ".", schema_tables$table_name))
Nice, thanks. I'll add that example into the documentation. Does that sound like a sufficient strategy here?
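Generalizing that workaround to every user-defined schema at once might look like this (a sketch, untested; assumes `db` and `dir` as in the thread):

```r
# Sketch: archive all tables in all non-system schemas by passing
# qualified "schema.table" names to ark()'s `tables` argument.
tbls <- DBI::dbGetQuery(db,
  "SELECT table_schema, table_name FROM information_schema.tables
   WHERE table_schema NOT IN ('pg_catalog', 'information_schema')")
ark(db, dir, tables = paste0(tbls$table_schema, ".", tbls$table_name))
```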
This doesn't work for me. Maybe an update to DBI, RPostgres/RPostgreSQL, or arkdb has broken it at some point.
Below is a reprex (with a PostgreSQL server run inside Docker). Before that, I do want to say that this schema specification is, or has been, a bit of a mess. Depending on the case (SQL/DBI/RPostgres/RPostgreSQL/dbplyr), it is one of these options that you need:
"schema.table"
c("schema", "table")
DBI::Id(schema = "schema", table = "table")
dbplyr::in_schema(schema = "schema", table = "table")
Run the server:
docker pull postgres:12.0
docker run \
--detach \
--env POSTGRES_PASSWORD=reprex \
--name reprex \
--publish 5432:5432 \
--rm \
postgres:12.0
The R reprex:
# open connection
con <-
DBI::dbConnect(
RPostgres::Postgres(),
host = "127.0.0.1",
dbname = "postgres",
user = "postgres",
password = "reprex"
)
# generate simple dummy data
x <- data.frame(i = 1:10)
# insert that into the database
DBI::dbExecute(con, "CREATE SCHEMA foobar;")
DBI::dbWriteTable(con, DBI::Id(schema = "foobar", table = "x"), x)
# check that it is there and we can read it
DBI::dbReadTable(con, DBI::Id(schema = "foobar", table = "x"))
DBI::dbGetQuery(con, "SELECT * FROM foobar.x")
# but none of these work
arkdb::ark(con, dir = tempdir(), tables = "x")
arkdb::ark(con, dir = tempdir(), tables = "foobar.x")
arkdb::ark(con, dir = tempdir(), tables = c("foobar", "x"))
arkdb::ark(con, dir = tempdir(), tables = DBI::Id(schema = "foobar", table = "x"))
arkdb::ark(con, dir = tempdir(), tables = dbplyr::in_schema(schema = "foobar", table = "x"))
As a side note, even though they give errors, the commands above still generate output files. I think they shouldn't.
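On the partial-output-files point, one common pattern (not what arkdb currently does; just a sketch of the idea) is to write to a temporary file and rename it only on success, so a failed export leaves nothing behind:

```r
# Sketch of a write-then-rename pattern: a failed write leaves no
# partial file at the final path.
safe_write <- function(write_fn, path) {
  tmp <- tempfile(tmpdir = dirname(path))
  on.exit(unlink(tmp), add = TRUE)  # remove leftovers if write_fn() errors
  write_fn(tmp)                     # write to the temp path; may error
  file.rename(tmp, path)            # atomic on the same filesystem
}
```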
docker stop reprex
docker rmi postgres:12.0
RPostgres vs. RPostgreSQL
If you're using RPostgreSQL instead of RPostgres as above, then instead of
DBI::dbReadTable(con, DBI::Id(schema = "foobar", table = "x"))
you'd need to do
DBI::dbReadTable(con, c(schema = "foobar", table = "x"))
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 3.6.2 (2019-12-12)
os macOS Catalina 10.15.2
system x86_64, darwin15.6.0
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/Helsinki
date 2019-12-18
─ Packages ───────────────────────────────────────────────────────────────────
package * version date lib source
arkdb 0.0.5 2018-10-31 [1] CRAN (R 3.6.0)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.0)
bit 1.1-14 2018-05-29 [1] CRAN (R 3.6.0)
bit64 0.9-7 2017-05-08 [1] CRAN (R 3.6.0)
blob 1.2.0 2019-07-09 [1] CRAN (R 3.6.0)
callr 3.4.0 2019-12-09 [1] CRAN (R 3.6.0)
cli 2.0.0 2019-12-09 [1] CRAN (R 3.6.0)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
DBI 1.1.0 2019-12-15 [1] CRAN (R 3.6.2)
dbplyr 1.4.2 2019-06-17 [1] CRAN (R 3.6.0)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
devtools 2.2.1 2019-09-24 [1] CRAN (R 3.6.0)
digest 0.6.23 2019-11-23 [1] CRAN (R 3.6.0)
dplyr 0.8.3 2019-07-04 [1] CRAN (R 3.6.0)
ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.0)
fansi 0.4.0 2018-10-05 [1] CRAN (R 3.6.0)
fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
hms 0.5.2 2019-10-30 [1] CRAN (R 3.6.1)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
pillar 1.4.2 2019-06-29 [1] CRAN (R 3.6.0)
pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.0)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.0)
progress 1.2.2 2019-05-16 [1] CRAN (R 3.6.0)
ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
purrr 0.3.3 2019-10-18 [1] CRAN (R 3.6.0)
R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.0)
Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.0)
remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
rlang 0.4.2 2019-11-23 [1] CRAN (R 3.6.0)
RPostgres 1.1.3 2019-12-07 [1] CRAN (R 3.6.0)
RPostgreSQL 0.6-2 2017-06-24 [1] CRAN (R 3.6.0)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
testthat 2.3.1 2019-12-01 [1] CRAN (R 3.6.0)
tibble 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.6.0)
usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.0)
vctrs 0.2.0 2019-07-05 [1] CRAN (R 3.6.0)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.0)
[1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
@ilarischeinin Thanks for the reprex. Yeah, as you can see, arkdb doesn't have any explicit logic for Postgres schemas at the moment; it is largely just counting on the DBI abstractions to be portable. I'd welcome a PR for this, though I imagine many Postgres users may already be happy with using COPY to do a bulk export to .csv, which is probably faster than arkdb's chunk iteration. (Having arkdb attempt native methods is also on the to-do list, #24.)
A note here: dbplyr has a built-in method for accessing a schema:
con <- DBI::dbConnect(...)
my_table <- dplyr::tbl(con, dbplyr::in_schema("schema", "table"))
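The tbl() handle is lazy; as a usage sketch (assuming a live con and the foobar.x table from the reprex earlier in the thread):

```r
# Sketch: query a schema-qualified table lazily via dbplyr, then collect.
library(dplyr)
x_tbl <- tbl(con, dbplyr::in_schema("foobar", "x"))
x_tbl %>% filter(i > 5) %>% collect()  # filter runs in the database; collect() pulls rows into R
```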
Would it be possible to support Postgres databases that have tables in schemas? I'm running into problems, I think because table names have to be prefixed by the schema name in the SQL.
Error: