r-dbi / RPostgres

A DBI-compliant interface to PostgreSQL
https://rpostgres.r-dbi.org
Other
334 stars 78 forks source link

test: Test for columns in `dbQuoteIdentifier()` #372

Closed krlmlr closed 6 months ago

krlmlr commented 2 years ago

Closes #263.

krlmlr commented 2 years ago

@dpprdan: Checks are failing now, could you please take a look?

dpprdan commented 2 years ago

👍 I'll look into it over the holidays.

dpprdan commented 2 years ago

The checks fail here

With the current CRAN release and an empty database, this looks like this:

library(RPostgres)
con <- postgresDefault()
objects <- dbListObjects(con)

(sql <- lapply(objects$table, dbQuoteIdentifier, conn = con))
#> [[1]]
#> <SQL> "information_schema".
#> 
#> [[2]]
#> <SQL> "pg_catalog".
(unquoted <- vapply(sql, dbUnquoteIdentifier, conn = con, list(1)))
#> [[1]]
#> <Id> schema = information_schema
#> 
#> [[2]]
#> <Id> schema = pg_catalog
testthat::expect_equal(unquoted, unclass(objects$table))

Note however the period (“.”) behind the quoted schemata.

sql[[1]] 
#> <SQL> "information_schema".

This isn’t a valid schema identifier in Postgres.

(qid <- dbQuoteIdentifier(con, Id(schema = "myschema")))
#> <SQL> "myschema".
create_schema_sql <- paste0("CREATE SCHEMA IF NOT EXISTS ", qid)
dbExecute(con, create_schema_sql)
#> Error: Failed to prepare query: ERROR:  syntax error at or near "."
#> LINE 1: CREATE SCHEMA IF NOT EXISTS "myschema".
#>                                               ^

So it seems to me that the test is broken. I don’t think that it can be fixed for schemata either, because the quote-unquote roundtrip cannot be made to work for just schemata. The heuristics employed in dbUnquoteIdentifier() to identify the database objects from a SQL() string can only work for one object class and we chose that to be tables.
The test can probably be fixed for tables if we write a table to the db and then test whether the quote-unquote roundtrip works for that (i.e. only for the objects[!objects$is_prefix] subset). But maybe this test exists already elsewhere?

dbDisconnect(con)
Session info ``` r sessioninfo::session_info() #> - Session info --------------------------------------------------------------- #> setting value #> version R version 4.1.2 (2021-11-01) #> os Windows 10 x64 (build 19043) #> system x86_64, mingw32 #> ui RTerm #> language en #> collate German_Germany.1252 #> ctype German_Germany.1252 #> tz Europe/Berlin #> date 2021-12-28 #> pandoc 2.16.2 @ C:/PROGRA~1/Pandoc/ (via rmarkdown) #> #> - Packages ------------------------------------------------------------------- #> package * version date (UTC) lib source #> backports 1.4.1 2021-12-13 [1] CRAN (R 4.1.2) #> bit 4.0.4 2020-08-04 [1] CRAN (R 4.1.0) #> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.1.0) #> blob 1.2.2 2021-07-23 [1] CRAN (R 4.1.0) #> cli 3.1.0 2021-10-27 [1] CRAN (R 4.1.1) #> crayon 1.4.2 2021-10-29 [1] CRAN (R 4.1.1) #> DBI 1.1.2 2021-12-20 [1] CRAN (R 4.1.2) #> desc 1.4.0 2021-09-28 [1] CRAN (R 4.1.1) #> digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.2) #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0) #> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0) #> fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0) #> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0) #> fs 1.5.2 2021-12-08 [1] CRAN (R 4.1.2) #> generics 0.1.1 2021-10-25 [1] CRAN (R 4.1.1) #> glue 1.6.0 2021-12-17 [1] CRAN (R 4.1.2) #> highr 0.9 2021-04-16 [1] CRAN (R 4.1.0) #> hms 1.1.1 2021-09-26 [1] CRAN (R 4.1.1) #> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.1) #> knitr 1.37 2021-12-16 [1] CRAN (R 4.1.2) #> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.1) #> lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.1.1) #> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0) #> pillar 1.6.4 2021-10-18 [1] CRAN (R 4.1.1) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0) #> pkgload 1.2.4 2021-11-30 [1] CRAN (R 4.1.2) #> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0) #> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.1.0) #> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.1.0) #> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.1.0) #> R.utils 2.11.0 2021-09-26 [1] CRAN (R 4.1.1) #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.1) #> Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.0) #> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.0) #> rlang 0.4.12 2021-10-18 [1] CRAN (R 4.1.1) #> rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.1.1) #> RPostgres * 1.4.3 2021-12-20 [1] CRAN (R 4.1.2) #> rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.0) #> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0) #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.2) #> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.2) #> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0) #> styler 1.6.2 2021-09-23 [1] CRAN (R 4.1.1) #> testthat 3.1.1 2021-12-03 [1] CRAN (R 4.1.2) #> tibble 3.1.6 2021-11-07 [1] CRAN (R 4.1.2) #> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0) #> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0) #> withr 2.4.3 2021-11-30 [1] CRAN (R 4.1.2) #> xfun 0.29 2021-12-14 [1] CRAN (R 4.1.2) #> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0) #> #> [1] C:/Users/Daniel.AK-HAMBURG/Documents/R/win-library/4.1 #> [2] C:/Program Files/R/R-4.1.2/library #> #> ------------------------------------------------------------------------------ ```
krlmlr commented 2 years ago

Thanks. I need to take a look at the workflows.

krlmlr commented 2 years ago

Can you please bring in the latest changes from main to fix the workflows?

dpprdan commented 2 years ago

The test can probably be fixed for tables if we write a table to the db and then test whether the quote-unquote roundtrip works for that (i.e. only for the objects[!objects$is_prefix] subset). But maybe this test exists already elsewhere?

😬 Ha, I guess I should've just read the rest of the file. 🤦🏻‍♂️

BTW From reading the code, I'd say that RMariaDB has the same problem(s) (both the "cannot quote columns" and the "terminal dot" issue).

Current status: trying to get RMariaDB::mariadbDefault() to run with a MariaDB Docker container on Windows.

dpprdan commented 2 years ago

Sure enough I found a case where we need to unquote a schema only, namely in dbListObjects() with a prefix. So this PR unfortunately breaks dbListObjects() with a SQL()-type prefix at the moment.

library(RPostgres)
con <- postgresDefault()
schema_qid <- dbQuoteIdentifier(con, Id(schema = 'Robert'))

The following doesn’t work with v1.4.3, due to the terminal dot

dbExecute(con, paste0('CREATE SCHEMA IF NOT EXISTS ', schema_qid))
#> [1] 0

We need to add a table for dbQuoteIdentifier() to find the schema, see #388

table_qid <- dbQuoteIdentifier(con, Id(schema = 'Robert', table = "mtcars"))
dbWriteTable(con, table_qid, mtcars)

The following works in v1.4.3 (but relies on the terminal dot to unquote the prefix as schema, not table)

dbListObjects(con, schema_qid)
#> Error in mapply(FUN = f, ..., SIMPLIFY = FALSE): zero-length inputs cannot be mixed with those of non-zero length

It works with an Id() object (which just passes through dbUnquoteIdentifier()).

dbListObjects(con, Id(schema = 'Robert'))
#>                                  table is_prefix
#> 1 <Id> schema = Robert, table = mtcars     FALSE

# clean-up
dbExecute(con, paste0('DROP TABLE ', table_qid))
#> [1] 0
dbExecute(con, paste0('DROP SCHEMA "Robert"'))
#> [1] 0
dbDisconnect(con)

I don’t know how to fix this ATM.

krlmlr commented 2 years ago

Thanks. It looks to me like dbListObjects() should always be called with an Id(), never with a SQL() . The behavior is misspecified. Should we adapt the list_objects_features test?

dpprdan commented 2 years ago

It looks to me like dbListObjects() should always be called with an Id(), never with a SQL().

That would be an easy fix, indeed, and way cleaner than all the hacks I came up with until now. I hope we're not missing something.

Can anything else than a schema be passed to prefix?

https://github.com/r-dbi/RPostgres/blob/5eced0cce8e232222e1f6274fade002e939a174d/R/dbListObjects_PqConnection_ANY.R#L22-L26

Because this looks like one could also specify (schema-qualified) table(s) (and then only the schema is used), but I don't understand the use case that prompted this design. Update: I misread the code. This is to only select schemas, not (possibly schema-qualified) tables. So I guess the answer is that schemata and tables can be passed to prefix, but only schemas are used.

It also looks like it is supposed to be vectorized (v*apply()), but DBI::dbQuoteIdentifier() isn't AFAIU (x must be SQL or Id), so dbListObjects() isn't, at least not with a prefix.

dpprdan commented 1 year ago

It looks like dbQuoteIdentifier() with incomplete Id() (e.g., only "catalog" and "schema") used to leave a trailing dot in the identifier string, but no longer does.

It still does in RPostgres:

https://github.com/r-dbi/RPostgres/blob/fa3466fafa777a4b6da990595d396a056ba6c45d/R/dbQuoteIdentifier_PqConnection_Id.R#L12

And in RMariaDB.

It's not present in DBI (and never was IIUC, at least not since https://github.com/r-dbi/DBI/commit/6c111b7f49632b03eff5f934ec9691f57f5f9702)

library(DBI)
dbQuoteIdentifier(ANSI(), Id(schema = "myschema"))
#> <SQL> "myschema"

This PR includes a potential fix for RPostgres

https://github.com/r-dbi/RPostgres/pull/372/files#diff-b87396211065ce77f8649cc9f6daadb09f90ee9f7d3cfe4cd4b8dfdc377f616fR8-R9

It is all a bit confusing because the "(can't) quote columns" and "trailing dot" issues and PRs are conflated here (as well as over at RMariaDB). Maybe it's best to separate those from each other?

As the tests don't fail, I assume this isn't tested anywhere 😱 .

My guess would be that the implicit assumption has always been that IDs contain/refer to a table name.

We could go for a breaking change

Not sure I follow. Breaking change with regard to the trailing dot? AFAICT we're not breaking something that used to work before in RPostgres or RMariaDB. So this would merely be a bugfix?

but a DBItest specification and test would be great.

I'll have a look.

dpprdan commented 1 year ago

In any case, adding support for columns as quoted identifiers is not a breaking change, but rather in accordance with the DBItest spec 👀

Quoted identifiers can be used as table and column names in SQL queries (source)

but a DBItest specification and test would be great.

There aren't any quote-identifier specs and tests at the moment and I am having a hard time coming up with meaningful ones for this problem.

So I'd say that the tests belong in the backend packages rather than in DBItest. Am I missing something?

The MariaDB docs mention many more identifiers (and a similar list applies to Postgres as well - MariaDB is missing schema for example):

Databases, tables, indexes, columns, aliases, views, stored routines, triggers, events, variables, partitions, tablespaces, savepoints, labels, users, roles, are collectively known as identifiers

Shouldn't we support/allow those as well?

krlmlr commented 1 year ago

Can you please merge the latest changes here? Looks like I'm not permitted to do this in this PR.

aviator-app[bot] commented 11 months ago

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes. Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This PR was merged manually (without Aviator). Merging manually can negatively impact the performance of the queue. Consider using Aviator next time.


See the real-time status of this PR on the Aviator webapp.
Use the Aviator Chrome Extension to see the status of your PR within GitHub.
dpprdan commented 9 months ago

Fixes #453 too. You may want to adjust the title and description, @krlmlr (since you're the PR owner).

krlmlr commented 6 months ago

Thanks. This now has been implemented independently, but the tests are good.

aviator-app[bot] commented 6 months ago

This pull request failed to merge: PR cannot be automatically rebased, please rebase manually to continue. Once the issues are resolved, remove the blocked label and re-queue the pull request. Note that the pull request will be automatically re-queued if it has the mergequeue label.

Additional debug info: Failed to rebase this PR onto the latest changes from the base branch. You will probably need to rebase this PR manually and resolve conflicts).