r-dbi / RPostgres

A DBI-compliant interface to PostgreSQL
https://rpostgres.r-dbi.org
Other
334 stars 78 forks source link

chore: refactor `dbListTables()` et al. #413

Closed dpprdan closed 6 months ago

dpprdan commented 1 year ago

I would like to take another stab at #251 and this some preparatory work for that.

Background

Materialized views are currently not returned by dbListTables() et al (#251). Materialized Views are not included in INFORMATION_SCHEMA.tables (see this thread for why they are not. Tl;dr: "They are not defined by the SQL standard.")

The information schema consists of a set of views that contain information about the objects defined in the current database. The information schema is defined in the SQL standard and can therefore be expected to be portable and remain stable — unlike the system catalogs, which are specific to PostgreSQL and are modeled after implementation concerns. The information schema views do not, however, contain information about PostgreSQL-specific features; to inquire about those you need to query the system catalogs or other PostgreSQL-specific views. https://www.postgresql.org/docs/current/information-schema.html (emphasis mine).

Sidenote: Most if not all tables in information_schema are just views of the system catalogs, see e.g.

SELECT definition FROM pg_views 
WHERE schemaname = 'information_schema' AND viewname = 'tables';

Since mviews are PostgreSQL-specific features we need to query the system catalogs. In particular, we need to use pg_class/pg_namespace instead of INFORMATION_SCHEMA.tables.

However, in order to retain support for Redshift, we need to keep the INFORMATION_SCHEMA queries as well.

What happens in this PR?

The above affects the queries that underly the dbListTables(), dbExistsTable(), dbListObjects(), and dbListFields() functions.

dbListTables(), dbExistsTable() and dbListObjects() do essentially the same thing, namely call INFORMATION_SCHEMA.tables. In order to make the abovementioned adjustments easier and keep duplicated code at a minimum, I refactored those functions to use a common core function list_tables(), which returns the SQL code to query INFORMATION_SCHEMA.tables. The (alternative) queries to pg_class (to be added in another PR) will then live here.

I refactored the find_table() and list_fields() functions that underly dbListFields() in a similar fashion.

Details `dbListFields()`: find the first table on the search path ```sql SELECT column_name FROM ( SELECT *, rank() OVER (ORDER BY nr) AS rnr FROM ( SELECT nr, schemas[nr] AS table_schema FROM ( SELECT *, generate_subscripts(schemas, 1) AS nr FROM ( SELECT current_schemas(true) AS schemas) t ) tt WHERE schemas[nr] <> 'pg_catalog' ) ttt INNER JOIN INFORMATION_SCHEMA.columns USING (table_schema) WHERE table_name = [table_name] ) tttt WHERE rnr = 1 ORDER BY ordinal_position ``` can be re-written as (with 2 instead of 4 nested queries) ```sql SELECT column_name FROM ( SELECT *, rank() OVER (ORDER BY schema_nr) AS schema_rank FROM ( SELECT * FROM unnest(current_schemas(true)) WITH ORDINALITY AS "tbl"("table_schema","schema_nr") WHERE "table_schema" != 'pg_catalog') t INNER JOIN INFORMATION_SCHEMA.columns USING (table_schema) WHERE table_name = [table_name]) tt WHERE schema_rank = 1 ORDER BY ordinal_position ```

Essentially I applied similar query code here as in #261, but calling information_schema instead of the system catalogs. I will open another PR to incorporate the queries to the system catalogs.

It is probably easiest to review this by going through the individual commits.

I do not have access to Redshift, so I was not able to test this PR there. There might be small adjustments necessary (see below). Is it somehow possible for me to test on Redshift (e.g. via GHA)? Or how does this work in general?

Related issues

https://github.com/r-dbi/RPostgres/issues/251 https://github.com/r-dbi/RPostgres/pull/261

This does not address #388. I think it is best to address this after #251 is resolved.

390 is also related, but relatively trivial to apply before or after this.

krlmlr commented 1 year ago

Conflicts here, too.

aviator-app[bot] commented 11 months ago

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes. Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This PR was merged using Aviator.


See the real-time status of this PR on the Aviator webapp.
Use the Aviator Chrome Extension to see the status of your PR within GitHub.
krlmlr commented 6 months ago

Thanks!