Question/request: Is it possible to have an option within dplyr for creating an sql string without case sensitivity enforcement (quotation) on identifiers when extracting data from database tables?
Lets look into this using postgres, creating a table and adding some records via psql. In the terminal window:
> psql
hafstormur/home/einarhj [1002] psql
psql (9.3.6)
Type "help" for help.
einarhj-> \c test
You are now connected to database "test" as user "einarhj".
test=> CREATE TABLE Test_Table (Id INTEGER, Name varchar);
CREATE TABLE
test=> \d
List of relations
Schema | Name | Type | Owner
--------+------------+-------+---------
public | test_table | table | einarhj
(1 row)
test=> \d test_table
Table "public.test_table"
Column | Type | Modifiers
--------+-------------------+-----------
id | integer |
name | character varying |
test=> INSERT INTO test_table VALUES(10,'Einar');
INSERT 0 1
test=> INSERT INTO test_table VALUES(20,'Bjorn');
INSERT 0 1
test=> SELECT Name, Id FROM Test_Table;
name | id
-------+----
Einar | 10
Bjorn | 20
(2 rows)
Ergo: If no quotation is specfied all identifiers are converted to lower case.
Within the dplyr using quotation around all identifiers is the standard. I.e. the only syntax that works to access the table created above is the following:
library(dplyr)
db <- src_postgres("test")
tbl(db, "test_table")
## Source: postgres 9.3.6 [einarhj@localhost:5432/test]
## From: test_table [2 x 2]
##
## id name
## 1 10 Einar
## 2 20 Bjorn
Related to this is this oddity (same sql command as run above from withing psql):
q2 <- tbl(db, sql("select Id, Name from Test_Table"))
explain(q2)
## <SQL>
## SELECT "id", "name"
## FROM (select Id, Name from Test_Table) AS "_W1"
##
##
## <PLAN>
## Seq Scan on test_table (cost=0.00..22.30 rows=1230 width=36)
q2
## Source: postgres 9.3.6 [einarhj@localhost:5432/test]
## From: <derived table> [?? x 2]
##
## id name
## 1 10 Einar
## 2 20 Bjorn
## .. .. ...
In Oracle everthing is reversed, the default case on identifiers is uppercase. Hence for an equivalently created table in Oracle as was done in the psql (ignore that the field names are not exactly the same) one would need to do the following within the dplyr framework:
## <SQL> EXPLAIN PLAN FOR SELECT "SURNAME" AS "SURNAME", "ID" AS "ID"
## FROM "TEST_TABLE"
Again, we still have the following oddity:
q2 <- tbl(db, sql("select Id, Surname from Test_Table"))
explain(q2)
## <SQL>
## SELECT "ID", "SURNAME"
## FROM (select Id, Surname from Test_Table) "_W2"
## <SQL> EXPLAIN PLAN FOR SELECT "ID", "SURNAME"
## FROM (select Id, Surname from Test_Table) "_W2"
## <PLAN>
## TRUE
q2
## Source: Oracle 11.2.0.3.0 [@mar:/mar]
## From: <derived table> [?? x 2]
##
## ID SURNAME
## 1 10 Einar
## 2 20 Bjorn
## .. .. ...
devtools::session_info()
## Session info --------------------------------------------------------------
## setting value
## version R version 3.1.2 (2014-10-31)
## system x86_64, linux-gnu
## ui X11
## language (EN)
## collate en_US.UTF-8
## tz Atlantic/Reykjavik
## date 2015-08-14
Question/request: Is it possible to have an option within dplyr for creating an sql string without case sensitivity enforcement (quotation) on identifiers when extracting data from database tables?
Background below:
According to this overview of case sensitivity of identifiers among different database platforms things are a bit of a mess in the sql world. E.g. in Postgres the default is to put identifiers (table and field names) to lower case (if quotation is not specified) while in Oracle it is put to uppercase.
Lets look into this using postgres, creating a table and adding some records via
psql
. In the terminal window:Ergo: If no quotation is specfied all identifiers are converted to lower case.
Within the dplyr using quotation around all identifiers is the standard. I.e. the only syntax that works to access the table created above is the following:
Related to this is this oddity (same sql command as run above from withing
psql
):In Oracle everthing is reversed, the default case on identifiers is uppercase. Hence for an equivalently created table in Oracle as was done in the
psql
(ignore that the field names are not exactly the same) one would need to do the following within the dplyr framework:Again, we still have the following oddity: