kylemaxxwell / rpostgresql

Automatically exported from code.google.com/p/rpostgresql
0 stars 0 forks source link

Non Consistent Capitalization #16

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
See emails below.

==========================================================

Paul Gilbert wrote:
What I consider to be not ok is (from you example)

   res <- dbSendQuery(con, "create table Foo1 (f1 int)")
...
[1] "Foo1 Table does not exist.\n"

Right -- as I said in my response to Neil, this case *should* result in
"foo1 Table exists.\n"

Note that the Foo1 was case-folded to foo1

So the user has to know that Postgres is going to translate the name to
foo1 and also has to know that when they ask about the existence of Foo1
Postgres is not going to translate that to foo1 like it did when the
table was created.

I think if create without quotes does not distinguish case, then
checking about existence with quotes should not distinguish case.

And in my original response I show how it ought to be corrected within
RPostgreSQL:

-- user passed "Xeta" *with* the quotes
contrib_regression=# select tablename from pg_tables where
   schemaname !='information_schema' and schemaname !='pg_catalog'
   and tablename='Xeta';
tablename
-----------
Xeta
(1 row)

-- user passed Xeta *without* quotes
contrib_regression=# create table Xeta (f1 int);
CREATE TABLE
contrib_regression=# select tablename from pg_tables where
   schemaname !='information_schema' and schemaname !='pg_catalog'
   and tablename=lower('Xeta');
tablename
-----------
xeta
(1 row)

Joe
==========================================================
What I consider to be not ok is (from you example)

   res <- dbSendQuery(con, "create table Foo1 (f1 int)")
...
[1] "Foo1 Table does not exist.\n"

So the user has to know that Postgres is going to translate the name to
foo1 and also has to know that when they ask about the existence of Foo1
Postgres is not going to translate that to foo1 like it did when the
table was created.

I think if create without quotes does not distinguish case, then
checking about existence with quotes should not distinguish case.

Paul

==========================================================
Neil Tiffin wrote:
I am not sure I understand the problem.

the following R code

   res <- dbSendQuery(con, "create table Foo1 (f1 int)")
   res <- dbSendQuery(con, "create table \"Foo2\" (f1 int)")

   if (dbExistsTable(con, "Foo1")) {
       print("Foo1 Table exists.\n")
   }
   else {
       print("Foo1 Table does not exist.\n")
   }

   if (dbExistsTable(con, "foo1")) {
       print("foo1 Table exists.\n")
   }
   else {
       print("foo1 Table does not exist.\n")
   }

   if (dbExistsTable(con, "Foo2")) {
       print("Foo2 Table exists.\n")
   }
   else {
       print("Foo2 Table does not exist.\n")
   }

   if (dbExistsTable(con, "foo2")) {
       print("foo2 Table exists.\n")
   }
   else {
       print("foo2 Table does not exist.\n")
   }

results in

[1] "Foo1 Table does not exist.\n"
[1] "foo1 Table exists.\n"

both of these ought to result in "foo1 Table exists.\n"

[1] "Foo2 Table exists.\n"
[1] "foo2 Table does not exist.\n"

both of these ought to result in "foo2 Table does not exist.\n"

This seems like it is ok.  What am I missing?

And there should be a third case:

    if (dbExistsTable(con, "\"Foo2\"")) {
        print("\"Foo2\" Table exists.\n")
    }
    else {
        print("\"Foo2\" Table does not exist.\n")
    }

    if (dbExistsTable(con, "\"foo2\"")) {
        print("\"foo2\" Table exists.\n")
    }
    else {
        print("\"foo2\" Table does not exist.\n")
    }

and it should return:

[1] "\"Foo2\" Table exists.\n"
[1] "\"foo2\" Table does not exist.\n"

Joe

==========================================================
Paul Gilbert wrote:

Dirk suggested you guys may have some ideas.  I have a PostgreSQL
problem with capitalized table names, so in R I get bad results when
checking is a table exists. Below are details from a message I sent
Dirk. If you have any suggestions I would appreciate hearing.

but it is double quotes (“) that Postgres treats as an indication that
the name should be kept with the capitalization as specified, not the
single quotes (‘) you used.  The double quotes (“) is very non-standard

Actually the double quotes for identifiers (table and other database
object names) *is* in accordance with the SQL standard. Or rather the
ability to double quote identifiers is standard.

The only non-standard aspect of Postgres behavior is that it
automatically folds non-quoted identifiers to lowercase rather than
uppercase as the SQL standard would require.

So, if you specific a table name like:
 create table Foo (f1 int);
it is exactly the same as:
 create table FOO (f1 int);
and
 create table fOO (f1 int);
and
 create table foo (f1 int);
In all of these cases the actual table created will be named
 foo
i.e. all in lower case.
------------------------

On the other hand, if you specific a table name like:
 create table "Foo" (f1 int);
the actual table created will be named
 Foo
i.e. exactly as you specified.
------------------------

General guidance is that you should either *always* quote your
identifiers, or *never* quote your identifiers.
------------------------

Now there is a complication here as we have RPostgreSQL in between the
user and PostgreSQL. I don't have time today to look closely at how
RPostgreSQL goes about its business, but what it ought to do is:

1) if the identifier provided by the user was quoted, search the
information schema for exactly that identifier but without the quotes

2) if the identifier provided by the user was not quoted, search the
information schema for lower(identifier)
------------------------

For example:

-- user passed "Xeta" *with* the quotes
contrib_regression=# select tablename from pg_tables where
   schemaname !='information_schema' and schemaname !='pg_catalog'
   and tablename='Xeta';
tablename
-----------
Xeta
(1 row)

-- user passed Xeta *without* quotes
contrib_regression=# create table Xeta (f1 int);
CREATE TABLE
contrib_regression=# select tablename from pg_tables where
   schemaname !='information_schema' and schemaname !='pg_catalog'
   and tablename=lower('Xeta');
tablename
-----------
xeta
(1 row)
------------------------

Hope this helps

Joe

==========================================================
Joe, Neil

Dirk suggested you guys may have some ideas.  I have a PostgreSQL problem with 
capitalized table names, so in R I get bad results when checking is a table 
exists. Below are details from a message I sent Dirk. If you have any 
suggestions I would appreciate hearing.

Thanks,
Paul Gilbert
___

Dirk
I’m still struggling a bit with the capitalization problem. The difficulty 
seems to be that Postgres is not consistent.  I thought it was the quotes 
around the table name  in your new dbExistsTable() code, but it is double 
quotes (“) that Postgres treats as an indication that the name should be kept 
with the capitalization as specified, not the single quotes (‘) you used.  
The double quotes (“) is very non-standard and gets really messy because they 
then always have to be used in reference to the table, so I am not suggesting 
that direction.

The problem is that I would like consistent behaviour with regard to asking if 
Xeta exists, ie, if create table Xeta actually creates xeta then don't 
distinguish Xeta from xeta when asking if the table exists. I can see 
workarounds in the R code (as was probably in the previous version if you were 
using the RMySQL model), but I think it should be possible to do this in the 
SQL, however I cannot. Using psql:

CREATE TABLE Xeta (id CHAR(40) , v INT);
# next both work
\d Xeta ;
\d xeta ;

# this gives 1
select tablename from pg_tables where
    schemaname !='information_schema' and schemaname !='pg_catalog'
    and tablename='xeta';

# this gives 0
select tablename from pg_tables where
    schemaname !='information_schema' and schemaname !='pg_catalog'
    and tablename='Xeta';

#This gives 1
SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = 'public' 
AND table_name = 'xeta';

#This gives 0
SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = 'public' 
AND table_name = 'Xeta';

DROP TABLE Xeta;

# I don’t suggest this approach, but here is what happens
CREATE TABLE "Xeta" (id CHAR(40) , v INT);
# next both fail
\d Xeta ;
\d xeta ;
# this works
\d "Xeta" ;
# this fails
\d "xeta" ;
# this gives 0
select tablename from pg_tables where
    schemaname !='information_schema' and schemaname !='pg_catalog'
    and tablename='xeta';

# this gives 1
select tablename from pg_tables where
    schemaname !='information_schema' and schemaname !='pg_catalog'
    and tablename='Xeta';

# This gives 0
SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = 'public' 
AND table_name = 'xeta';

# This gives 1
SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = 'public' 
AND table_name = 'Xeta';

The double quotes are an error in MySQL, but MySQL is consistent with the first 
create, in that any capitalization combination works and the table is indicated 
as existing. The RMySQL code overdoes this by converting everything to upper or 
lower, but this does not seem necessary as a MySQL query can be constructed to 
give a consistent result. (However, there may be information_schema permission 
issues for normal users, but that is a different story.)

I’m not sure what to suggest, other than going back to the R workaround. I 
might be able to work around this in my code by specifying all the tables in 
lower case, but I have not tested it with the other the database engines I use, 
and in any case, that seems to be relying on the others to compensate for a 
Postgres problem. Also, this does not solve the problem for other RPostgreSQL 
users.

Best,
Paul

================================================================================
====

La version française suit le texte anglais.

--------------------------------------------------------------------------------
----

This email may contain privileged and/or confidential information, and the Bank 
of
Canada does not waive any related rights. Any distribution, use, or copying of 
this
email or the information it contains by other than the intended recipient is
unauthorized. If you received this email in error please delete it immediately 
from
your system and notify the sender promptly by email that you have done so. 

--------------------------------------------------------------------------------
----

Le présent courriel peut contenir de l'information privilégiée ou 
confidentielle.
La Banque du Canada ne renonce pas aux droits qui s'y rapportent. Toute 
diffusion,
utilisation ou copie de ce courriel ou des renseignements qu'il contient par une
personne autre que le ou les destinataires désignés est interdite. Si vous 
recevez
ce courriel par erreur, veuillez le supprimer immédiatement et envoyer sans 
délai à
l'expéditeur un message électronique pour l'aviser que vous avez éliminé de 
votre
ordinateur toute copie du courriel reçu.

Original issue reported on code.google.com by ne...@neiltiffin.com on 24 Aug 2010 at 12:04

GoogleCodeExporter commented 9 years ago

Original comment by ne...@neiltiffin.com on 24 Aug 2010 at 12:06

GoogleCodeExporter commented 9 years ago
Test case exists as tests/createTableMixedCaseTest .r

Original comment by ne...@neiltiffin.com on 24 Aug 2010 at 12:09

GoogleCodeExporter commented 9 years ago
The test case creates two tables and does not remove it.
On the second run it fails because of the preexisting table.

Note
    res <- dbSendQuery(con, "drop table Foo1")
    res <- dbSendQuery(con, "drop table \"Foo2\"")
will remove as what are created with create table, irrespective of
the case sensitivity of dbExistsTable and dbRemoveTable.

It seems consistent if argument of dbExistsTable and dbRemoveTable,
are just case sensitive and case sensitive string should always quoted
in translating to postgresql query.  So, I don't find any reason to assign FAIL
on current output for dbExistsTable. (as in the test-mixcase.patch2).

There may be insufficient quoting for DROP TABLE and CREATE TABLE, though.
Making every identifier (especially column name) lowercase 
is not an option, because df$aa and df$Aa is distinguishable in R. 

Several other test scripts creates rockdata
and tries to remove "rockData" at the end, which results in
remaining rockdata.

Original comment by tomoa...@kenroku.kanazawa-u.ac.jp on 12 Sep 2010 at 3:04

Attachments:

GoogleCodeExporter commented 9 years ago
Further change on the test program.
Use cat() instead of print().
With print(), the output is printed as vector, preceding [1] and quoting...

Related discussion will be sent to the discussion group soon.
Assuming attachment is easier from here.

Original comment by tomoa...@kenroku.kanazawa-u.ac.jp on 12 Sep 2010 at 5:11

Attachments:

GoogleCodeExporter commented 9 years ago
Reassigned for disposition.

Original comment by ne...@neiltiffin.com on 13 Oct 2010 at 12:52

GoogleCodeExporter commented 9 years ago
dbExistsTable, dbWriteTable, and dbRemoveTable work with the character string 
as is by using proper escaping / quoting as of r152.

Original comment by tomoa...@kenroku.kanazawa-u.ac.jp on 14 Oct 2010 at 2:30