prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
15.76k stars 5.29k forks source link

Add support for case sensitive identifiers #2863

Open cberner opened 9 years ago

cberner commented 9 years ago
jamiemccrindle commented 9 years ago

+1

electrum commented 9 years ago

I don't think the last one is correct. Case insensitive matching should only happen if the target is the correct case.

You can have multiple identifiers differing only in case. Case insensitive matching should match at most one.

dain commented 9 years ago

@electrum correct. I updated the description.

XiLongZheng commented 8 years ago

Is there any plan to get this fixed so I could use Upper case in table name?

dain commented 8 years ago

I took a look at this a few weeks back. The problem is the change is spread throughout the codebase and in parts that are actively being changed. After @martint's planner changes are in and @haozhun's changes to field dereferencing, the change should be doable.

ashish6976 commented 8 years ago

Is there any other way to use upper case in database name and table name while querying from presto

dain commented 8 years ago

@ashish6976 This issues is about changing the current Presto behavior to support full case sensitive identifiers.

From the users perspective, the current behavior of only supporting case insensitive identifiers should just work, unless you happen to have an existing system containing identifiers that only differ in case. If you are seeing failures when there is only one identifier, please file a new issue since that is a bug.

ashish6976 commented 8 years ago

@dain Here are the scenarios in which I am facing isssues

My MYSQL Server is running on Centos. I am using presto to query my MYSQL database using MYSQL connector where my catalog name is mysql.

Scenario 1 - DataBase name and Table name is combination of upper case and lower case letters

Database Name - TestDB Table Names - EmployeeDetails, EmployeeTable Query 1 - show schemas from mysql; Output - Schema

information_schema performance_schema testdb
(3 rows)

Query 20150818_064410_00003_837eu, FINISHED, 1 node Splits: 2 total, 2 done (100.00%) 0:00 [3 rows, 61B] [25 rows/s, 524B/s]

Query 2 - show tables from mysql.testdb; Output - Table

(0 rows)

Query 20150818_064532_00004_837eu, FINISHED, 1 node Splits: 2 total, 2 done (100.00%) 0:00 [0 rows, 0B] [0 rows/s, 0B/s]

In this case presto is not able to Fetch the table names which are present in database TestDB.

The Mysql output mysql> show tables from TestDB; +------------------+ | Tables_in_TestDB | +------------------+ | EmployeeDetails | | EmployeeTable | +------------------+ 2 rows in set (0.00 sec)

Scenario 2 - DataBase name is in lower case , Table name is combination of upper case and lower case letters

Database Name - lowercasedb Table Names - TableOne, TableTwo Query 1 - show schemas from mysql; Output - Schema

information_schema lowercasedb performance_schema testdb
(4 rows)

Query 20150818_065347_00005_837eu, FINISHED, 1 node Splits: 2 total, 2 done (100.00%) 0:00 [4 rows, 77B] [27 rows/s, 522B/s]

Query 2 - show tables from mysql.lowercasedb; Output - Table

tableone tabletwo (2 rows)

Query 20150818_065432_00006_837eu, FINISHED, 1 node Splits: 2 total, 2 done (100.00%) 0:00 [2 rows, 66B] [15 rows/s, 505B/s]

Query 3 - select * from mysql.lowercasedb.tableone; Output - Query 20150818_065535_00007_837eu failed: Table mysql.lowercasedb.tableone does not exist

In this scenario presto is able to fetch the table names but when I am accessing the table the its giving me an error as shown above.

The Mysql output mysql> select * from lowercasedb.TableOne; +-----------+-----------+ | ColumnOne | ColumnTwo | +-----------+-----------+ | 1 | Row 1 | | 2 | Row 2 | +-----------+-----------+ 2 rows in set (0.00 sec)

Scenario 3 - DataBase name and Table name is in lower case letters

Database Name - lowercasedb Table Names - tableone, tabletwo Query 1 - show schemas from mysql; Output - Schema

information_schema lowercasedb
lowercasetabledb
performance_schema testdb
(5 rows)

Query 20150818_070234_00008_837eu, FINISHED, 1 node Splits: 2 total, 2 done (100.00%) 0:00 [5 rows, 98B] [30 rows/s, 597B/s]

Query 2 - show tables from mysql.lowercasetabledb; Output - Table

tableone tabletwo (2 rows)

Query 20150818_070253_00009_837eu, FINISHED, 1 node Splits: 2 total, 2 done (100.00%) 0:00 [2 rows, 76B] [17 rows/s, 652B/s]

Query 3 - select * from mysql.lowercasetabledb.tableone; Output - columnone | columntwo -----------+----------- 1 | Row 1
2 | Row 2
(2 rows)

Query 20150818_070319_00010_837eu, FINISHED, 1 node Splits: 2 total, 2 done (100.00%) 0:00 [2 rows, 0B] [8 rows/s, 0B/s]

In this scenario I am able to access the tables in the database.

dain commented 8 years ago

@ashish6976 This is a bug in the mysql connector. Please open a new issue for the mysql connector.

saileshmittal commented 8 years ago

Any progress on this?

samanthrao commented 7 years ago

I have the same issue as @ashish6976 but in my case the database is MongoDB. Some of the database names and collections are in title case per company standards. When Presto loads the catalog, it converts everything to lowercase and results in zero records being processed. The only thing that works is, when a collection name or database name is lowercase.

Are there any alternative to overcome this?

umnya commented 7 years ago

Any update about this ?

knoguchi commented 7 years ago

I've read the source a bit. There are places in the Presto code like this https://github.com/prestodb/presto/blob/master/presto-spi/src/main/java/com/facebook/presto/spi/ColumnMetadata.java#L49

I think we have to remove those toLowerCase().

img22 commented 7 years ago

Is there a fix for this? I'm not able to query upper case tables

faisal00813 commented 7 years ago

Any update about the fix?

RameshByndoor commented 6 years ago

I am using 0.180 version of presto and issue is still present. We have all our mysql table names in UPPERCASE and any alternative or patches available..?

MichaelAlo commented 6 years ago

Hi there, Is the issue about tables in upper case in MySQL and a Presto connection fixed? :) Cheers

Drizzt321 commented 6 years ago

I'm testing out Presto with MySQL, running into this exact issue. Is there any will, at all, to fix this rather big issue? Clearly there's more than 1 or 2 people, and this would prevent us from using Presto as our tables are managed by Hibernate, which creates them. I have seen https://dev.mysql.com/doc/refman/5.7/en/identifier-case-sensitivity.html, which setting it to 2 might work. I'm having trouble getting my MySQL instance to get that set properly though, but I'll report back if I manage to and if that works.

Drizzt321 commented 6 years ago

So, sadly I can't get it set to '2', I suppose it's because I'm running on Linux, and so mysql really doesn't like it so it keeps reverting to 0. I can get it set to 1, which simply lowercases all table names and stores them on disk as lowercased, but that's not so useful for lots of existing tables.

martint commented 6 years ago

For reference, here are the relevant parts from the SQL spec:

<delimited identifier> ::=
  <double quote> <delimited identifier body> <double quote>

<delimited identifier body> ::=  <delimited identifier part>...
<delimited identifier part> ::=
    <nondoublequote character>
  | <doublequote symbol>

<Unicode delimited identifier> ::=
  U <ampersand> <double quote> <Unicode delimiter body> <double quote>
      <Unicode escape specifier>
<Unicode escape specifier> ::=
  [ UESCAPE <quote> <Unicode escape character> <quote> ]
<Unicode delimiter body> ::=
  <Unicode identifier part>...
<Unicode identifier part> ::=
    <delimited identifier part>
  | <Unicode escape value>
24) For every <identifier body> IB there is exactly one corresponding case-normal form CNF. CNF is an <identifier body> derived from IB as follows:
Let n be the number of characters in IB. For i ranging from 1 (one) to n, the i-th character Mi of IB is transliterated into the corresponding character 
or characters of CNF as follows:
Case:
   a) If Mi is a lower case character or a title case character for which an equivalent upper case sequence U is de ned by Unicode, then let j be th
       e number of characters in U; the next j characters of CNF are U.
   b) Otherwise, the next character of CNF is Mi.
25) The case-normal form of the <identifier body> of a <regular identifier> is used for purposes such as and including determination of identifier 
      equivalence, representation in the Definition and Information Schemas, and representation in diagnostics areas.

...

27) Two <regular identifier>s are equivalent if the case-normal forms of their <identifier body>s, considered as the repetition of a <character string literal> 
that specifies a <character set specification> of SQL_IDENTIFIER and an implementation-defined collation IDC that is sensitive to case, compare equally 
according to the comparison rules in Subclause 8.2, “<comparison predicate>”.

28) A <regular identifier> and a <delimited identifier> are equivalent if the case-normal form of the <identifier body> of the <regular identifier> and the 
<delimited identifier body> of the <delimited identifier> (with all occurrences of <quote> replaced by <quote symbol> and all occurrences of 
<doublequote symbol> replaced by <double quote>), considered as the repetition of a <character string literal> that specifies a <character set specification>
 of SQL_IDENTIFIER and IDC, compare equally according to the comparison rules in Subclause 8.2, “<comparison predicate>”.

29) Two<delimited identifier>s are equivalent if their <delimited identifierbody>s,considered as the repetition of a <character string literal> that specifies
 a <character set specification> of SQL_IDENTIFIER and an implementation-defined collation that is sensitive to case, compare equally according to the
 comparison rules in Subclause 8.2, “<comparison predicate>”.

30) Two <Unicode delimited identifier>s are equivalent if their <Unicode delimiter body>s, considered as the repetition of a <character string literal> that
 specifies a <character set specification> of SQL_IDENTIFIER and an implementation-defined collation that is sensitive to case, compare equally according
 to the comparison rules in Subclause 8.2, “<comparison predicate>”.

31) A <Unicode delimited identifier> and a <delimited identifier> are equivalent if their <Unicode delimiter body> and <delimited identifier body>, 
respectively, each considered as the repetition of a <character string literal> that specifies a <character set specification> of SQL_IDENTIFIER and 
an implementation-defined collation that is sensitive to case, compare equally according to the comparison rules in Subclause 8.2, “<comparison predicate>”.

32) A <regular identifier> and a <Unicode delimited identifier> are equivalent if the case-normal form of the <identifier body> of the <regular identifier> 
and the <Unicode delimiter body> of the <Unicode delimited identifier> considered as the repetition of a <character string literal>, each specifying a
 <character set specification> of SQL_IDENTIFIER and an implementation-defined collation that is sensitive to case, compare equally according to the 
comparison rules in Subclause 8.2, “<comparison predicate>”.
ghost commented 6 years ago

+1 Is there any progess on this? We are useing version 0.192 and the problem with Scenario 2 is still there.

Drizzt321 commented 6 years ago

I have an open PR https://github.com/prestodb/presto/pull/8674 that got stalled waiting for some replies from maintainers, and then I got busy with work and unable to continue it. I don't see any time in the foreseeable future for me to pick this back up.

nihalbtq8 commented 6 years ago

This is still an issue currently. Is there any alternative available?

binque commented 5 years ago

Any update ?

kokosing commented 5 years ago

Any update ?

Unfortunately, no ;(

dain commented 5 years ago

@hgschmie told me the other day he was looking at this

SanjayJosh commented 5 years ago

Any update on this? All the tables in my project are by default in upper case. Or can any pointers be given in the code as to what can be changed so that a custom build for the same can be made?

tooptoop4 commented 5 years ago

any update? hive metastore tables in mysql like DBS, PARTITIONS can't be queried.

presto> select * from mysql.metastore.DBS; Query 20180913_015448_00010_774as failed: line 1:15: Table mysql.metastore.dbs does not exist

ciscoring commented 5 years ago

Any updae on this issue? My collections in mongodb have names with upper case. Presto can't read these collections because it always change lower case.

alvespat commented 5 years ago

I'm getting the same issue with Sqlserver connectors and tables defined in uppercase... ( only be able to query tables where names are defined in lower case.... :-( (presto 0.203)

cberner commented 5 years ago

I no longer am actively working on Presto

mizunno commented 4 years ago

Any update on this? Thanks in advance.

xqliang commented 2 years ago

Fixed at Release 0.225.

JDBC Changes

Match schema and table names case insensitively. This behavior can be enabled by setting the case-insensitive-name-matching catalog configuration option to true.

yys12138 commented 2 years ago

I have a table that that only differ by case,for example ,columns ‘toPkgVErsName ' and ‘ toPkGVersName‘ . I can execute with 'select topkgversname from ......' correctly when i use hive.But the same sql in presto will be wrong (I use hue to connect with hive and presto). and the error message is :

Multiple entries with same key: [topkgversname] required binary topkgversname (STRING)=min: , max: , num_nulls: 0 and [topkgversname] required binary topkgversname (STRING)=min: , max: walleve-1.0.269.171227.eaddc8f, num_nulls: 0 at io.prestosql.jdbc.AbstractPrestoResultSet.resultsException(AbstractPrestoResultSet.java:1731) at io.prestosql.jdbc.PrestoResultSet$ResultsPageIterator.computeNext(PrestoResultSet.java:216) at io.prestosql.jdbc.PrestoResultSet$ResultsPageIterator.computeNext(PrestoResultSet.java:176) at io.prestosql.jdbc.$internal.guava.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141) at io.prestosql.jdbc.$internal.guava.collect.AbstractIterator.hasNext(AbstractIterator.java:136) at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1811) at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:295) at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:207) at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:162) at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:301) at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681) at io.prestosql.jdbc.PrestoResultSet$AsyncIterator.lambda$new$0(PrestoResultSet.java:122) at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

Is there any setting that could fix this problem?

ebyhr commented 2 years ago

@yys12138 It seems you are using Trino formerly PrestoSQL. The right repository is https://github.com/trinodb/trino and you can ask questions in the community Slack. https://trino.io/slack.html