orientechnologies / orientdb

OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries.
https://orientdb.dev
Apache License 2.0
4.73k stars 871 forks source link

[PROPOSAL] - make all the identifiers case sensitive in v 3.0 #6052

Closed luigidellaquila closed 3 years ago

luigidellaquila commented 8 years ago

In v 2.x there are some incoherences in how OrientDB manages case sensitivity of identifiers. Two examples:

I propose, in v 3.0, to make all the identifiers case sensitive. Identifiers include:

smolinari commented 8 years ago

:+1: Without consistency, intuitiveness is unreachable. :smile:

Scott

lvca commented 8 years ago

I vote for the opposite: let's make fields case insensitive, so everything is coherent and is back compatibile. The only things it wouldn't work if somebody created properties 'name' and 'Name' on the same record, but this is not common.

Also identifiers are case insensitive in many SQL grammars too. The RDBMS user is expecting this.

luigidellaquila commented 8 years ago

@lvca different RDBMS implementations have different behaviors on this. Let me add some reasons to have case sensitive implementation:

luigidellaquila commented 8 years ago

btw, case insensitive field names would not be backward compatible

smolinari commented 8 years ago

In Mysql the table names and aliases are case sensitive, because the Linux file system is case sensitive. On Windows and OSX (with HFS), they aren't. Column/ Field names are case insensitive. http://dev.mysql.com/doc/refman/5.7/en/identifier-case-sensitivity.html

In PostgreSQL, the identifiers are automatically reduced to lower case, unless they are in double quotes. So, a semi-forced case sensitivity. https://le-gall.net/pierrick/blog/index.php/2006/12/01/90-postgresql-and-case-sensitivity

MSSQL is partially case sensitive. You can't mix cases when creating, but can for the RUD commands. I think I explained that right. https://msdn.microsoft.com/en-us/library/aa290089%28v=vs.71%29.aspx

So, summa summarum, the programmer won't be expecting case insensitivity necessarily. But for sure, he or she would very much expect consistency, which case sensitivity offers. Also, it means we devs don't have to add quotes to queries to be case sensitive. It also means user land must hold up (and can use) a particular case type for their identifier naming convention https://danielmiessler.com/blog/a-list-of-different-case-types/ which I really like. :smile:

Scott

tglman commented 8 years ago

Hi,

I'm for the case sensitive, also because the other protocol we use appart sql are case sensitive (Json,our binary protocol).

In general case sensitive make the engine simpler and faster, on the other side the IDE (studio) should give suggestion in case is defined a property with differnet case of the one in the sql.

seeden commented 8 years ago

+1 for case sensitive

a-unite commented 8 years ago

Just to add even more fuel to the flame:

Also identifiers are case insensitive in many SQL grammars too. The RDBMS user is expecting this.

I'm for Luca's approach here :) And Postgres way (see @smolinari's comment) feels as most common sensetive.

luigidellaquila commented 8 years ago

:D It's a hard decision to take, so all the fuel is welcome

Thanks

Luigi

w3aponx commented 8 years ago

+1 for case-sensitive, for all the reasons outlined already.

Mario

On Wed, Apr 27, 2016 at 10:47 AM, luigidellaquila notifications@github.com wrote:

:D It's a hard decision to take, so all the fuel is welcome

Thanks

Luigi

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/orientechnologies/orientdb/issues/6052#issuecomment-215107479


Mario Cormier

smolinari commented 8 years ago

@a-unite - How do you see the semi-forced case sensitivity, as PostreSQL does it, as the best way? What are the advantages and disadvantages?

The only thing I can think of is the ability to use spaces in identifiers with double quotes, which is still possible to do with ODB too. So, ODB could also be completely case sensitive without quotes, and only need double quotes for the spaces in identifiers.

What I personally don't like about the PostgreSQL method is the forcing of all identifiers (without double quotes) to lower case. You send in aClassWithALongNameWhichYouCanRead and get back aclasswithalongnamewhichyoucantread (yeah, PostgreSQL won't add the "t" :smile: ). That's a problem and limits us as developers. It "dumbs down" the API.

Scott

a-unite commented 8 years ago

@smolinari, My main reason is: If ODB is going to be popular - then we have to provide most convenient way for ordinal users, not only for advanced and highly professional ;) geeks as we are. This is especially important since we don't have and will not have any possibility to signal user (in schema-less mode), that he has empty result set not because there is no data for his query, but because he wrote one or whatever letters (i.e. AClassWithALongNameWhichYouCanRead.Property) in wrong case.

About identifiers returned in low-case when select query executed - I agree, this is not so good. Sorry, I must misunderstood your comment and didn't check actual Postgres behavior (already forgot how it worked). But I guess, we could store two versions of identifiers - in low case for system use (if it is chipper than conversion and to use them as file names, so there are no problems in transition between operating systems) and actual (as is) version to return for humans to read.

There is still the problem with case conversion if non Latin letters used for identifiers, but I believe, again - it is better to strict them with Latin only symbols, to prevent any possible problems with OS file systems support.

Regards, Ata

smolinari commented 8 years ago

Thanks for the explanation @a-unite.

Scott

G4br13l3 commented 8 years ago

+1 for case sensitive for all the reasons reported above and because it allows a better management of unstructured data.

saeedtabrizi commented 8 years ago

Hi, :+1: for case sensitive . this is orientdb in the new age of databases absolutely :)

robfrank commented 8 years ago

👍 for case sensitive everywhere.

smolinari commented 8 years ago

Looks like Case Sensitivity is winning! :smile:

Scott

lvca commented 8 years ago

Guys, I'd rather prefer looking at the big picture here. Changing this means that all the current users could have to rewrite most of their queries if they want to upgrade to v3.0 and in some application this could be really expensive. Maybe users won't upgrade just for this reason.

Take a look at the biggest DBMS players:

This change is not about resolving a critical bug or increasing the performance by 2x.

I'm against introducing this big break in compatibility, just for the sake of being more "coherent" (coherent with what?). I can be coherent with the syntax if everything is declared and documented. There is no need to be "uniform" (this is a better word to represent the intent) across all the names in every section of OrientDB and plugins.

luigidellaquila commented 8 years ago

Hi Luca,

I understand your point and it makes sense, but let's define an expected behavior for class name equality.

  1. How is a class name stored in the db? I'd say the original name that is defined at creation time.
  2. How is class name equality calculated? The String.equalsIgnoreCase() method does not accept a locale, the toUpperCase() and toLowerCase() can be used with a locale. Which locale? The db locale or the machine locale? What does it happen when you move your DB to another machine or when you manually change the locale?

Thanks

Luigi

lvca commented 8 years ago
  1. Agreed (so like now)

About (2), this is the implementation of Java String:

    public boolean equalsIgnoreCase(String anotherString) {
        return (this == anotherString) ? true
                : (anotherString != null)
                && (anotherString.value.length == value.length)
                && regionMatches(true, 0, anotherString, 0, value.length);
    }

So we could use regionMatches() that supports case insensitivity:

public boolean regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len)

But this doesn't support Locale that it should be always what you defined in the database (everywhere), or if not defined, the default that is the system Locale. Behind Locale there are a lot of edge cases. Look at This and The "turkish case".

andrii0lomakin commented 8 years ago

Hi guys, The only problem is performance, equalsIgnoreCase is very slow. So if we keep case-insensitive behavior, I think we should normalize names of classes, clusters and so on before they will go inside of database engine. The problem now that we do not know whether all those names normalized or not and as a result we have tons of toUppperCase toLowCase, equalsIgnorCase over the system. We should be sure that if I take clasName inside of storage it always in lower case for example.

Also, I support Luca because the implementation of the case sensitive approach will be a disaster for our existing users.

On Thu, Jul 7, 2016 at 4:55 PM Luca Garulli notifications@github.com wrote:

  1. Agreed (so like now)

About (2), this is the implementation of Java String:

public boolean equalsIgnoreCase(String anotherString) {
    return (this == anotherString) ? true
            : (anotherString != null)
            && (anotherString.value.length == value.length)
            && regionMatches(true, 0, anotherString, 0, value.length);
}

So we could use regionMatches() that supports case insensitivity:

public boolean regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len)

But this doesn't support Locale that it should be always what you defined in the database (everywhere), or if not defined, the default that is the system Locale. Behind Locale there are a lot of edge cases. Look at This http://blog.eyallupu.com/2010/08/four-things-to-remember-about.html and The "turkish case" http://mattryall.net/blog/2009/02/the-infamous-turkish-locale-bug.

— You are receiving this because you are subscribed to this thread.

Reply to this email directly, view it on GitHub https://github.com/orientechnologies/orientdb/issues/6052#issuecomment-231084474, or mute the thread https://github.com/notifications/unsubscribe/AAGaahyK4L4VQv5jI4bV19ZeuyXSEj35ks5qTQU3gaJpZM4IQtZB .

Best regards, Andrey Lomakin, R&D lead. OrientDB Ltd

twitter: @Andrey_Lomakin linkedin: https://ua.linkedin.com/in/andreylomakin blogger: http://andreylomakin.blogspot.com/

a-unite commented 8 years ago

@laa, exactly! My proposal was to store low case class names in some new structure (this means new database format though), but progressively. That means, that every create class or alter class touching class name or superClass name should have store low case name in addition to its original version. Original case version could be used in all query results, though, as it is now.

When read - we should check, if there is such a "lower case" data in the class (means database was imported, or this class was altered/created with recent API version), then we could use this value for quick comparison (still have to make provided query value lowercased, of course). By version 3.0 we could switch database version completely, making sure, that DB was re-imported and classes|clusters all have the needed (system) lowcase field.

@luigidellaquila about the speed, I'm still wonder about the next case: let's suppose our file system is case insensitive (it is easy - try to make folder "name" and folder "Name" in the same place in Windows) - then we will have to store clusters and classes in some special way to distinguish them. Lets say name.cpm and Name_.cpm respectively. Then you either will need to store that mapping somewhere (and lookup for it every time), or resolve that problem with some other quite complicated (if not expensive) method.

tglman commented 8 years ago

well i think we could separe a bit the sections, i know that have only one behaviour would be better, but not sure if is doable.

The most important case for performance and consistency overall is the "properties" also called "field names" As today we have this beaviour: Schema properties are case insensitive document fields are case sensitive sql fields are case sensitive

to unform the not consistent current behaviour we have two way, make everything case sensitive or case insesitive.

case insensitive: Advantages:

case sensitive: Advantages:

for the properties/fields it seems more convinient to do case sensitive.

the other cases as far as i got does not make much difference for case sensistive and insensitive, you have equal (opinion based)vantages on usability on case insensitive, no huge performance differences, and some small issues based on the locale.

so from what i listed here and from my point of view is better have case sensitive properties, and maybe does really matter the rest. It may only have meaning keep just one way to work to be more user friendly.

my 2 cents bye

tglman commented 8 years ago

hi @a-unite,

I understand your point, you said the engine should work lower case (or case sensitive) and should show the result to the user as the case was configured with.

well i'm not sure we want to keep both version of the field name in the database, and the mapping later on may be expensive, due to all "SchemaLess" cases.

andrii0lomakin commented 8 years ago

Hi guys, If it is related only to fields and because most of the areas use case sensitive fields IMHO better to make them case sensitive.

On Thu, Jul 7, 2016 at 5:56 PM tglman notifications@github.com wrote:

hi @a-unite https://github.com/a-unite,

I understand your point, you said the engine should work lower case (or case sensitive) and should show the result to the user as the case was configured with.

well i'm not sure we want to keep both version of the field name in the database, and the mapping later on may be expensive, due to all "SchemaLess" cases.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/orientechnologies/orientdb/issues/6052#issuecomment-231103232, or mute the thread https://github.com/notifications/unsubscribe/AAGaak9KpHV1f9SvbcVAFau6JBUedIf5ks5qTROkgaJpZM4IQtZB .

Best regards, Andrey Lomakin, R&D lead. OrientDB Ltd

twitter: @Andrey_Lomakin linkedin: https://ua.linkedin.com/in/andreylomakin blogger: http://andreylomakin.blogspot.com/

a-unite commented 8 years ago

@tglman, thanks. You are right - I'd prefer to have case insensitive case for fields too (I see too often issues here, which are related to mistyping or misunderstanding of case sensitivity for field names).

But as I understand, we are discussing now class\cluster names only and I agree, that for field names we have to count on back compatibility anyway.

Edit: sorry, we are discussing "coherency" actually. So this (sensitive or insensitive in in()/out()) should be addressed anyway:

class names are case insensitive everywhere, except as out(), in()... arguments

smolinari commented 8 years ago

Changing this means that all the current users could have to rewrite most of their queries if they want to upgrade to v3.0 and in some application this could be really expensive.

This is assuming that all of the current users change the case of their class name identifiers, when they write their queries. Why would they do that? I can't say it is completely false, but I can say, it probably isn't completely true that they ALL do that. So, I'd say going to case sensitivity for classes won't mean all the users have to rewrite all of their queries. It will be more like some of the users will have to correct their misspellings on some of their queries. That doesn't sound as painful as you put it. :wink:

This might get a grumble from you guys, but couldn't we do it like MySQL and have a global setting for what is wanted by the user? That way, all bases are covered.

From the Mysql docs: Value Meaning 0 Table and database names are stored on disk using the lettercase specified in the CREATE TABLE or CREATE DATABASE statement. Name comparisons are case sensitive. You should not set this variable to 0 if you are running MySQL on a system that has case-insensitive file names (such as Windows or OS X). If you force this variable to 0 with --lower-case-table-names=0 on a case-insensitive file system and access MyISAM tablenames using different lettercases, index corruption may result. 1 Table names are stored in lowercase on disk and name comparisons are not case sensitive. MySQL converts all table names to lowercase on storage and lookup. This behavior also applies to database names and table aliases. 2 Table and database names are stored on disk using the lettercase specified in the CREATE TABLE or CREATE DATABASE statement, but MySQL converts them to lowercase on lookup. Name comparisons are not case sensitive. This works only on file systems that are not case sensitive! InnoDB table names are stored in lowercase, as for lower_case_table_names=1.

Scott

smolinari commented 7 years ago

I was going through the top most reacted posts for ODB and this was one of them and I'd like to pull back my suggestion above. Don't do it the MySQL way, as it only adds way too much complexity.

I am still for case sensitivity and I feel backwards compatibility won't or rather shouldn't be a concern. We are now talking 3.0, which is supposed to be breaking old ways to do them better. I feel going full case sensitive is a move forward, as can be seen by the majority of sentiment shown in this thread.

I'll throw in one other reason, case sensitivity is necessary to be able to support full NoSQL. As someone else mentioned, JSON is case sensitive. I really feel ODB needs to break with the RDBMS world as much as possible. Doing too many things like RDBMSes in an effort to attract people stuck in that paradigm is bending over backwards for all the wrong reasons. It holds ODB back from doing smarter things, if you ask me. ODB needs to be a leader in the NoSQL fold, but doing relationships much better the rest of them. That is the real trick to winning hearts like mine. 😄

Scott