Names of identifiers in DB need to be made valid

ghost commented 20 years ago

Reported by @stefanb on 22 Jul 2003 11:37 UTC Error messages are slightly improved in manageIndexes() after fixing bug 4, so ant test surfaced previously unseen (but present) problem: [java] WARNING: Problem adding INDEX on test.Person#militarySucksAndEverybodyKnowsItButDoesNotSpeakOutLoudAboutIt (A very long fieldname): Invalid argument value, message from server: "Identifier name 'test_Person_militarySucksAndEverybodyKnowsItButDoesNotSpeakOutLoudAboutIt ' is too long"

which calls for improvement in getDBIndexName() which make sure db name is valid

Migrated-From: http://trac.makumba.org/ticket/516

ghost commented 20 years ago

Comment by @cristianbogdan on 26 Aug 2003 13:48 UTC we decided with Stefan on ICQ that it's important to have a clear corespondence between indexes and their db-level names. so this bug became more important

i think putting on makumba this kind of burdens like 40-char long fieldnames and trying to make that work in all databases is cool, but will considerably slow down development :)

each index can have a makmumba-level name (if needed) and a (possibly shorter) db-level name, depending on the db. makumba does exactly the same with table and field names. the rules of making up the db-level names can be altered in the host_dbsystem_dbname.properties file, i guess the same can be done for indexes if needed, though i don't think it will be.

Stefan also suggested to save index info in the catalog, we don't do that for field names (we do it for table names but i don't think we ever use that info)

another solution is to simply use numbers for indexes (index1 for the first index, index2 for the second), then when the db is initialized, see in getIndexInfo if the index exists and is correct, if not, drop the index and recreate it. this technique is especially suited for multiple-field indexes where it is harder to generate a name based on involved fields (while single-field indexe names can be generated with the field name)

finally, instead of ordinal numbers, we could use hashcodes of the field name (or fieldnames) involved in the index. if two hashes collide, the first letters of the field names will certainly make them different. so

fieldN will generate
indexf<hashcode("fieldN")> unique(field1, field3, field10) will generate
indexfff<hashcode("field1, field3, field10")>

or we could use first 2 letters, just to make them easier to read when dealing with problems

ghost commented 20 years ago

Comment by @stefanb on 26 Aug 2003 14:24 UTC In most cases for MySQL database, table and field names are limited to 64 characters. See http://www.mysql.com/doc/en/Legal_names.html

An idea could also be to list all fields in the index, seperated by '_'. If this gets too long (for any reason) index names are equally cut at their end to still accomodate hashcode.

Example if the limit owuld be 20: 1234567890123456789012345 = ruler :) index_name_HASH (name) index_name_surname_HASH (name, surname) index_na_surna_gend_HASH (name, surname, gender)

we could also skip the "index_" in the beginning or make it shorter with no loss. not sure what should happen if there are too many (6 in this example: "index_1_2_3_4_5_6_HASH") fields in the index. Probably this is rare enough, so just hashcode of their concatenation?

ghost commented 20 years ago

Comment by @cristianbogdan on 26 Aug 2003 15:09 UTC sure we can drop "index"

i suggest to start everything with the hashcode (which can be pretty long, and btw, can be negative), maybe in hex so it gets shorter, or even in the notation we use for pointers, then add fieldnames (maybe shortened so that all can fit) and truncate at 64

i don't think a 6-field index is very probable, but the above technique seems to provide for it.

ghost commented 20 years ago

Comment by @cristianbogdan on 26 Aug 2003 16:00 UTC i think all indexes (both single and multiple-fields) should be named according to the policy described here, as that would lead to easier management.

in that case, makumba should know how to drop indexes named in the old style, or they should be dropped manually. (index recreation will have to take place anyway due to fixing of bug 3)

ghost commented 20 years ago

Comment by @stefanb on 26 Aug 2003 16:30 UTC so basically we agree on: extPtr_fn1fn2...hash(alphabeticaly ordered list of (lowercased?) field names)

thi is easy too look for by any criteria extPtr_% %hash ot extPtr%_hash

or we can put human-readable index enumeration at the end of the index name: extPtr_hash(alphabeticaly ordered list of (lowercased?) field names)_fn1_fn2...

ghost commented 20 years ago

Comment by @cristianbogdan on 26 Aug 2003 17:08 UTC what do you mean by extPtr?

i rather referred to Hash represented in the same way as pointers are represented (base 32 or whatever it is). no pointers...

just Hash_fn1_fn2 alphabetically ordered lowercased for windows reasons...

ghost commented 20 years ago

Comment by @cristianbogdan on 26 Aug 2003 17:12 UTC very good idea about alphabetic ordering it should also be applied before computing the hash, i guess

ghost commented 20 years ago

Comment by @stefanb on 26 Aug 2003 20:17 UTC oops, in comment 5 i misunderstood your proposal about base 35 (10 digits + 25 letters) so i assumed you were talking about extPtr (i did realize it later while being offline though :).

ghost commented 20 years ago

Comment by @stefanb on 3 Oct 2003 07:07 UTC As discovered with the latest addition to test suite, doing "select t from test.validMdds. t" fails on some MDDs because of a problems with identifier (column and table) name with message: [3) testQueryValidMdds(test.table)junit.framework.AssertionFailedError: java Tested 5 valid MDDs, of which 2 cant be used for DB queries:[ [java]( ) .1) Error querying valid MDD : [ RuntimeWrappedException: org.makumba.DBError: java.sql.SQLException: Invalid argument value, message from server: "Identifier name 'sdfghjkhgfghjkhhbgvcbnmvcvfghnjmmhgfghjkhgfghjkhgfdghjkytrtyuijnhbvcvbnmhghjkfrghjkfdfghjhgfdfghjtre' is too long", java .2) Error querying valid MDD

: [ RuntimeWrappedException: org.makumba.DBError: java.sql.SQLException: General error, message from server: "Incorrect table name 'test_validMdds_ExtremelyLongMddName12345678901234567890123456789012345678901234567890123456789012345'"](java]) This should probably be solved together with index names problem.

ghost commented 20 years ago

Comment by @stefanb on 16 Oct 2003 21:12 UTC Fixed problems with too long identifier names, described in comment 0 and comment 9 -field name ("fieldnametoolong"->"hashfieldna" -table name ("tablenametoolong"->"hashtablena") -index name (same as field name) shorter identifiers are left as they were. '-' in negative hashcode is replaced with '_' three underscores ("___") are put between hash and leading part of the identifier name because they are not used for anything else yet :)

I also limited default maxFieldNameLength maxTableNameLength to 64 characters for ALL sql engines, but tested only on mysql.

Unique groups of fields (bug 513) will probably need to adopt same principles for index creation once other stuff is designed and done there.

Checking in CHANGELOG.txt; /usr/local/cvsroot/makumba/CHANGELOG.txt,v <-- CHANGELOG.txt new revision: 2.36; previous revision: 2.35 done Checking in classes/org/makumba/db/sql/FieldManager.java; /usr/local/cvsroot/makumba/classes/org/makumba/db/sql/FieldManager.java,v <-- FieldManager.java new revision: 2.9; previous revision: 2.8 done Checking in classes/org/makumba/db/sql/Database.java; /usr/local/cvsroot/makumba/classes/org/makumba/db/sql/Database.java,v <-- Database.java new revision: 2.5; previous revision: 2.4 done

ghost commented 20 years ago

Comment by @stefanb on 10 Feb 2004 18:25 UTC Starting to play with Makumba on Oracle i (after fighting several problems) came across an interesting read about SQL naming conventions: http://dbazine.com/gulutzan5.html It is comparing standard SQL (92,99) table and column name limitations to implementations on major DBMS (DB2, Oracle and Microsoft SQL)

Maybe we should redesign makumba conventions accordingly before such makumba (as implemented now, commment 10) release hits production.

Most notably names should not start with a digit (eg. in hash) or with underscore (eg. if replacing a '-' in hash) or we should start quoting them everywhere.

Any other considerations?

ghost commented 20 years ago

Comment by @stefanb on 16 Feb 2004 11:45 UTC Reverted the order of components in the composed name to: -field name ("fieldnametoolong"->"fieldnahash") -table name ("tablenametoolong"->"tablenahash") -index name (same as field name)

this will also make the DB easier to read. Indexes of unique groups of fields will still be distinguished by hash.

Checking in classes/org/makumba/db/sql/Database.java; /usr/local/cvsroot/makumba/classes/org/makumba/db/sql/Database.java,v <-- Database.java new revision: 2.7; previous revision: 2.6 done

makumba / makumba

Names of identifiers in DB need to be made valid #244