skeema / tengo

Go La Tengo: a MySQL automation library
Apache License 2.0
27 stars 18 forks source link

Fixes issues with capital letters and spaces #3

Closed chrisjpalmer closed 7 years ago

chrisjpalmer commented 7 years ago

Hi Evan

Fix 1: I realised that my Definition() was slightly off and left excess spaces at times. This caused the incorrect DDL unsupported error.

Fix 2 I also came across a very interesting problem to do with the way MySQL handles capital letters. If you use theALTER TABLE statement manually to add a foreign index, you can place in that statement any possible capitalisation of the target schema name. This poses a problem if the schema name that MySQL stores differs in its capitalisation.

The result is the unsupported DDL error.

I know that MySQL doesn't care about capital letters - they don't make something unique. However in naming conventions they are very important. I think what we need to do is maintain the user's desired capitalisation BUT make all comparison statements case-insensitive. I have suggested two ways to implement this in the PR but I suspect there are many more places; for instance in every Constraint.Equals(), Index.Equals(), Column.Equals() etc.

I would be interested to hear your opinion of this.

Kind Regards

Chris

evanelias commented 7 years ago

The first fix looks good, thanks! This is consistent with how spacing is handled in other places in the codebase.

Pretty sure the second fix won't be necessary -- after digging into this more, I think the behavior you're seeing is, confusingly, just a side-effect of Skeema / Tengo operating with foreign_key_checks=0. Database names are actually case-sensitive in MySQL, at least with a case-sensitive file system, which is almost always the case on Linux. However with foreign_key_checks=0, you can create a foreign key that refers to a completely nonexistent database, table, and/or column. MySQL won't throw an error until you try inserting data into the table with foreign_key_checks=1 ;) So I'm pretty sure that's what you were seeing here, if you tried creating a FK that used a database name with different capitalization.

On a related topic, I thought about whether Skeema can use foreign_key_checks=1 in some cases. It's actually quite tricky, since doing so would require Skeema to figure out the table dependency chain for foreign keys, and then create tables in that specific order in _skeema_tmp. Ditto with applying alters in both _skeema_tmp and in real databases too. It's doable in theory, but due to the complexity, probably not something I'd tackle soon.

chrisjpalmer commented 7 years ago

Hi Evan

After some research I agree that Fix 2 is problematic and I will remove it in the next commit. However there is still an issue which has nothing to do with the capitalisation of the schema in the file system BUT just its capitalisation in the KEY_COLUMN_USAGE table. This is what I found.

I am using MacOS Sierra with MAMP 4.1 PRO, MySQL 5.6.34

Observations The CONSTRAINT_SCHEMA column refers to the database of the table in which the foreign key lives. All of the capitalisation seems to be the same.

screen shot 2017-08-08 at 8 05 11 am

Then I looked at this column:

screen shot 2017-08-08 at 8 05 05 am

The REFERENCED_TABLE_SCHEMA column refers to the database of the table which the foreign key points to. As you can see the capitalisation is mixed. Uppercase and lowercase??

MySQL is usually case sensitive and considers xxx and XXX to be different. However on Mac, the file system does not support xxx and XXX existing in the same folder. Therefore MySQL on a Mac does not let you create schema XXX and schema xxx. Neither does it let you create tables in the same way. This behaviour is controlled by a system variable called lower_case_table_names (https://dev.mysql.com/doc/refman/5.7/en/identifier-case-sensitivity.html). I have mistakenly believed for a while that MySQL didn't care about case but that's just because, on a Mac, it permits any capitalisation of a schema in a query, seeing as the capitalisation is insignificant.

But I don't think this phenomenon is related to the above issue. It seems to be that this particular column in the KEY_COLUMN_USAGE table is playing up. Perhaps my version of MySQL is old and this has been fixed in a more recent version. Would be very interested on your thoughts on this.

Why its a problem This was throwing skeema out the other day because when it compared to see whether rawConstraint.ReferencedTableName != s.Name, s.Name was "SCHEMA_A" and rawConstraint.ReferencedTableName was "schema_a". Therefore with my new code, Skeema believed that the constraint referenced a different schema than the current one (s.Name).

screen shot 2017-08-08 at 8 51 13 am

Now if the referenced schema is different, Skeema prepends the referenced schema to the referenced table in the constraint definition.

screen shot 2017-08-08 at 8 27 06 am

As they are actually the same schema in MySQL, the result of s.instance.ShowCreateTable(s, t) does not reflect this.

screen shot 2017-08-08 at 8 26 19 am
evanelias commented 7 years ago

Interesting finding! Unfortunately it isn't feasible for Skeema to fully support edge cases of case-insensitive filesystems / non-Linux operating systems on the database side at this time though, as it's pretty rare to run mysql-server in production on one. Even just on Linux alone, the number of combinations of MySQL versions and vendors can make it difficult to setup a thorough test suite :)

For mac development, my personal recommendation is to use Docker for Mac (and optionally its GUI, Kitematic) for running mysql-server locally. This avoids any peculiarities of MySQL for macOS, as it utilizes macOS's native virtualization support.

chrisjpalmer commented 7 years ago

One potential way of getting around this would be to do a check on MySQL's lower_case_table_names variable at the start. This could then affect your comparison operations throughout skeema. However, I agree with you that it is not important at the current time.