Foreign key linting - Githubissues

johlo commented 5 years ago

What needs to be resolved before a PR to upstream

[ ] Unit tests
[ ] Comments
[ ] Code assumes Innodb, how to handle
[ ] Integration tests?
[ ] Referenced keys should be both non-unique and non-null?
[ ] createFullKey should be createFullInnodbKey and moved to tengo?
[ ] Add warning about adding index when missing from child table

Future stuff

[ ] detect >15 levels of cascading FK's and cascades traversing the same table more than once (https://dev.mysql.com/doc/mysql-reslimits-excerpt/8.0/en/ansi-diff-foreign-keys.html)

johlo commented 5 years ago

@evanelias would you be interested in a PR for the FK detectors in https://github.com/johlo/skeema/commit/ea96a9fadb5c0474ac47456e4bf989aefaf9178e ?

The check for references to non-unique is based on the warning in https://dev.mysql.com/doc/refman/8.0/en/create-table-foreign-keys.html (see the section on Foreign Keys and the ANSI/ISO SQL Standard).

evanelias commented 5 years ago

Awesome, thanks for working on this! I'd very happily welcome a PR. It might be a few weeks until I could give a solid code review though -- I'm aiming to finally launch the skeema.io CI system (basically an auto-linter for github PRs) next week, so might have my hands full temporarily :)

Also as you noticed, there isn't currently a mechanism for the linter to examine cross-database FKs. That probably isn't fixable in the near-term, since it's a side-effect of Skeema's operation model -- it's only ever linting one schema at a time, since it's effectively materializing (to a temporary location) one directory's logical schema at a time, i.e. whatever is expressed in that dir's .sql files. Unlike other commands, skeema lint doesn't really look at what tables/objects/etc currently exist on a db, since it's intentionally only linting the proposed state* as expressed in the filesystem representation.

In theory, it could be possible to change that to materialize multiple schemas/dirs at once, but it would be a large refactor -- and it has some complexities around performance edge cases (users with a very high number of databases/dirs) as well as operational configuration (users placing particular grants on _skeema_tmp vs having an unbounded number of workspace schemas).

On another topic, I wanted to mention I started work on a redundant index detector in https://github.com/skeema/skeema/commit/8e7a4fdd02d70ec307fc67482a9d309d0cc46302. I just wanted to give you a heads-up to avoid risk of duplicate effort, since you previously mentioned interest in that in https://github.com/johlo/skeema/commit/037cf95e187271f535526220928909cd9dee27a4#commitcomment-33493943.

johlo commented 5 years ago

OK! I'll prepare a PR during the week. I was thinking about how to load information for a table from another schema on-demand, but it might be a big change.

I did implement a basic detector for redundant indexes (not pushed to github), I'll have a look at your implementation and see if I can contribute something :)

Will be interesting to the see the CI service!

evanelias commented 5 years ago

Amazing, thank you 👍

btw, I'm going to aim to make the integration testing for linter problem detectors easier in a few weeks. The current setup is a bit clumsy and overly tied to the current set of detectors. Ideally I'm going to aim for a framework where you just supply a test .sql file with specially-formatted inline comments that indicate which problem(s) are expected to be found on each line of the file.

So if you want, feel free to hold off on the integration testing in your PR for now :)

johlo commented 5 years ago

A question @evanelias. I added a pass for checking if there is an explicitly defined index that covers the foreign key (in the child table), the idea is that it would be useful to get a warning about innodb adding an index automatically. But this pass won't discover this since the key has already been added when the detector runs. Also, it is a bit surprising that running lint command will change the schema (the new index has been added to the table in the local file representation).

Is there any way of disabling that behaviour in innodb?

evanelias commented 5 years ago

the idea is that it would be useful to get a warning about innodb adding an index automatically

I'd expect some users are ok with the automatic index creation, since it's a known MySQL behavior. No objections to adding a detector for this, but it probably shouldn't be enabled by default (e.g. shouldn't appear in the default value for warnings).

I'd also imagine the implementation for this would require parsing the CREATE TABLE in complex ways, which can get painful...

Is there any way of disabling that behaviour in innodb?

Not that I know of. InnoDB requires indexes on both sides of the foreign key; otherwise, DML operations would require full table scans. I believe you'll even get an error if you try dropping the index on either side if it's needed for a foreign key.

Also, it is a bit surprising that running lint command will change the schema (the new index has been added to the table in the local file representation).

Sorry about that -- it's a known problem that I plan to fix in 1.3.x. Currently skeema lint always reformats CREATE TABLE statements to their canonical version from SHOW CREATE TABLE, with no way to disable this behavior. In 1.3.x I plan to make a separate skeema format command which only handles formatting, and have skeema lint only optionally reformat (probably defaulting to false, but I'm still considering this).

Another related problem is the current order-of-operations: skeema lint performs problem detection on a schema; provides output; and then reformats files where necessary. This means the line numbers in the annotations may not be correct with regard to the reformatted file. So in 1.3.x I will likely make reformatting (if enabled) come first. That may be problematic for this detector though, if reformatting is enabled.

johlo commented 5 years ago

I'd expect some users are ok with the automatic index creation, since it's a known MySQL behavior. No objections to adding a detector for this, but it probably shouldn't be enabled by default (e.g. shouldn't appear in the default value for warnings).

Ok, I removed this detector as it would be too much trouble for questionable value :)

In 1.3.x I plan to make a separate skeema format command which only handles formatting, and have skeema lint only optionally reformat (probably defaulting to false, but I'm still considering this).

Sounds good!

I can think of a couple of other FK detectors that can be implemented, but I'll wrap up this PR and see what you think of it.

evanelias commented 5 years ago

Sounds great, thanks again! btw you can see the CI system in action at https://www.skeema.io/ci/, check it out if you have a chance :)

johlo commented 5 years ago

PR opened, congrats on launching the CI system! For work we recently moved to Gitlab, any plans for integration?

evanelias commented 5 years ago

Thanks! No concrete Gitlab plans yet, but it's possible in the future.

johlo / skeema

Foreign key linting #1