willbryant / kitchen_sync

Fast unidirectional synchronization - make or efficiently update a copy of a database, without slow dumping & reloading
MIT License
282 stars 35 forks source link

KS provides alter statements that can't apply when reconciling schemas with mix of standard and unique keys where unique key name sorts lexicographically after standard key names. #117

Closed flux-ricky closed 10 months ago

flux-ricky commented 11 months ago

TLDR; It looks like KS has its schema alteration commands out of order.

I've tried to distill an example I am seeing in our infrastructure to this reproducible script:

#!/usr/bin/env bash

set -euo pipefail

mysql -uroot -e 'DROP DATABASE IF EXISTS ks1; CREATE DATABASE ks1;'
mysql -uroot -e 'DROP DATABASE IF EXISTS ks2; CREATE DATABASE ks2;'

mysql -uroot -D ks1 <<SQL
CREATE TABLE example (
  id int(11) NOT NULL AUTO_INCREMENT,
  done_at datetime DEFAULT NULL,
  uuid varchar(255) DEFAULT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY index_example_on_uuid (uuid),
  KEY index_example_on_done_at (done_at)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;
SQL

mysql -uroot -D ks2 <<SQL
CREATE TABLE example (
  id int(11) NOT NULL AUTO_INCREMENT,
  done_at datetime DEFAULT NULL,
  uuid varchar(255) DEFAULT NULL,
  PRIMARY KEY (id),
  KEY index_example_on_done_at (done_at)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;
SQL

ks --verbose true --from=mysql://root@localhost/ks1 --to=mysql://root@localhost/ks2 --structure-only $@

Note that the only difference in the schema is that the ks2.example table is missing the UNIQUE KEY index_example_on_uuid (uuid).

When running this script we are provided with the following statements:

Kitchen Sync
The database schema doesn't match.  Use the --alter option if you would like to automatically apply the following schema changes:

CREATE INDEX `index_example_on_done_at` ON `example` (`done_at`);
CREATE UNIQUE INDEX `index_example_on_uuid` ON `example` (`uuid`);
ALTER TABLE `example` DROP INDEX `index_example_on_done_at`;

Database schema needs migration.
Kitchen Syncing failed.

KS wants to first create an index that already exists, add the missing index, and then drop the index it tried to create in an earlier step. Needless to say this fails at the first statement, due to index_example_on_done_at already existing.

At first glance it looks like maybe the first and last statements are in the wrong order and swapping them would allow for the right order of statements.

However, I also note that running only the CREATE UNIQUE INDEX `index_example_on_uuid` ON `example` (`uuid`); statement also moves the target table into the right structure.

An observation while debugging this is that adding unique indexes to a table will place the unique index in the show create table output above other non unique indexes.

So if this order of keys is consistent enough across engine versions, KS could use fewer statements and hopefully limit the amount of index rebuilds due to dropping and adding again.


This is using the latest version 2.18, I went as far back as version 2.15 to see how recent this issue is and I could still see the same behaviour above on that version as well.

flux-ricky commented 11 months ago

Apologies for not taking this any further to reproduce it in the test suite but this was the easiest way to get something reproducible and hopefully highlights the issue. And I haven't dug into the code yet to see if I can spot the problem myself as I wanted to get the issue created first.

willbryant commented 10 months ago

Awesome work on this, thanks @flux-ricky.

Sorry it took me a while to get back to you, I had to do some digging. The root issue was the key sorting order that I implemented, which didn't match the algorithm that I wrote :sob:. I think the solution you submitted was on the right track but it would fail in other cases because it still had the underlying assumption that the indexes should be expected to be in the right order by name, as did my buggy version.

So after trying to write the corrected algorithm that looks at key_type and then name, and having trouble convincing myself that it was correct, in the end I just changed the sort order - which is ultimately arbitrary.

Hopefully 7f215b4 sorts things for you too.

Thanks for your help!

flux-ricky commented 10 months ago

Thanks @willbryant!

I tried running the new version against the example script in this issue and while KS is able to reconcile the keys correctly, it ends up doing more work than it needs to, it recreates the index_example_on_done_at key.

The updated algorithm appears to first drop into this condition:

else if (from_key != to_key) {
    // recreate the index.  not all databases can combine these two statements, so we implement the general case only for now.
    DropKeyStatements<DatabaseClient>::generate(client, from_table, *to_key, append_to(key_statements));
    CreateKeyStatements<DatabaseClient>::generate(client, from_table, *from_key, append_to(key_statements));
    *to_key++ = *from_key++;
}

And therefore suggests these statements:

CREATE UNIQUE INDEX `index_example_on_uuid` ON `example` (`uuid`);
ALTER TABLE `example` DROP INDEX `index_example_on_done_at`;
CREATE INDEX `index_example_on_done_at` ON `example` (`done_at`);

This is what really stumped me when I was debugging this as I couldn't understand why from_key != to_key was true, when it was comparing the same key with the same attributes.

Honestly, I don't even know how to explain this in C++ terminology but its looks like from_key and to_key are not the bare structs but instances of their iterators, and they have different types of iterators?? Dereferencing in the condition appeared to me, to get the desired logic.

diff --git a/src/schema_matcher.h b/src/schema_matcher.h
index 3f6b93a..b583201 100644
--- a/src/schema_matcher.h
+++ b/src/schema_matcher.h
@@ -482,7 +482,7 @@ struct TableMatcher {
                ++from_key;
                // keep the current to_key and re-evaluate on the next iteration

-           } else if (from_key != to_key) {
+           } else if (*from_key != *to_key) {
                // recreate the index.  not all databases can combine these two statements, so we implement the general case only for now.
                DropKeyStatements<DatabaseClient>::generate(client, from_table, *to_key, append_to(key_statements));
                CreateKeyStatements<DatabaseClient>::generate(client, from_table, *from_key, append_to(key_statements));

This patch passes the tests for me as well but also should result in less work done by kitchen sync. It suggests only this schema change

CREATE UNIQUE INDEX `index_example_on_uuid` ON `example` (`uuid`);

Thanks again for your responsiveness and work on KS 🥳


Side note: Thanks for cutting a new version for v2.19, I did notice however that the version reported by ks --version was still v2.18

willbryant commented 10 months ago

Ah damn yeah, it should be *from_key != *to_key. Can't believe that never failed any tests :joy:

willbryant commented 10 months ago

Ahh it doesn't fail tests because it does recreate the index and that produces the right end result (but dumb). I guess I'll just have to commit the fix without a failing test.

flux-ricky commented 10 months ago

Thanks @willbryant !! 🥳