Open matdmiller opened 1 month ago
Looks like when adding fk constraints the table is dropped and recreated, but the indexes are not. The tracer functionality is quite helpful for debugging!
I think it is best to try and preserve existing indexes and fail loudly if they are unable to be recreated automatically. At the very least I believe you should be warned if they are not going to be recreated.
Below is my proposed solution with code:
CREATE INDEX
statement the original index must be dropped. I think it's a reasonable default for the 'kept' table to lose its indexes. I suspect the most common reason to keep the original table is for 'backup' purposes. It would also be difficult to automatically create updated CREATE INDEX
statements for the kept
renamed table where the statements are more complicated.CREATE INDEX
statements it is difficult to automatically replace or remove columns correctly. Failing before any operations have been completed is desired because it allows you to know there will be a problem prior to the SQL statements being executed allowing you to collect the required information required to manually update the indexes prior to any actions taking place as well as not leaving your database in a potentially invalid state.I have re-tested my original code and the index is maintained. I have also run this additional transformation:
db['bar'].transform(rename={'name':'name2'})
which now throws an assertion error:
AssertionError: Index 'idx_bar_name' column 'name' is not in updated table 'bar'. You must manually drop this index prior to running this transformation and manually recreate the new index after running this transformation. The original index sql statement is: `CREATE UNIQUE INDEX [idx_bar_name]
ON [bar] ([name])`. No changes have been applied to this table.
# Re-add existing indexes
for index in self.indexes:
index_sql = self.db.execute("""SELECT sql FROM sqlite_master
WHERE type = 'index' AND name = :index_name;""",
{'index_name':index.name}).fetchall()[0][0]
if keep_table:
sqls.append(f"DROP INDEX IF EXISTS [{index.name}];")
for col in index.columns:
assert col in copy_from_to.keys() and col == copy_from_to[col], \
f"Index '{index.name}' column '{col}' is not in updated table '{self.name}'. " \
f"You must manually drop this index prior to running this transformation " \
f"and manually recreate the new index after running this transformation. " \
f"The original index sql statement is: `{index_sql}`. No changes have been applied to this table."
sqls.append(index_sql)
After running the tests I discovered that auto created pk
indexes do not have create index
statements and were causing None
to be inserted in the list of sqls
to run. I have explicitly removed pk
indexes from being processed and added an assertion that the returned sql statement is not None and added instructions to manually drop and recreate the index with information about which table and index needs to be addressed.
# Re-add existing indexes
for index in self.indexes:
if index.origin not in ("pk"):
index_sql = self.db.execute(
"""SELECT sql FROM sqlite_master WHERE type = 'index' AND name = :index_name;""",
{"index_name": index.name},
).fetchall()[0][0]
assert index_sql is not None, (
f"Index '{index}' on table '{self.name}' does not have a "
"CREATE INDEX statement. You must manually drop this index prior to running this "
"transformation and manually recreate the new index after running this transformation."
)
if keep_table:
sqls.append(f"DROP INDEX IF EXISTS [{index.name}];")
for col in index.columns:
assert col not in rename.keys() and col not in drop, (
f"Index '{index.name}' column '{col}' is not in updated table '{self.name}'. "
f"You must manually drop this index prior to running this transformation "
f"and manually recreate the new index after running this transformation. "
f"The original index sql statement is: `{index_sql}`. No changes have been applied to this table."
)
sqls.append(index_sql)
I have confirmed all tests are now passing and have added some additional tests:
@pytest.mark.parametrize(
"indexes, transform_params",
[
([["name"]], {"types": {"age": str}}),
([["name"], ["age", "breed"]], {"types": {"age": str}}),
([], {"types": {"age": str}}),
([["name"]], {"types": {"age": str}, "keep_table": "old_dogs"}),
],
)
def test_transform_indexes(fresh_db, indexes, transform_params):
dogs = fresh_db["dogs"]
dogs.insert({"id": 1, "name": "Cleo", "age": 5, "breed": "Labrador"}, pk="id")
for index in indexes:
dogs.create_index(index)
indexes_before_transform = dogs.indexes
dogs.transform(**transform_params)
assert sorted(
[
{k: v for k, v in idx._asdict().items() if k != "seq"}
for idx in dogs.indexes
],
key=lambda x: x["name"],
) == sorted(
[
{k: v for k, v in idx._asdict().items() if k != "seq"}
for idx in indexes_before_transform
],
key=lambda x: x["name"],
), f"Indexes before transform: {indexes_before_transform}\nIndexes after transform: {dogs.indexes}"
if "keep_table" in transform_params:
assert all(
index.origin == "pk"
for index in fresh_db[transform_params["keep_table"]].indexes
)
def test_transform_retains_indexes_with_foreign_keys(fresh_db):
dogs = fresh_db["dogs"]
owners = fresh_db["owners"]
dogs.insert({"id": 1, "name": "Cleo", "owner_id": 1}, pk="id")
owners.insert({"id": 1, "name": "Alice"}, pk="id")
dogs.create_index(["name"])
indexes_before_transform = dogs.indexes
fresh_db.add_foreign_keys([("dogs", "owner_id", "owners", "id")]) # calls transform
assert sorted(
[
{k: v for k, v in idx._asdict().items() if k != "seq"}
for idx in dogs.indexes
],
key=lambda x: x["name"],
) == sorted(
[
{k: v for k, v in idx._asdict().items() if k != "seq"}
for idx in indexes_before_transform
],
key=lambda x: x["name"],
), f"Indexes before transform: {indexes_before_transform}\nIndexes after transform: {dogs.indexes}"
@pytest.mark.parametrize(
"transform_params",
[
{"rename": {"age": "dog_age"}},
{"drop": ["age"]},
],
)
def test_transform_with_indexes_errors(fresh_db, transform_params):
dogs = fresh_db["dogs"]
dogs.insert({"id": 1, "name": "Cleo", "age": 5}, pk="id")
dogs.create_index(["name", "age"])
with pytest.raises(AssertionError) as excinfo:
dogs.transform(**transform_params)
assert (
"Index 'idx_dogs_name_age' column 'age' is not in updated table 'dogs'. "
"You must manually drop this index prior to running this transformation"
in str(excinfo.value)
)
Will be submitting a PR.
Here is the PR to resolve this issue with my proposed changes: https://github.com/simonw/sqlite-utils/pull/634
After creating a table with an index, if you add a foreign key constraint to the table it drops existing non-related indexes.
I have attached a minimal repro example screenshot and code.
Repro code