simonw / sqlite-utils

Python CLI utility and library for manipulating SQLite databases
https://sqlite-utils.datasette.io
Apache License 2.0
1.63k stars 111 forks source link

Table indexes are silently dropped from tables when foreign keys are added #633

Open matdmiller opened 1 month ago

matdmiller commented 1 month ago

After creating a table with an index, if you add a foreign key constraint to the table it drops existing non-related indexes.

I have attached a minimal repro example screenshot and code.

Screenshot 2024-07-20 at 8 05 26 PM

Repro code

from sqlite_utils import Database

from pathlib import Path
db_fp = './db_repro_index.db'
Path(db_fp).unlink(missing_ok=True), Path(f'{db_fp}-shm').unlink(missing_ok=True),Path(f'{db_fp}-wal').unlink(missing_ok=True);

# db = database(db_fp)
db = Database(db_fp)

db['foo'].create(dict(id=int, name=str), pk='id', not_null=['name'])

db['foo'].create_index(['name'], unique=True)
db['foo'].indexes

db['bar'].create(dict(id=int, name=str, foo_id=int), pk='id', not_null=['name'])

db['bar'].create_index(['name'], unique=True)
db['bar'].indexes

db.add_foreign_keys((('bar','foo_id','foo','id'),))

db['bar'].indexes
matdmiller commented 1 month ago

Looks like when adding fk constraints the table is dropped and recreated, but the indexes are not. The tracer functionality is quite helpful for debugging!

Screenshot 2024-07-20 at 9 07 44 PM Screenshot 2024-07-20 at 9 07 18 PM
matdmiller commented 1 month ago

I think it is best to try and preserve existing indexes and fail loudly if they are unable to be recreated automatically. At the very least I believe you should be warned if they are not going to be recreated.

Below is my proposed solution with code:

  1. If a table is kept it will automatically drop the original index. To be able to create the index on the new table using the original CREATE INDEX statement the original index must be dropped. I think it's a reasonable default for the 'kept' table to lose its indexes. I suspect the most common reason to keep the original table is for 'backup' purposes. It would also be difficult to automatically create updated CREATE INDEX statements for the kept renamed table where the statements are more complicated.
  2. If a column exists in the original index but not in the new table, you should have to manually recreate the index as desired. In complicated CREATE INDEX statements it is difficult to automatically replace or remove columns correctly. Failing before any operations have been completed is desired because it allows you to know there will be a problem prior to the SQL statements being executed allowing you to collect the required information required to manually update the indexes prior to any actions taking place as well as not leaving your database in a potentially invalid state.

I have re-tested my original code and the index is maintained. I have also run this additional transformation: db['bar'].transform(rename={'name':'name2'}) which now throws an assertion error:

AssertionError: Index 'idx_bar_name' column 'name' is not in updated table 'bar'. You must manually drop this index prior to running this transformation and manually recreate the new index after running this transformation. The original index sql statement is: `CREATE UNIQUE INDEX [idx_bar_name]
    ON [bar] ([name])`. No changes have been applied to this table.
Screenshot 2024-07-21 at 1 22 24 PM
        # Re-add existing indexes
        for index in self.indexes:
            index_sql = self.db.execute("""SELECT sql FROM sqlite_master 
                                        WHERE type = 'index' AND name = :index_name;""", 
                                        {'index_name':index.name}).fetchall()[0][0]
            if keep_table:
                sqls.append(f"DROP INDEX IF EXISTS [{index.name}];")
            for col in index.columns:
                assert col in copy_from_to.keys() and col == copy_from_to[col], \
                    f"Index '{index.name}' column '{col}' is not in updated table '{self.name}'. " \
                    f"You must manually drop this index prior to running this transformation " \
                    f"and manually recreate the new index after running this transformation. " \
                    f"The original index sql statement is: `{index_sql}`. No changes have been applied to this table."
            sqls.append(index_sql)
matdmiller commented 1 month ago

After running the tests I discovered that auto created pk indexes do not have create index statements and were causing None to be inserted in the list of sqls to run. I have explicitly removed pk indexes from being processed and added an assertion that the returned sql statement is not None and added instructions to manually drop and recreate the index with information about which table and index needs to be addressed.

        # Re-add existing indexes
        for index in self.indexes:
            if index.origin not in ("pk"):
                index_sql = self.db.execute(
                    """SELECT sql FROM sqlite_master WHERE type = 'index' AND name = :index_name;""",
                    {"index_name": index.name},
                ).fetchall()[0][0]
                assert index_sql is not None, (
                    f"Index '{index}' on table '{self.name}' does not have a "
                    "CREATE INDEX statement. You must manually drop this index prior to running this "
                    "transformation and manually recreate the new index after running this transformation."
                )
                if keep_table:
                    sqls.append(f"DROP INDEX IF EXISTS [{index.name}];")
                for col in index.columns:
                    assert col not in rename.keys() and col not in drop, (
                        f"Index '{index.name}' column '{col}' is not in updated table '{self.name}'. "
                        f"You must manually drop this index prior to running this transformation "
                        f"and manually recreate the new index after running this transformation. "
                        f"The original index sql statement is: `{index_sql}`. No changes have been applied to this table."
                    )
                sqls.append(index_sql)

I have confirmed all tests are now passing and have added some additional tests:

@pytest.mark.parametrize(
    "indexes, transform_params",
    [
        ([["name"]], {"types": {"age": str}}),
        ([["name"], ["age", "breed"]], {"types": {"age": str}}),
        ([], {"types": {"age": str}}),
        ([["name"]], {"types": {"age": str}, "keep_table": "old_dogs"}),
    ],
)
def test_transform_indexes(fresh_db, indexes, transform_params):
    dogs = fresh_db["dogs"]
    dogs.insert({"id": 1, "name": "Cleo", "age": 5, "breed": "Labrador"}, pk="id")

    for index in indexes:
        dogs.create_index(index)

    indexes_before_transform = dogs.indexes

    dogs.transform(**transform_params)

    assert sorted(
        [
            {k: v for k, v in idx._asdict().items() if k != "seq"}
            for idx in dogs.indexes
        ],
        key=lambda x: x["name"],
    ) == sorted(
        [
            {k: v for k, v in idx._asdict().items() if k != "seq"}
            for idx in indexes_before_transform
        ],
        key=lambda x: x["name"],
    ), f"Indexes before transform: {indexes_before_transform}\nIndexes after transform: {dogs.indexes}"
    if "keep_table" in transform_params:
        assert all(
            index.origin == "pk"
            for index in fresh_db[transform_params["keep_table"]].indexes
        )

def test_transform_retains_indexes_with_foreign_keys(fresh_db):
    dogs = fresh_db["dogs"]
    owners = fresh_db["owners"]

    dogs.insert({"id": 1, "name": "Cleo", "owner_id": 1}, pk="id")
    owners.insert({"id": 1, "name": "Alice"}, pk="id")

    dogs.create_index(["name"])

    indexes_before_transform = dogs.indexes

    fresh_db.add_foreign_keys([("dogs", "owner_id", "owners", "id")])  # calls transform

    assert sorted(
        [
            {k: v for k, v in idx._asdict().items() if k != "seq"}
            for idx in dogs.indexes
        ],
        key=lambda x: x["name"],
    ) == sorted(
        [
            {k: v for k, v in idx._asdict().items() if k != "seq"}
            for idx in indexes_before_transform
        ],
        key=lambda x: x["name"],
    ), f"Indexes before transform: {indexes_before_transform}\nIndexes after transform: {dogs.indexes}"

@pytest.mark.parametrize(
    "transform_params",
    [
        {"rename": {"age": "dog_age"}},
        {"drop": ["age"]},
    ],
)
def test_transform_with_indexes_errors(fresh_db, transform_params):
    dogs = fresh_db["dogs"]
    dogs.insert({"id": 1, "name": "Cleo", "age": 5}, pk="id")

    dogs.create_index(["name", "age"])

    with pytest.raises(AssertionError) as excinfo:
        dogs.transform(**transform_params)

    assert (
        "Index 'idx_dogs_name_age' column 'age' is not in updated table 'dogs'. "
        "You must manually drop this index prior to running this transformation"
        in str(excinfo.value)
    )

Will be submitting a PR.

matdmiller commented 1 month ago

Here is the PR to resolve this issue with my proposed changes: https://github.com/simonw/sqlite-utils/pull/634