simonw / sqlite-utils

Python CLI utility and library for manipulating SQLite databases
https://sqlite-utils.datasette.io
Apache License 2.0
1.58k stars 106 forks source link

.transform() instead of modifying sqlite_master for add_foreign_keys #584

Closed simonw closed 11 months ago

simonw commented 11 months ago

Refs:


:books: Documentation preview :books:: https://sqlite-utils--584.org.readthedocs.build/en/584/

simonw commented 11 months ago

Just one failing test left:

        # Soundness check foreign_keys point to existing tables
        for fk in foreign_keys:
            if fk.other_table == name and columns.get(fk.other_column):
                continue
            if not any(
                c for c in self[fk.other_table].columns if c.name == fk.other_column
            ):
>               raise AlterError(
                    "No such column: {}.{}".format(fk.other_table, fk.other_column)
                )
E               sqlite_utils.db.AlterError: No such column: breeds.rowid

sqlite_utils/db.py:882: AlterError
==== short test summary info ====
FAILED tests/test_create.py::test_add_column_foreign_key - sqlite_utils.db.AlterError: No such column: breeds.rowid
==== 1 failed, 378 deselected in 0.49s ====
simonw commented 11 months ago

Full test:

https://github.com/simonw/sqlite-utils/blob/842b61321fc6a9f0bdb913ab138e39d71bf42e00/tests/test_create.py#L468-L484

simonw commented 11 months ago

Just these three lines recreate the problem:

from sqlite_utils import Database

fresh_db = Database(memory=True)

fresh_db.create_table("dogs", {"name": str})
fresh_db.create_table("breeds", {"name": str})
fresh_db["dogs"].add_column("breed_id", fk="breeds")

Traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/db.py", line 2170, in add_column
    self.add_foreign_key(col_name, fk, fk_col)
  File "/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/db.py", line 2273, in add_foreign_key
    self.db.add_foreign_keys([(self.name, column, other_table, other_column)])
  File "/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/db.py", line 1169, in add_foreign_keys
    self[table].transform(add_foreign_keys=fks)
  File "/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/db.py", line 1728, in transform
    sqls = self.transform_sql(
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/db.py", line 1896, in transform_sql
    self.db.create_table_sql(
  File "/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/db.py", line 882, in create_table_sql
    raise AlterError(
sqlite_utils.db.AlterError: No such column: breeds.rowid
simonw commented 11 months ago

The problem here is that the table created by this line:

fresh_db.create_table("breeds", {"name": str})

Has this schema:

CREATE TABLE [breeds] (
   [name] TEXT
);

SQLite creates an invisible rowid column for it automatically.

On the main branch with the old implementation that table ends up looking like this after the foreign key has been added:

(Pdb) print(fresh_db.schema)
CREATE TABLE [dogs] (
   [name] TEXT
, [breed_id] INTEGER,
   FOREIGN KEY([breed_id]) REFERENCES [breeds]([rowid])
);
CREATE TABLE [breeds] (
   [name] TEXT
);

But I think this validation check is failing now: https://github.com/simonw/sqlite-utils/blob/842b61321fc6a9f0bdb913ab138e39d71bf42e00/sqlite_utils/db.py#L875-L884

Here's what the debugger reveals about this code:

        for fk in foreign_keys:
            if fk.other_table == name and columns.get(fk.other_column):
                continue
            if not any(
                c for c in self[fk.other_table].columns if c.name == fk.other_column
            ):
                raise AlterError(
                    "No such column: {}.{}".format(fk.other_table, fk.other_column)
                )
(Pdb) fk
ForeignKey(table='dogs', column='breed_id', other_table='breeds', other_column='rowid')
(Pdb) self[fk.other_table].columns
[Column(cid=0, name='name', type='TEXT', notnull=0, default_value=None, is_pk=0)]
codecov[bot] commented 11 months ago

Codecov Report

Patch coverage: 92.85% and project coverage change: -0.07% :warning:

Comparison is base (1dc6b5a) 95.82% compared to head (2915050) 95.76%. Report is 1 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #584 +/- ## ========================================== - Coverage 95.82% 95.76% -0.07% ========================================== Files 8 8 Lines 2829 2834 +5 ========================================== + Hits 2711 2714 +3 - Misses 118 120 +2 ``` | [Files Changed](https://app.codecov.io/gh/simonw/sqlite-utils/pull/584?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) | Coverage Δ | | |---|---|---| | [sqlite\_utils/db.py](https://app.codecov.io/gh/simonw/sqlite-utils/pull/584?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-c3FsaXRlX3V0aWxzL2RiLnB5) | `97.22% <92.85%> (-0.15%)` | :arrow_down: |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

simonw commented 11 months ago

Oops, mypy failures:

sqlite_utils/db.py:781: error: Incompatible types in assignment (expression has type "Tuple[Any, ...]", variable has type "Union[str, ForeignKey, Tuple[str, str], Tuple[str, str, str], Tuple[str, str, str, str]]")  [assignment]
sqlite_utils/db.py:1164: error: Need type annotation for "by_table" (hint: "by_table: Dict[<type>, <type>] = ...")  [var-annotated]
sqlite_utils/db.py:1169: error: Item "View" of "Union[Table, View]" has no attribute "transform"  [union-attr]
sqlite_utils/db.py:1813: error: Argument 1 to "append" of "list" has incompatible type "ForeignKey"; expected <nothing>  [arg-type]
sqlite_utils/db.py:1824: error: Argument 1 to "append" of "list" has incompatible type "ForeignKey"; expected <nothing>  [arg-type]
Found 5 errors in 1 file (checked 56 source files)
simonw commented 11 months ago

The docs still describe the old trick, I need to update that: https://sqlite-utils.datasette.io/en/3.34/python-api.html#adding-foreign-key-constraints

simonw commented 11 months ago

Weird, I'm getting a flake8 problem in CI which doesn't occur on my laptop:

./tests/test_recipes.py:99:9: F811 redefinition of unused 'fn' from line 96
./tests/test_recipes.py:127:9: F811 redefinition of unused 'fn' from line 124
simonw commented 11 months ago

Upgrading flake8 locally replicated the error:

pip install -U flake8
flake8
./tests/test_recipes.py:99:9: F811 redefinition of unused 'fn' from line 96
./tests/test_recipes.py:127:9: F811 redefinition of unused 'fn' from line 124
simonw commented 11 months ago

Another docs update: this bit in here https://sqlite-utils.datasette.io/en/3.34/python-api.html#adding-multiple-foreign-key-constraints-at-once talks about how .add_foreign_keys() is a performance optimization to avoid having to run VACUUM a bunch of separate times:

The final step in adding a new foreign key to a SQLite database is to run VACUUM, to ensure the new foreign key is available in future introspection queries.

VACUUM against a large (multi-GB) database can take several minutes or longer. If you are adding multiple foreign keys using table.add_foreign_key(...) these can quickly add up.

Instead, you can use db.add_foreign_keys(...) to add multiple foreign keys within a single transaction. This method takes a list of four-tuples, each one specifying a table, column, other_table and other_column.

That doesn't apply any more - the new mechanism using .transform() works completely differently, so this issue around running VACUUM no longer applies.

simonw commented 11 months ago

One last piece of documentation: need to document the new option to table.transform() and table.transform_sql():

https://github.com/simonw/sqlite-utils/blob/0771ac61fe5c2aca74075b20b1a99b9bd4c65661/sqlite_utils/db.py#L1706-L1708

I should write tests for them too.

simonw commented 11 months ago

Updated documentation: https://sqlite-utils--584.org.readthedocs.build/en/584/python-api.html#adding-foreign-key-constraints

simonw commented 11 months ago

More updated documentation: