simonw / sqlite-utils

Python CLI utility and library for manipulating SQLite databases
https://sqlite-utils.datasette.io
Apache License 2.0
1.66k stars 111 forks source link

Mechanism for ensuring a table has all the columns #467

Closed simonw closed 2 years ago

simonw commented 2 years ago

Suggested by @jefftriplett on Discord: https://discord.com/channels/823971286308356157/997738192360964156/1011655389063958600

simonw commented 2 years ago

Jeff suggested db[table].(..., alter=True) for this.

    db["urls"].create(
        {
            "url": str,
            "crawled": bool,
            "body": str,
            "headers": dict,
            "status": int,
            "status_text": str,
        },
        pk="url",
        defaults={"crawled": False},
        if_not_exists=True,
        alter=True,
    )
simonw commented 2 years ago

I'm not crazy about having to pass both alter=True and if_not_exists=True - maybe alter should imply if_not_exists.

simonw commented 2 years ago

Should passing alter=True also drop any columns that aren't included in the new table structure?

It could even spot column types that aren't correct and fix those.

Is that consistent with the expectations set by how alter=True works elsewhere?

simonw commented 2 years ago

Could call it ensure=True here if it works differently enough from alter=True that the behavior could be confusing.

simonw commented 2 years ago

Thinking about this more, I think alter=True is a good name for this option even if it does more than the same option on .insert().

simonw commented 2 years ago

Maybe there should be a separate table.alter(...) method that does the actual work here, with .create(..., alter=True) as syntactic sugar for triggering that if the table exists already.

simonw commented 2 years ago

... but that's what the table.transform(...) method does already!

So maybe this is actually a transform=True parameter to create() that triggers table.transform(...) if necessary.

jefftriplett commented 2 years ago

Should passing alter=True also drop any columns that aren't included in the new table structure?

It could even spot column types that aren't correct and fix those.

Is that consistent with the expectations set by how alter=True works elsewhere?

I would lean towards not dropping them (or making a drop=True or drop_columns=Trueor drop_missing_columns=True) to work with existing tables easier.

I do like that sqlite-utils mostly just works with existing tables but it's also nice to add to existing fields in a few cases.

simonw commented 2 years ago

Initial prototype:

diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py
index 18a442a..03fd345 100644
--- a/sqlite_utils/db.py
+++ b/sqlite_utils/db.py
@@ -875,6 +875,7 @@ class Database:
         hash_id_columns: Optional[Iterable[str]] = None,
         extracts: Optional[Union[Dict[str, str], List[str]]] = None,
         if_not_exists: bool = False,
+        transform: bool = False,
     ) -> "Table":
         """
         Create a table with the specified name and the specified ``{column_name: type}`` columns.
@@ -892,7 +893,39 @@ class Database:
         :param hash_id_columns: List of columns to be used when calculating the hash ID for a row
         :param extracts: List or dictionary of columns to be extracted during inserts, see :ref:`python_api_extracts`
         :param if_not_exists: Use ``CREATE TABLE IF NOT EXISTS``
-        """
+        :param transform: If table already exists, transform it to fit the specified schema
+        """
+        # Transform table to match the new definition if table already exists:
+        if transform and self[name].exists():
+            # First add missing columns and columns to drop
+            existing_columns = self[name].columns_dict
+            missing_columns = dict(
+                (col_name, col_type)
+                for col_name, col_type in columns.items()
+                if col_name not in existing_columns
+            )
+            columns_to_drop = [
+                column for column in existing_columns if column not in columns
+            ]
+            if missing_columns:
+                for col_name, col_type in missing_columns.items():
+                    self[name].add_column(col_name, col_type)
+            # Do we need to reset the column order?
+            column_order = None
+            if list(existing_columns) != list(columns):
+                column_order = list(columns)
+            # Only run .transform() if there is something to do
+            # TODO: this misses changes like pk= without also column changes
+            if columns_to_drop or missing_columns or column_order:
+                self[name].transform(
+                    types=columns,
+                    drop=columns_to_drop,
+                    column_order=column_order,
+                    not_null=not_null,
+                    defaults=defaults,
+                    pk=pk,
+                )
+            return cast(Table, self[name])
         sql = self.create_table_sql(
             name=name,
             columns=columns,
@@ -1477,6 +1510,7 @@ class Table(Queryable):
         hash_id_columns: Optional[Iterable[str]] = None,
         extracts: Optional[Union[Dict[str, str], List[str]]] = None,
         if_not_exists: bool = False,
+        transform: bool = False,
     ) -> "Table":
         """
         Create a table with the specified columns.
@@ -1508,6 +1542,7 @@ class Table(Queryable):
                 hash_id_columns=hash_id_columns,
                 extracts=extracts,
                 if_not_exists=if_not_exists,
+                transform=transform,
             )
         return self

Needs more thought about how things like just a change to pk= should work.

simonw commented 2 years ago

Example of that prototype working:

>>> from sqlite_utils import Database
>>> db = Database(memory=True)
>>> db["dogs"].create({"id": int, "name": str}, pk="id")
<Table dogs (id, name)>
>>> db["dogs"].create({"id": int, "name": str, "age": int}, pk="id", transform=True)
<Table dogs (id, name, age)>
simonw commented 2 years ago

Also needs comprehensive tests and documentation.

simonw commented 2 years ago

I could add a --transform option to sqlite-utils create-table too.

simonw commented 2 years ago

Documentation: