simonw / sqlite-utils

Python CLI utility and library for manipulating SQLite databases
https://sqlite-utils.datasette.io
Apache License 2.0
1.63k stars 111 forks source link

Support `rows_where()`, `delete_where()` etc for attached alias databases #432

Open luxint opened 2 years ago

luxint commented 2 years ago

Hi,

I noticed rows_where() doesn't return any rows from tables which are from attached databases. The exists() function returns false. As far as I can see this is because the table_names() function only looks for table names in the current database and not in attached (or temp) databases.

Besides, rows_where(), also insert_all() and delete_where() didn't do what I was expecting because of this. For the moment I've patched table_names() for myself, see below but I'm not sure what the total impact is on the other functions like lookup truncate etc which all use exists(). Also view_names() doesn't look for views in attached or temp databases.

 def table_names(self, fts4: bool = False, fts5: bool = False) -> List[str]:
        "A list of string table names in this database."
        where = ["type = 'table'"]
        if fts4:
            where.append("sql like '%USING FTS4%'")
        if fts5:
            where.append("sql like '%USING FTS5%'")
        dbs = [x[1] for x in self.execute('pragma database_list').fetchall()]    
        lst=[]
        for db in dbs:    
            sql = "select name from {} where {}".format(db+".sqlite_master"," AND ".join(where))
            lst.extend(r[0] for r in self.execute(sql).fetchall())
        return lst
simonw commented 2 years ago

I don't like the idea of table_names() returning names of tables from connected databases as well, because it feels like it could lead to surprising behaviour - especially if those connected databases turn to have table names that are duplicated in the main connected database.

It would be neat if functions like .rows_where() worked though.

One thought would be to support something like this:

rows = db["otherdb.tablename"].rows_where()

But... . is a valid character in a SQLite table name. So "otherdb.tablename" might ambiguously refer to a table called tablename in a connected database with the alias otherdb, OR a table in the current database with the name otherdb.tablename.

simonw commented 2 years ago

Another potential fix: add a alias= parameter to rows_where() and other similar methods. Then you could do this:

rows = db["tablename"].rows_where(alias="otherdb")

This feels wrong to me: db["tablename"] is the bit that is supposed to return a table object. Having part of what that table object is exist as a parameter to other methods is confusing.

simonw commented 2 years ago

Third option, and I think the one I like the best:

rows = db.table("tablename", alias="otherdb").rows_where(alias="otherdb")

The db.table(tablename) method already exists as an alternative to db[tablename]: https://sqlite-utils.datasette.io/en/stable/python-api.html#python-api-table-configuration

simonw commented 2 years ago

Implementing this would be a pretty big change - initial instinct is that I'd need to introduce a self.alias property to Queryable (the subclass of Table and View) and a new self.name_with_alias getter which returns alias.tablename if alias is set to a not-None value. Then I'd need to rewrite every piece of code like this:

https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/db.py#L1161

To look like this instead:

        sql = "select {} from [{}]".format(select, self.name_with_alias)

But some parts would be harder - for example:

https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/db.py#L1227-L1231

Would have to know to query alias.sqlite_master instead.

The cached table counts logic like this would need a bunch of changes too:

https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/db.py#L644-L657

simonw commented 2 years ago

Initial idea of how the .table() method would change:

diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py
index 7a06304..3ecb40b 100644
--- a/sqlite_utils/db.py
+++ b/sqlite_utils/db.py
@@ -474,11 +474,12 @@ class Database:
             self._tracer(sql, None)
         return self.conn.executescript(sql)

-    def table(self, table_name: str, **kwargs) -> Union["Table", "View"]:
+    def table(self, table_name: str, alias: Optional[str] = None, **kwargs) -> Union["Table", "View"]:
         """
         Return a table object, optionally configured with default options.

         :param table_name: Name of the table
+        :param alias: The database alias to use, if referring to a table in another connected database
         """
         klass = View if table_name in self.view_names() else Table
         return klass(self, table_name, **kwargs)