Open simonw opened 3 months ago
This happens because of check_connection()
that we call on every database connection, which calls PRAGMA table_info()
on every table. Which means it fails on virtual tables that don't have their parent extension loaded.
We could ignore these class of errors with:
elif e.args[0].startswith("no such module: "):
pass
But other parts of the codebase will still call PRAGMA table_info()
at other times after connection-time, like table_column_details()
.
We could audit all cases of this and just return None
/[]
when appropriate.
I'm going to create a minimal sqlite-vec
database to commit to our repo for the tests.
Creating it with sqlite-utils
:
sqlite-utils install sqlite-utils-sqlite-vec
echo '{"point": "[1,2,3]"}' | sqlite-utils insert vec.db vectors -
sqlite-utils vec.db 'create virtual table vec_vectors using vec0(point float[3]);'
sqlite-utils vec.db 'insert into vec_vectors(rowid, point) select rowid, point from vectors'
That's a 48KB file:
ls -lah vec.db
-rw-r--r-- 1 simon wheel 48K Aug 15 09:26 vec.db
Datasette cannot currently load it (without its own plugin or the --load-extension
option):
datasette vec.db
Usage: datasette serve [OPTIONS] [FILES]...
Try 'datasette serve --help' for help.
Error: Connection to vec.db failed check: no such module: vec0
This works:
datasette vec.db --load-extension "$(python -c 'import sqlite_vec; print(sqlite_vec.loadable_path())')"
This conflicts slightly with Datasette's existing design with respect to SpatiaLite: currently Datasette shows you a useful error if you try to load a SpatiaLite database without activating the extension:
datasette tests/spatialite.db
Usage: datasette serve [OPTIONS] [FILES]...
Try 'datasette serve --help' for help.
Error: It looks like you're trying to load a SpatiaLite database without first loading the SpatiaLite module.
Try adding the --load-extension=spatialite option.
Read more: https://docs.datasette.io/en/stable/spatialite.html
One solution: turn this into a warning that gets dumped to the console but Datasette still starts up as usual.
Could even expand that a bit, so it knows a few other extensions (like sqlite-vec
) and can output versions of that helpful message for those as well.
Design options:
PRAGMA table_info()
are excluded by Datasette entirely - it's as if they do not existstderr
The second option is nicer, but requires more design decisions: where should that message go? How should we explain it to users?
So I think option 3 is the way to go for the moment.
OK, this is actually a lot of work to implement. I tried this so far:
diff --git a/datasette/utils/__init__.py b/datasette/utils/__init__.py
index 073d6e86..5469252b 100644
--- a/datasette/utils/__init__.py
+++ b/datasette/utils/__init__.py
@@ -19,6 +19,7 @@ import time
import types
import secrets
import shutil
+import sys
from typing import Iterable, List, Tuple
import urllib
import yaml
@@ -978,7 +979,8 @@ def check_connection(conn):
if e.args[0] == "no such module: VirtualSpatialIndex":
raise SpatialiteConnectionProblem(e)
else:
- raise ConnectionProblem(e)
+ print(str(e), file=sys.stderr)
+ # raise ConnectionProblem(e)
class BadMetadataError(Exception):
diff --git a/datasette/utils/internal_db.py b/datasette/utils/internal_db.py
index 626dd137..f5ae68b2 100644
--- a/datasette/utils/internal_db.py
+++ b/datasette/utils/internal_db.py
@@ -1,5 +1,6 @@
+import sys
import textwrap
-from datasette.utils import table_column_details
+from datasette.utils import table_column_details, sqlite3
async def init_internal_db(db):
@@ -137,7 +138,11 @@ async def populate_schema_tables(internal_db, db):
tables_to_insert.append(
(database_name, table_name, table["rootpage"], table["sql"])
)
- columns = table_column_details(conn, table_name)
+ try:
+ columns = table_column_details(conn, table_name)
+ except sqlite3.OperationalError as ex:
+ print(f"Error accessing table {table_name}: {str(ex)}", file=sys.stderr)
+ continue
columns_to_insert.extend(
{
**{"database_name": database_name, "table_name": table_name},
And it's revealing that there are MANY places in Datasette that might attempt to read the columns from one of these tables and run into an error.
For example, running this above patch with datasette tests/vec.db --pdb
I got this traceback on trying to load the homepage:
INFO: Uvicorn running on http://127.0.0.1:8002 (Press CTRL+C to quit)
Error accessing table vec_vectors: no such module: vec0
ERROR: conn=<sqlite3.Connection object at 0x110467140>, sql = 'select count(*) from [vec_vectors]', params = None: no such module: vec0
> /Users/simon/Dropbox/Development/datasette/datasette/utils/__init__.py(640)table_column_details()
-> for r in conn.execute(
(Pdb) c
Traceback (most recent call last):
File "/Users/simon/Dropbox/Development/datasette/datasette/app.py", line 1745, in route_path
response = await view(request, send)
File "/Users/simon/Dropbox/Development/datasette/datasette/views/base.py", line 184, in view
return await self.dispatch_request(request)
File "/Users/simon/Dropbox/Development/datasette/datasette/views/base.py", line 138, in dispatch_request
response = await handler(request)
File "/Users/simon/Dropbox/Development/datasette/datasette/views/index.py", line 68, in get
table_columns = await db.table_columns(table)
File "/Users/simon/Dropbox/Development/datasette/datasette/database.py", line 424, in table_columns
return await self.execute_fn(lambda conn: table_columns(conn, table))
File "/Users/simon/Dropbox/Development/datasette/datasette/database.py", line 283, in execute_fn
return await asyncio.get_event_loop().run_in_executor(
File "/Users/simon/.pyenv/versions/3.10.4/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/Users/simon/Dropbox/Development/datasette/datasette/database.py", line 281, in in_thread
return fn(conn)
File "/Users/simon/Dropbox/Development/datasette/datasette/database.py", line 424, in <lambda>
return await self.execute_fn(lambda conn: table_columns(conn, table))
File "/Users/simon/Dropbox/Development/datasette/datasette/utils/__init__.py", line 632, in table_columns
return [column.name for column in table_column_details(conn, table)]
File "/Users/simon/Dropbox/Development/datasette/datasette/utils/__init__.py", line 640, in table_column_details
for r in conn.execute(
sqlite3.OperationalError: no such module: vec0
I don't think I can fix this for the 1.0a15 alpha:
This would be a bit easier if I moved all code in Datasette that loops through a list of tables to consult the internal database catalog tables for those, then I could include these unreadable tables in that but have a "unreadable = 1" column which could be used to filter them out.
E.g. after https://til.simonwillison.net/sqlite/sqlite-vec my https://til.simonwillison.net/tils.db needs the
vec0
extension if you are going to browse thevec_tils
table, but the rest of the UI should work by skipping that table when scanning tables.