Closed simonw closed 10 months ago
That's odd, I wrote a test for this just now and it passes already:
def test_transform_preserves_rowids(fresh_db):
# Create a rowid table
fresh_db["places"].insert_all(
(
{"name": "Paris", "country": "France"},
{"name": "London", "country": "UK"},
{"name": "New York", "country": "USA"},
),
)
assert fresh_db["places"].use_rowid
previous_rows = list(
tuple(row) for row in fresh_db.execute("select rowid, name from places")
)
# Transform it
fresh_db["places"].transform(column_order=("country", "name"))
# Should be the same
next_rows = list(
tuple(row) for row in fresh_db.execute("select rowid, name from places")
)
assert previous_rows == next_rows
So maybe I'm wrong about the cause of that bug?
I tried bumping that up to 10,000 rows instead of just 3 but the test still passed.
I just noticed that the table where I encountered this bug wasn't actually a rowid
table after all - it had an id
column that was a text primary key.
The reason the rowid
was important is that's how the FTS mechanism in Datasette relates FTS entries to their rows.
But I tried this test and it passed, too:
def test_transform_preserves_rowids(fresh_db):
fresh_db["places"].insert_all(
[
{"id": "1", "name": "Paris", "country": "France"},
{"id": "2", "name": "London", "country": "UK"},
{"id": "3", "name": "New York", "country": "USA"},
],
pk="id",
)
previous_rows = list(
tuple(row) for row in fresh_db.execute("select rowid, id, name from places")
)
# Transform it
fresh_db["places"].transform(column_order=("country", "name"))
# Should be the same
next_rows = list(
tuple(row) for row in fresh_db.execute("select rowid, id, name from places")
)
assert previous_rows == next_rows
Oh! Maybe the row ID preservation here is a coincidence because the tables are created from scratch and count 1, 2, 3.
If I delete a row from the table and then insert some more - breaking the rowid
sequence - it might show the bug.
Yes! That recreated the bug:
> assert previous_rows == next_rows
E AssertionError: assert equals failed
E [ [
E (1, '1', 'Paris'), (1, '1', 'Paris'),
E (3, '3', 'New York'), (2, '3', 'New York'),
E (4, '4', 'London'), (3, '4', 'London'),
E ] ...
E
In working on this I learned that rowid
values in SQLite are way less stable than I had thought - in particular, they are often entirely rewritten on a VACUUM
:
https://www.sqlite.org/lang_vacuum.html#how_vacuum_works
The VACUUM command may change the ROWIDs of entries in any tables that do not have an explicit INTEGER PRIMARY KEY.
So this fix wasn't as valuable as I thought. I need to move away from ever assuming that a rowid
is a useful foreign key for anything.
I just spotted a bug when using https://datasette.io/plugins/datasette-configure-fts and https://datasette.io/plugins/datasette-edit-schema at the same time.
Steps to reproduce:
I got the wrong search results, which I think is because the
_fts
table pointed to the first table byrowid
but thoserowid
values were entirely rewritten as a consequence of runningtable.transform()
on the table.Reconfiguring FTS on the table fixed the problem.
I think
table.transform()
should be able to preserverowid
values.