simonw / sqlite-utils

Python CLI utility and library for manipulating SQLite databases
https://sqlite-utils.datasette.io
Apache License 2.0
1.62k stars 109 forks source link

Aliased ROWID option for tables created from alter=True commands #557

Closed chapmanjacobd closed 1 year ago

chapmanjacobd commented 1 year ago

If you use INTEGER PRIMARY KEY column, the VACUUM does not change the values of that column. However, if you use unaliased rowid, the VACUUM command will reset the rowid values.

ROWID should never be used with foreign keys but the simple act of aliasing rowid to id (which is what happens when one does id integer primary key DDL) makes it OK.

It would be convenient if there were more options to use a string column (eg. filepath) as the PK, and be able to use it during upserts, but when creating a foreign key, to create an integer column which aliases rowid

I made an attempt to switch to integer primary keys here but it is not going well... In my usecase the path column is a business key. Yes, it should be as simple as including the id column in any select statement where I plan on using upsert but it would be nice if this could be abstracted away somehow https://github.com/chapmanjacobd/library/commit/788cd125be01d76f0fe2153335d9f6b21db1343c

https://github.com/chapmanjacobd/library/actions/runs/5173602136/jobs/9319024777

chapmanjacobd commented 1 year ago

this isn't really actionable... I'm just being a whiny baby. I have tasted the milk of being able to use upsert_all, insert_all, etc without having to write DDL to create tables. The meat of the issue is that SQLITE doesn't make rowid stable between vacuums so it is not possible to take shortcuts

chapmanjacobd commented 1 year ago

I put together a simple script to upsert and remove duplicate rows based on business keys. If anyone has similar problems with above this might help

CREATE TABLE my_table (
    id INTEGER PRIMARY KEY,
    column1 TEXT,
    column2 TEXT,
    column3 TEXT
);

INSERT INTO my_table (column1, column2, column3)
VALUES
    ('Value 1', 'Duplicate 1', 'Duplicate A'),
    ('Value 2', 'Duplicate 2', 'Duplicate B'),
    ('Value 3', 'Duplicate 2', 'Duplicate C'),
    ('Value 4', 'Duplicate 3', 'Duplicate D'),
    ('Value 5', 'Duplicate 3', 'Duplicate E'),
    ('Value 6', 'Duplicate 3', 'Duplicate F');
library dedupe-db test.db my_table --bk column2