simonw opened 1 year ago
GPT-4 suggested:

```sql
CREATE TRIGGER people_update_history
AFTER UPDATE ON people
FOR EACH ROW
BEGIN
    -- Check if a record for old.rowid exists in the _history table
    IF NOT EXISTS (SELECT 1 FROM _people_history WHERE _rowid = old.rowid) THEN
        -- If it does not exist, add a record to _people_history with the old values
        INSERT INTO _people_history (_rowid, id, name, age, weight, _version, _updated, _mask)
        VALUES (old.rowid, old.id, old.name, old.age, old.weight, 0,
            cast((julianday('now') - 2440587.5) * 86400 * 1000 as integer), 15);
    END IF;
    -- Proceed with the update operation
    INSERT INTO _people_history (_rowid, id, name, age, weight, _version, _updated, _mask)
    SELECT old.rowid,
        CASE WHEN old.id != new.id THEN new.id ELSE null END,
        CASE WHEN old.name != new.name THEN new.name ELSE null END,
        CASE WHEN old.age != new.age THEN new.age ELSE null END,
        CASE WHEN old.weight != new.weight THEN new.weight ELSE null END,
        (SELECT MAX(_version) FROM _people_history WHERE _rowid = old.rowid) + 1,
        cast((julianday('now') - 2440587.5) * 86400 * 1000 as integer),
        (CASE WHEN old.id != new.id THEN 1 ELSE 0 END)
            + (CASE WHEN old.name != new.name THEN 2 ELSE 0 END)
            + (CASE WHEN old.age != new.age THEN 4 ELSE 0 END)
            + (CASE WHEN old.weight != new.weight THEN 8 ELSE 0 END)
    WHERE old.id != new.id OR old.name != new.name OR old.age != new.age OR old.weight != new.weight;
END;
```
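One catch: SQLite's trigger language has no `IF ... END IF` statement, so the block above won't parse as-is. The same conditional backfill can be expressed as `INSERT ... SELECT ... WHERE NOT EXISTS`. A runnable sketch using Python's `sqlite3`, with table definitions inferred from the column lists above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER, name TEXT, age INTEGER, weight REAL);
CREATE TABLE _people_history (
    _rowid INTEGER, id INTEGER, name TEXT, age INTEGER, weight REAL,
    _version INTEGER, _updated INTEGER, _mask INTEGER
);
CREATE TRIGGER people_update_history
AFTER UPDATE ON people
FOR EACH ROW
BEGIN
    -- Backfill the old values as version 0 the first time a row is edited;
    -- the IF ... END IF block becomes INSERT ... SELECT ... WHERE NOT EXISTS
    INSERT INTO _people_history (_rowid, id, name, age, weight, _version, _updated, _mask)
    SELECT old.rowid, old.id, old.name, old.age, old.weight, 0,
        cast((julianday('now') - 2440587.5) * 86400 * 1000 as integer), 15
    WHERE NOT EXISTS (SELECT 1 FROM _people_history WHERE _rowid = old.rowid);
    -- Then record only the changed columns, as in the original suggestion
    INSERT INTO _people_history (_rowid, id, name, age, weight, _version, _updated, _mask)
    SELECT old.rowid,
        CASE WHEN old.id != new.id THEN new.id END,
        CASE WHEN old.name != new.name THEN new.name END,
        CASE WHEN old.age != new.age THEN new.age END,
        CASE WHEN old.weight != new.weight THEN new.weight END,
        (SELECT MAX(_version) FROM _people_history WHERE _rowid = old.rowid) + 1,
        cast((julianday('now') - 2440587.5) * 86400 * 1000 as integer),
        (CASE WHEN old.id != new.id THEN 1 ELSE 0 END)
            + (CASE WHEN old.name != new.name THEN 2 ELSE 0 END)
            + (CASE WHEN old.age != new.age THEN 4 ELSE 0 END)
            + (CASE WHEN old.weight != new.weight THEN 8 ELSE 0 END)
    WHERE old.id != new.id OR old.name != new.name
        OR old.age != new.age OR old.weight != new.weight;
END;
""")

conn.execute("INSERT INTO people (id, name, age, weight) VALUES (1, 'Cleo', 5, 24.5)")
conn.execute("UPDATE people SET age = 6 WHERE id = 1")
rows = conn.execute(
    "SELECT _version, age, _mask FROM _people_history ORDER BY _version"
).fetchall()
print(rows)  # [(0, 5, 15), (1, 6, 4)]
```

The backfill statement runs before the diff insert within the trigger body, so `MAX(_version)` sees the version 0 row and the first edit lands as version 1.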
This does make backup sync to another database slightly harder - you would also need to consider rows in the original table that don't yet have rows in the history table. Maybe people who want to do sync should avoid this option and go with full row copies on insert instead. You also lose the record showing exactly when each row was added.
Another option: store nulls in that first row if it's been added and not yet edited, with a `_mask` of 0 and a `_version` of 0 to indicate no edits yet. Insert two rows on the first edit to capture the original values and keep the history append-only.
That could be:

```sql
CREATE TRIGGER people_update_history
AFTER UPDATE ON people
FOR EACH ROW
BEGIN
    -- If there is only one matching row and it is version 0, insert a full
    -- copy of the old values as version 1 (SQLite triggers have no IF
    -- statement, so the condition is expressed as a WHERE clause)
    INSERT INTO _people_history (_rowid, id, name, age, weight, _version, _updated, _mask)
    SELECT old.rowid, old.id, old.name, old.age, old.weight, 1,
        cast((julianday('now') - 2440587.5) * 86400 * 1000 as integer), 15
    WHERE (SELECT COUNT(*) FROM _people_history WHERE _rowid = old.rowid) = 1
      AND (SELECT MAX(_version) FROM _people_history WHERE _rowid = old.rowid) = 0;
    -- Insert another row showing differences between new and old
    INSERT INTO _people_history (_rowid, id, name, age, weight, _version, _updated, _mask)
    SELECT old.rowid,
        CASE WHEN old.id != new.id THEN new.id ELSE null END,
        CASE WHEN old.name != new.name THEN new.name ELSE null END,
        CASE WHEN old.age != new.age THEN new.age ELSE null END,
        CASE WHEN old.weight != new.weight THEN new.weight ELSE null END,
        (SELECT MAX(_version) FROM _people_history WHERE _rowid = old.rowid) + 1,
        cast((julianday('now') - 2440587.5) * 86400 * 1000 as integer),
        (CASE WHEN old.id != new.id THEN 1 ELSE 0 END)
            + (CASE WHEN old.name != new.name THEN 2 ELSE 0 END)
            + (CASE WHEN old.age != new.age THEN 4 ELSE 0 END)
            + (CASE WHEN old.weight != new.weight THEN 8 ELSE 0 END)
    WHERE old.id != new.id OR old.name != new.name OR old.age != new.age OR old.weight != new.weight;
END;
```
That version 1 row should duplicate the `_updated` date from the version 0 row though. Plus the updated insert trigger that sets all columns to null and sets `_version` and `_mask` to 0.
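A sketch of that insert trigger, assuming the same `_people_history` layout as above (the trigger name is made up here). Every data column is null until the row is first edited, and the version 0 row still records when the row was created via `_updated`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER, name TEXT, age INTEGER, weight REAL);
CREATE TABLE _people_history (
    _rowid INTEGER, id INTEGER, name TEXT, age INTEGER, weight REAL,
    _version INTEGER, _updated INTEGER, _mask INTEGER
);
-- Version 0 is a placeholder: all columns null, _version and _mask both 0
CREATE TRIGGER people_insert_history
AFTER INSERT ON people
FOR EACH ROW
BEGIN
    INSERT INTO _people_history (_rowid, id, name, age, weight, _version, _updated, _mask)
    VALUES (new.rowid, null, null, null, null, 0,
        cast((julianday('now') - 2440587.5) * 86400 * 1000 as integer), 0);
END;
""")

conn.execute("INSERT INTO people (id, name, age, weight) VALUES (1, 'Cleo', 5, 24.5)")
row = conn.execute(
    "SELECT id, name, age, weight, _version, _mask FROM _people_history"
).fetchone()
print(row)  # (None, None, None, None, 0, 0)
```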
Maybe the index should be on `(_rowid, _version)` since the triggers run several queries that would benefit from that. Is `EXPLAIN` smart enough to take triggers into account?
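I'm not sure `EXPLAIN QUERY PLAN` on the outer `UPDATE` reports the trigger's internal subqueries, but it can be pointed at those queries directly. A sketch checking that the `MAX(_version)` lookup would use a `(_rowid, _version)` index (the index name here is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE _people_history (
    _rowid INTEGER, id INTEGER, name TEXT, age INTEGER, weight REAL,
    _version INTEGER, _updated INTEGER, _mask INTEGER
);
CREATE INDEX _people_history_rowid_version ON _people_history (_rowid, _version);
""")

# Ask SQLite how it would run the per-row lookup the triggers perform
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT MAX(_version) FROM _people_history WHERE _rowid = ?",
    (1,),
).fetchall()
detail = " ".join(row[-1] for row in plan)
print(detail)  # should mention the covering index on (_rowid, _version)
```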
An alternative implementation, where the created date is important, is to add a “last updated date” to the main table, potentially filled by a trigger; then on the first `UPDATE`, that would be dragged into the history table. Or to directly add the “created” date to the main table, along with last updated. I’m not certain either of these is helpful to your use case, but most of the systems I’ve worked with that have traceable changes end up with a created and last-modified date and user in the main table, and then a history table where that level of detail is needed.
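A sketch of the created/last-updated approach (column and trigger names invented here), using the same millisecond julianday timestamps as the triggers above. Note this relies on SQLite's default of recursive triggers being off, so the trigger's own `UPDATE` doesn't re-fire it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (
    id INTEGER PRIMARY KEY,
    name TEXT,
    -- millisecond timestamp, set once when the row is created
    created INTEGER DEFAULT (cast((julianday('now') - 2440587.5) * 86400 * 1000 as integer)),
    last_updated INTEGER
);
CREATE TRIGGER people_touch
AFTER UPDATE ON people
FOR EACH ROW
BEGIN
    UPDATE people
    SET last_updated = cast((julianday('now') - 2440587.5) * 86400 * 1000 as integer)
    WHERE rowid = new.rowid;
END;
""")

conn.execute("INSERT INTO people (name) VALUES ('Cleo')")
conn.execute("UPDATE people SET name = 'Cleopatra' WHERE id = 1")
created, updated = conn.execute("SELECT created, last_updated FROM people").fetchone()
print(created, updated)  # created is preserved; last_updated is filled by the trigger
```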
I’m a little unsure, but I think you’re not handling the idea of a NULLABLE column correctly if you overload the semantics of `NULL`. Using your example people table, consider the following (I’m on an iPad, so this is untested code):
```sql
INSERT INTO people (name, age, weight)
VALUES ('Robert Bruce Banner', 35, NULL); -- don't know his weight at first
-- version 0 is weight = NULL

UPDATE people
SET weight = 58.05 -- finally got him weighed
WHERE name = 'Robert Bruce Banner';
-- version 1 is weight = NULL

UPDATE people
SET weight = NULL -- he blew up into a hulk, need to find bigger scales
WHERE name = 'Robert Bruce Banner';
-- version 2 is weight = 58.05

UPDATE people
SET weight = 635.02 -- found a weighbridge
WHERE name = 'Robert Bruce Banner';
-- version 3 is weight = NULL
```
So, the question is: how do you ensure that when retrieving version 3, you get the value `NULL` and not `635.02`? And version 0? Is your `_mask` value enough to disambiguate in all cases?
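This looks like a genuine hazard: in SQLite, `!=` evaluates to `NULL` (treated as false) whenever either operand is `NULL`, so the `CASE WHEN old.weight != new.weight` tests in the triggers above would silently skip any transition to or from `NULL`. SQLite's `IS NOT` operator compares `NULL`s reliably. A quick check with `sqlite3`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# != with a NULL operand yields NULL ("unknown"), which WHERE and CASE treat as false
assert conn.execute("SELECT ? != ?", (58.05, None)).fetchone()[0] is None

# IS NOT handles NULLs: 1 when the values differ, 0 when they match
assert conn.execute("SELECT ? IS NOT ?", (58.05, None)).fetchone()[0] == 1
assert conn.execute("SELECT ? IS NOT ?", (None, None)).fetchone()[0] == 0
assert conn.execute("SELECT ? IS NOT ?", (58.05, 58.05)).fetchone()[0] == 0
```

With `IS NOT` in place of `!=`, the `_mask` does appear to disambiguate: bit 8 set alongside a `NULL` weight means “changed to NULL”, while bit 8 clear means “unchanged”.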
The system currently populates the history table with a duplicate record when a row is first inserted, doubling the disk space used. Instead, try only inserting that first track-changes row the first time a row is updated. This should save a lot of space.

UPDATE: Or have that first row use `null` for all of the values, at least until the row is first edited.