Closed simonw closed 4 years ago
This relates to this bug: https://github.com/dogsheep/github-to-sqlite/pull/8#issuecomment-549233778
Fixing this is going to be a real pain. There's lots of code out there that uses sqlite-utils
with the expectation that upsert()
behaves as it currently does.
Maybe I need to introduce new terms for both of these different patterns and deprecate the existing .upsert()
and .upsert_all()
since their behaviour can't be changed?
Or maybe I fix this and ship sqlite-utils 2.0
with a breaking change?
If I do implement the correct definition of .upsert()
I think I'll use this pattern, since it works in versions of SQLite prior to 3.24:
INSERT OR IGNORE INTO book(id) VALUES(1001);
UPDATE book SET name = 'Programming' WHERE id = 1001;
This warrants making a backwards compatible change, which means I'll need to bump the major version number and release 2.0.
I'm going to rename the existing upsert()
and upsert_all()
methods to replace()
and replace_all()
- then write new upsert()
and upsert_all()
methods that implement the correct behavior.
Is replace()
a good name here? It doesn't really convey the idea that a brand new record will be created if there isn't an existing one to replace.
Maybe inplace()
(combining "insert" and "replace")?
It could be an alias for .insert(..., replace=True)
This is going to affect the design of the CLI subcommands as well.
Maybe instead of inventing a new term I should tell people to use .insert(..., replace=True)
directly. That matches ignore=True
.
First step: add a replace=True
argument to insert()
and insert_all()
that does the same thing as the current upsert=True
Urgh this is going to be quite a bit of work, especially in the CLI module which shares an implementation for upsert
and insert
in a way that looks like it will have to be unwrapped.
Thinking about this further: I believe every time I've personally used upsert
in the past (either with the Python library or the CLI tool) I've actually wanted the new behaviour, where "upsert" means "update existing record with these changes, or insert a new record if one does not exist".
So I'm happy with upsert
doing that, and insert --replace
being added as an option that does what upsert
does ta the moment.
I'll still ship it as version 2.0 since it's technically a breaking change.
Don't forget to update the documentation. This will be quite an involved task.
I'm going to start by ignoring the existing upsert
entirely and implementing .insert(..., replace=True)
and $ sqlite-utils insert --replace
. Including updating the tests.
Then I'll figure out how to implement the new .upsert()
/ $ sqlite-utils upsert
.
Then I'll update the documentation, and ship sqlite-utils
2.0.
Last step: update changelog and ship 2.0. Then I can close this issue.
I shipped 2.0 - release notes here: https://sqlite-utils.readthedocs.io/en/stable/changelog.html#v2
I also wrote about it on my blog: https://simonwillison.net/2019/Dec/30/sqlite-utils-2/
This thread here is illuminating: https://stackoverflow.com/questions/3634984/insert-if-not-exists-else-update
The term
UPSERT
in SQLite has a specific meaning as-of 3.24.0 (2018-06-04): https://www.sqlite.org/lang_UPSERT.htmlIt means "behave as an UPDATE or a no-op if the INSERT would violate a uniqueness constraint". The syntax in 3.24.0+ looks like this (confusingly it does not use the term "upsert"):
Here's the problem: the
sqlite-utils
.upsert()
and.upsert_all()
methods don't do this. They use the following SQL:If the record already exists, it will be entirely replaced by a new record - as opposed to updating any specified fields but leaving existing fields as they are (the behaviour of "upsert" in SQLite itself).