simonw / sqlite-utils

Python CLI utility and library for manipulating SQLite databases
https://sqlite-utils.datasette.io
Apache License 2.0
1.68k stars 112 forks source link

The ".upsert()" method is misnamed #66

Closed simonw closed 4 years ago

simonw commented 5 years ago

This thread here is illuminating: https://stackoverflow.com/questions/3634984/insert-if-not-exists-else-update

The term UPSERT in SQLite has a specific meaning as-of 3.24.0 (2018-06-04): https://www.sqlite.org/lang_UPSERT.html

It means "behave as an UPDATE or a no-op if the INSERT would violate a uniqueness constraint". The syntax in 3.24.0+ looks like this (confusingly it does not use the term "upsert"):

INSERT INTO phonebook(name,phonenumber) VALUES('Alice','704-555-1212')
  ON CONFLICT(name) DO UPDATE SET phonenumber=excluded.phonenumber

Here's the problem: the sqlite-utils .upsert() and .upsert_all() methods don't do this. They use the following SQL:

INSERT OR REPLACE INTO [{table}] ({columns}) VALUES {rows};

If the record already exists, it will be entirely replaced by a new record - as opposed to updating any specified fields but leaving existing fields as they are (the behaviour of "upsert" in SQLite itself).

simonw commented 5 years ago

This relates to this bug: https://github.com/dogsheep/github-to-sqlite/pull/8#issuecomment-549233778

simonw commented 5 years ago

Fixing this is going to be a real pain. There's lots of code out there that uses sqlite-utils with the expectation that upsert() behaves as it currently does.

Maybe I need to introduce new terms for both of these different patterns and deprecate the existing .upsert() and .upsert_all() since their behaviour can't be changed?

Or maybe I fix this and ship sqlite-utils 2.0 with a breaking change?

simonw commented 5 years ago

If I do implement the correct definition of .upsert() I think I'll use this pattern, since it works in versions of SQLite prior to 3.24:

INSERT OR IGNORE INTO book(id) VALUES(1001);
UPDATE book SET name = 'Programming' WHERE id = 1001;
simonw commented 5 years ago

This warrants making a backwards compatible change, which means I'll need to bump the major version number and release 2.0.

I'm going to rename the existing upsert() and upsert_all() methods to replace() and replace_all() - then write new upsert() and upsert_all() methods that implement the correct behavior.

simonw commented 5 years ago

Is replace() a good name here? It doesn't really convey the idea that a brand new record will be created if there isn't an existing one to replace.

simonw commented 5 years ago

Maybe inplace() (combining "insert" and "replace")?

It could be an alias for .insert(..., replace=True)

simonw commented 5 years ago

This is going to affect the design of the CLI subcommands as well.

simonw commented 5 years ago

Maybe instead of inventing a new term I should tell people to use .insert(..., replace=True) directly. That matches ignore=True.

simonw commented 5 years ago

First step: add a replace=True argument to insert() and insert_all() that does the same thing as the current upsert=True

https://github.com/simonw/sqlite-utils/blob/8dab9fd1ccf571e188eec9ccf606a0c50fccf200/sqlite_utils/db.py#L938-L946

simonw commented 5 years ago

Urgh this is going to be quite a bit of work, especially in the CLI module which shares an implementation for upsert and insert in a way that looks like it will have to be unwrapped.

simonw commented 5 years ago

Thinking about this further: I believe every time I've personally used upsert in the past (either with the Python library or the CLI tool) I've actually wanted the new behaviour, where "upsert" means "update existing record with these changes, or insert a new record if one does not exist".

So I'm happy with upsert doing that, and insert --replace being added as an option that does what upsert does ta the moment.

I'll still ship it as version 2.0 since it's technically a breaking change.

simonw commented 4 years ago

Don't forget to update the documentation. This will be quite an involved task.

simonw commented 4 years ago

I'm going to start by ignoring the existing upsert entirely and implementing .insert(..., replace=True) and $ sqlite-utils insert --replace. Including updating the tests.

Then I'll figure out how to implement the new .upsert() / $ sqlite-utils upsert.

Then I'll update the documentation, and ship sqlite-utils 2.0.

simonw commented 4 years ago

Last step: update changelog and ship 2.0. Then I can close this issue.

simonw commented 4 years ago

I shipped 2.0 - release notes here: https://sqlite-utils.readthedocs.io/en/stable/changelog.html#v2

I also wrote about it on my blog: https://simonwillison.net/2019/Dec/30/sqlite-utils-2/