simonw / sqlite-utils

Python CLI utility and library for manipulating SQLite databases
https://sqlite-utils.datasette.io
Apache License 2.0
1.62k stars 109 forks source link

`table.upsert_all` fails to write rows when `not_null` is present #538

Closed xavdid closed 1 year ago

xavdid commented 1 year ago

I found an odd bug today, where calls to table.upsert_all don't write rows if you include the not_null kwarg.

Repro Example

from sqlite_utils import Database

db = Database("upsert-test.db")

db["comments"].upsert_all(
    [{"id": 1, "name": "david"}],
    pk="id",
    not_null=["name"],
)

assert list(db["comments"].rows) # err!

The schema is correctly created:

CREATE TABLE [comments] (
   [id] INTEGER PRIMARY KEY,
   [name] TEXT NOT NULL
)

But no rows are created. Removing either the not_null kwargs works as expected, as does an insert_all call.

Version Info

simonw commented 1 year ago

Confirmed - I added this test and it fails:

def test_upsert_all_not_null(fresh_db):
    # https://github.com/simonw/sqlite-utils/issues/538
    fresh_db["comments"].upsert_all(
        [{"id": 1, "name": "Cleo"}],
        pk="id",
        not_null=["name"],
    )
    assert list(fresh_db["comments"].rows) == [{"id": 1, "name": "Cleo"}]
simonw commented 1 year ago

From time in the debugger, after creating the table it ends up doing this:

(Pdb) queries_and_params
[
  ('INSERT OR IGNORE INTO [comments]([id]) VALUES(?);', [1]),
  ('UPDATE [comments] SET [name] = ? WHERE [id] = ?', ['Cleo', 1])
]
simonw commented 1 year ago

Here's the problem:

import sqlite3
db = sqlite3.connect(":memory:")
db.execute('create table foo (id integer primary key, name not null)')
db.execute('insert into foo (id) values (1)')

Produces:

IntegrityError: NOT NULL constraint failed: foo.name

But this:

db.execute('insert or ignore into foo (id) values (1)')

Completes without an exception.

simonw commented 1 year ago

Here's the code at fault: https://github.com/simonw/sqlite-utils/blob/80763edaa2bdaf1113717378b8d62075c4dcbcfb/sqlite_utils/db.py#L2774-L2788

simonw commented 1 year ago

This feels like a fundamental flaw in the way upserts are implemented by sqlite-utils.

One fix would be to switch to using the UPSERT feature in SQLite: https://www.sqlite.org/lang_UPSERT.html

But...

UPSERT syntax was added to SQLite with version 3.24.0 (2018-06-04).

I still want to support SQLite versions earlier than that.

simonw commented 1 year ago

I could detect if this happens using cursor.rowcount - not sure how I would recover from it though.

This would also require some major re-engineering, since currently it all works by generating a list of SQL queries in advance and applying them inside a loop in .insert_chunk():

https://github.com/simonw/sqlite-utils/blob/80763edaa2bdaf1113717378b8d62075c4dcbcfb/sqlite_utils/db.py#L2839-L2878

simonw commented 1 year ago

How about if I had logic which checked that all not-null columns were provided in the call to upsert_all() - and if they were, modified the INSERT OR IGNORE INTO to include a placeholder value for those columns that would then be fixed by the later UPDATE?

Something like this:

[
  ('INSERT OR IGNORE INTO [comments]([id], name) VALUES(?, ?);', [1, '']),
  ('UPDATE [comments] SET [name] = ? WHERE [id] = ?', ['Cleo', 1])
]
simonw commented 1 year ago

That fix seems to work!

xavdid commented 1 year ago

perfect, thank you!

hbmartin commented 8 months ago

Hello, I am experiencing a similar issue:

Python version: 3.9.6 sqlite version: (3, 39, 5) sqlite-utils version: 3.36

I create a table as:

        db["feeds"].create(
            {"overcastId": int, "title": str, "text": str, "subscribed": bool, "overcastAddedDate": datetime.datetime, "notifications": bool, "xmlUrl": str, "htmlUrl": str},
            pk="overcastId",
            not_null={"overcastId", "title", "xmlUrl"},
        )

Resulting in the schema:

CREATE TABLE [feeds] (
   [overcastId] INTEGER PRIMARY KEY NOT NULL,
   [title] TEXT NOT NULL,
   [text] TEXT,
   [subscribed] INTEGER,
   [overcastAddedDate] TEXT,
   [notifications] INTEGER,
   [xmlUrl] TEXT NOT NULL,
   [htmlUrl] TEXT
);

When I try to upsert a dict that looks like{'overcastId': 56, 'text': 'HBR IdeaCast', 'title': 'HBR IdeaCast', 'xmlUrl': 'http://feeds.harvardbusiness.org/harvardbusiness/ideacast', 'htmlUrl': 'https://hbr.org/podcasts/ideacast', 'overcastAddedDate': '2018-02-15T14:19:58+00:00', 'subscribed': True, 'notifications': False} then I get an error sqlite3.IntegrityError: datatype mismatch when I include in the upsert call not_null={"overcastId", "title", "xmlUrl"}

However, if I try upserting without not_null then I get no error BUT also no rows are inserted 🤔