simonw / sqlite-utils

Python CLI utility and library for manipulating SQLite databases
https://sqlite-utils.datasette.io
Apache License 2.0
1.62k stars 109 forks source link

`IndexError` when doing `.insert(..., pk='id')` after `insert_all` #554

Open xavdid opened 1 year ago

xavdid commented 1 year ago

I believe this is related to https://github.com/simonw/sqlite-utils/issues/98.

When pk is specified by table A's insert call, it throws an index error if a different table has written a row with a higher rowid than exists in the first table. Here's a basic example:

from sqlite_utils import Database

def test_pk_for_insert(fresh_db):
    user = {"id": "abc", "name": "david"}

    fresh_db["users"].insert(user, pk="id")

    fresh_db["comments"].insert_all(
        [
            {"id": "def", "text": "ok"},
            {"id": "ghi", "text": "great"},
        ],
    )

    fresh_db["users"].insert(
        user,
        ignore=True,
        # BUG: when specifying pk on the second insert call 
        # db.py goes into a block it doesn't expect and we get the error
        pk="id",
    )

if __name__ == "__main__":
    db = Database("bug.db")
    if db["users"].exists():
        raise ValueError(
            "bug only shows on a new database - remove bug.db before running the script"
        )
    test_pk_for_insert(db)

The error is:

  File "/Users/david/projects/reddit-to-sqlite/.venv/lib/python3.11/site-packages/sqlite_utils/db.py", line 2960, in insert_chunk
    row = list(self.rows_where("rowid = ?", [self.last_rowid]))[0]
          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

The issue is in this block:

https://github.com/simonw/sqlite-utils/blob/2747257a3334d55e890b40ec58fada57ae8cfbfd/sqlite_utils/db.py#L2954-L2958

relevant locals are:

What's most interesting is the comment # self.last_rowid will be 0 if a "INSERT OR IGNORE" happened, which doesn't seem to be the case here.

xavdid commented 1 year ago

Oh and for context - this goes away if I use .upsert instead of insert(..., ignore=True), but I don't want to update the value if it's written, just do an insert if it's new. The code is basically:

def save_items(table, items):
    db["users"].insert(build_user(items[0]), pk="id",ignore=True)
    db[table].insert_all(items)

if comments := fetch_comments():
    save_items('comments', comments)

if posts := fetch_posts():
    save_items('posts', posts)

So either comments or post could create the relevant user if those items exist. In cases where they both exist, I get this error. I need the pk because either call could create the table.