rogerbinns / apsw

Another Python SQLite wrapper
https://rogerbinns.github.io/apsw/

changes() from an update statement not what I am expecting #536

Closed scarlock closed 2 months ago

scarlock commented 2 months ago

In the code included below, the value returned from the conn.changes() call after the UPDATE statement does not match the number of rows returned by the RETURNING clause of that statement. (I hope that makes sense.) If I copy and paste the SQL into either the sqlite3 shell or the apsw shell and turn .changes on, the value is correct.

Thanks

#!/usr/bin/env python
# -*- mode: python; coding: utf-8 -*-

import apsw

print("      Using APSW file", apsw.__file__)
print("         APSW version", apsw.apsw_version())
print("SQLite header version", apsw.SQLITE_VERSION_NUMBER)
print("   SQLite lib version", apsw.sqlite_lib_version())
print("   Using amalgamation", apsw.using_amalgamation)

print("Connecting to in memory database...")
conn = apsw.Connection(":memory:")
print("Connected.")

print("Setting configs")
conn.config(apsw.SQLITE_DBCONFIG_DEFENSIVE, 1)
conn.config(apsw.SQLITE_DBCONFIG_DQS_DDL, 0)
conn.config(apsw.SQLITE_DBCONFIG_DQS_DML, 0)

print("Creating table...")
sql = """\
CREATE TABLE entry (
    rowid INTEGER PRIMARY KEY NOT NULL,
    type INTEGER NOT NULL
        CHECK (type in (0, 1, 2)),
    permissions INTEGER NOT NULL DEFAULT (0)
        CHECK (permissions >= 0 AND permissions <= 511),
    name TEXT NOT NULL
) STRICT;
"""
conn.execute(sql)
print("Created.")

print("Loading initial data...")
sql = """\
INSERT INTO entry   (rowid, type, name) VALUES
(    1,    0, 'root'), (  128,    1, 'f128'),
(  129,    0, 'd129'), (  130,    0, 'd130'),
(  131,    0, 'd131'), (  132,    0, 'd132'),
(  133,    0, 'd133'), (  134,    0, 'd134'),
(  135,    0, 'd135'), (  215,    0, 'd215');
"""
conn.execute(sql)
changes = conn.changes()
print(f"Loaded.  {changes=}")

print("Creating temp table...")
sql = """\
CREATE TEMP TABLE tentry (
    rowid INTEGER UNIQUE DEFAULT(NULL),
    type INTEGER NOT NULL
        CHECK (type IN (0, 1, 2)),
    permissions INTEGER NOT NULL
        CHECK (permissions >= 0 AND permissions <= 511),
    name TEXT NOT NULL
);
"""
conn.execute(sql)
print("Created.")

print("Loading initial data into temp table...")
sql = """\
INSERT INTO tentry  (rowid, permissions, type, name) VALUES
(    1,           0,    0, 'root'), (  128,         436,    1, 'f128'),
(  129,         509,    0, 'd129'), (  130,         509,    0, 'd130'),
(  131,         509,    0, 'd131'), (  132,         509,    0, 'd132'),
(  133,         509,    0, 'd133'), (  134,         509,    0, 'd134'),
(  135,         509,    0, 'd135'), (  215,         509,    0, 'd215');
"""
conn.execute(sql)
changes = conn.changes()
print(f"Loaded.  {changes=}")

print("Updating base table...")
sql = """\
UPDATE entry AS e
    SET permissions = t.permissions
    FROM tentry AS t
    WHERE
        t.rowid == e.rowid
        AND t.type == e.type
        AND e.permissions != t.permissions
    RETURNING rowid;
"""
rowids = conn.execute(sql)
changes = conn.changes()
rowids = list(rowids)
print(f"Updated.  {changes=}  {len(rowids)=}")
print(rowids)

Output:

      Using APSW file ~/opt/miniconda3/envs/test/lib/python3.12/site-packages/apsw/__init__.cpython-312-darwin.so
         APSW version 3.46.1.0
SQLite header version 3046001
   SQLite lib version 3.46.1
   Using amalgamation True
Connecting to in memory database...
Connected.
Setting configs
Creating table...
Created.
Loading initial data...
Loaded.  changes=10
Creating temp table...
Created.
Loading initial data into temp table...
Loaded.  changes=10
Updating base table...
Updated.  changes=10  len(rowids)=9
[(128,), (129,), (130,), (131,), (132,), (133,), (134,), (135,), (215,)]
rogerbinns commented 2 months ago

It is subtle - the problem is these 3 lines:

rowids = conn.execute(sql)
changes = conn.changes()
rowids = list(rowids)

Unlike networked databases, SQLite runs locally and tries to do the least amount of work possible to provide the first result row. Once that has been read it then tries to do the least amount of work to provide the next result row etc.

rowids = conn.execute(sql)

That doesn't provide the rowids - it provides an iterator over the result rows. In particular it means the SQLite program has only run up to the first returned row.

changes = conn.changes()

That reads whatever the changes value is currently on the connection. In this case it is reading the value from the prior INSERT because the UPDATE has not completed.

rowids = list(rowids)

That now executes the SQLite program to completion, getting all the result rows. It also updates the changes count.

If you swapped the last two lines, reading the changes after consuming the rowids, you would get the expected result.

I recommend you use get or fetchall when you want all results immediately and are not iterating over them. This would do what you want.

rowids = conn.execute(sql).fetchall()
changes = conn.changes()
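
The same pitfall can be sketched self-contained with the stdlib sqlite3 module (an assumption of this sketch: it substitutes sqlite3's connection-wide total_changes counter for apsw's per-statement changes(), and uses a toy table rather than the schema from the report):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(x)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])
con.commit()

# UPDATE ... RETURNING hands back a lazy cursor, not the rows themselves.
cur = con.execute("UPDATE t SET x = x + 1 RETURNING x")

# Drain the cursor BEFORE reading the change counter; read before draining
# and the counter may still reflect only the earlier INSERTs.
rows = cur.fetchall()
changes = con.total_changes

print(len(rows), changes)  # 5 updated rows; 5 inserts + 5 updates counted
```

Requires a SQLite new enough for RETURNING (3.35+). The ordering rule is the same as in the apsw snippet above: fetch all rows first, then read the change counter.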
scarlock commented 2 months ago

Thanks. It's obvious now.

One interesting interaction I've run into with .get - I was using @functools.lru_cache like this:

@functools.lru_cache
def get_type_from_id(db, rowid):
    """Function (with cache) to return the type for a given id if it exists."""
    sql = """\
SELECT type
FROM TEMP.types
WHERE rowid = ?;
"""
    result = db.execute(sql, (rowid,))
    return result.get

and was running into issues with the database not closing properly (not deleting the WAL files). Turns out the problem was the cache was keeping references active. Once I explicitly cleared the LRU cache(s) the database closed properly.

I don't think there is anything you can reasonably do about that - maybe a mention in the docs could help someone else out.
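
The strong-reference behaviour of lru_cache can be demonstrated without a database at all; a minimal sketch (Resource is a hypothetical stand-in for the connection):

```python
import functools
import gc
import weakref

class Resource:
    """Stand-in for a database connection held by cached results."""
    pass

@functools.lru_cache
def lookup(res, key):
    # Both the cache key (the arguments) and the cached return value
    # keep a strong reference to `res`.
    return (res, key)

res = Resource()
ref = weakref.ref(res)
lookup(res, 1)

del res
gc.collect()
assert ref() is not None   # the cache still keeps the object alive

lookup.cache_clear()       # explicit clear releases the references
gc.collect()
assert ref() is None
```

This is why the WAL files lingered: the connection could not be garbage collected while any cache entry referenced it.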

rogerbinns commented 2 months ago

There isn't really anything to document there as references in Python are strong by default. lru_cache is not documented to use weak references - it says:

The cache keeps references to the arguments and return values until they age out of the cache or until the cache is cleared.

Others have experienced the same issue and wanted it addressed.

Note that you can explicitly close a database. It is generally considered best practice to do the closing yourself rather than relying on garbage collection. For example, if you use a debug Python, it complains about every file that isn't explicitly closed.

That said, I really wouldn't worry about closing, and certainly wouldn't worry about WAL files being present. Database semantics remain identical and code couldn't tell the difference. If you need a clean database file (e.g. to export it) then you can use VACUUM INTO.
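
A minimal sketch of VACUUM INTO, again using stdlib sqlite3 so it runs standalone (the table and file names are invented for illustration; apsw accepts the same SQL):

```python
import os
import sqlite3
import tempfile

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t(x)")
src.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(3)])
src.commit()  # VACUUM cannot run inside an open transaction

# Write a compact, self-contained copy (no WAL/journal files) to disk.
dest = os.path.join(tempfile.mkdtemp(), "clean.db")
src.execute("VACUUM INTO ?", (dest,))

copy = sqlite3.connect(dest)
print(copy.execute("SELECT count(*) FROM t").fetchone()[0])  # 3
```

VACUUM INTO needs SQLite 3.27 or later; the destination file must not already exist.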

Closing the issue since it is addressed.