python / cpython

The Python programming language
https://www.python.org
Other
62.6k stars 30.04k forks source link

SQLite3 multithreading cache inconsistency #118172

Open adam444555 opened 5 months ago

adam444555 commented 5 months ago

Bug report

Bug description:

When using SQLite3 in multi-threading application, the fetch reuslts are not consistent. After some testing, this seems to be caused by the caching. The SQLite threading mode is serialized. This issue only exists in python 3.12 and 3.13. No issue in 3.11. Currently solution is to set cached_statements=0 when connect to the database.

A simple demo to reproduce the error:

import sqlite3
import threading

KB = sqlite3.connect(
    "example.db", check_same_thread=False
)

def execute_query():
    cursor = KB.cursor()
    cursor.execute("SELECT * FROM test_table")
    result = cursor.fetchall()
    assert result == [(1, 'test1'), (2, 'test2'), (3, 'test3')], str(result)
    return result

def run_threads():
    threads = []
    for i in range(50):
        thread = threading.Thread(target=execute_query)
        threads.append(thread)
        thread.start()

    for thread in threads:
        thread.join()

def test_multithreading():
    cursor = KB.cursor()

    cursor.execute(
        """CREATE TABLE IF NOT EXISTS test_table (id INTEGER PRIMARY KEY, value TEXT)"""
    )
    KB.commit()

    cursor.execute("""DELETE FROM test_table""")
    KB.commit()

    cursor.execute("""INSERT INTO test_table (value) VALUES ('test1')""")
    cursor.execute("""INSERT INTO test_table (value) VALUES ('test2')""")
    cursor.execute("""INSERT INTO test_table (value) VALUES ('test3')""")
    KB.commit()

    run_threads()

    KB.close()

if __name__ == "__main__":
    test_multithreading()

Test output: (Different in each run because of inconsistent fetch results)

Exception in thread Thread-3 (execute_query):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
Exception in thread Thread-7 (execute_query):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self._target(*self._args, **self._kwargs)
  File "/home/user/tmp/kb_test.py", line 13, in execute_query
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
Exception in thread Thread-12 (execute_query):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    assert result == [(1, 'test1'), (2, 'test2'), (3, 'test3')], str(result)
    self._target(*self._args, **self._kwargs)
AssertionError: [(1, 'test2'), (3, 'test3')]
  File "/home/user/tmp/kb_test.py", line 13, in execute_query
Exception in thread Thread-1 (execute_query):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
    assert result == [(1, 'test1'), (2, 'test2'), (3, 'test3')], str(result)
    self.run()
AssertionError: [(3, 'test3')]
  File "/usr/lib/python3.12/threading.py", line 1010, in run
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/tmp/kb_test.py", line 13, in execute_query
    self._target(*self._args, **self._kwargs)
  File "/home/user/tmp/kb_test.py", line 13, in execute_query
    assert result == [(1, 'test1'), (2, 'test2'), (3, 'test3')], str(result)
    assert result == [(1, 'test1'), (2, 'test2'), (3, 'test3')], str(result)
AssertionError: [(None, None), (), (2, 'test2'), (), (), (2, 'test2'), (0, None), (1, 'test1'), (2, 'test2'), (3, 'test3')]
AssertionError: [(3, 'test3')]

CPython versions tested on:

3.12, 3.13

Operating systems tested on:

Linux

corona10 commented 5 months ago

cc @erlend-aasland

erlend-aasland commented 5 months ago

cc @erlend-aasland

Thanks Donghee! (BTW, I'm already subscribed to the topic-sqlite3 label, so no need to ping me manually 🙂)

9001 commented 5 months ago

is this correct use of the api though, setting check_same_thread=False and then accessing the same connection from multiple threads simultaneously? The issue goes away with the following change:

LK = threading.Lock()
def execute_query():
    cursor = KB.cursor()
    with LK:
        cursor.execute("SELECT * FROM test_table")
        result = cursor.fetchall()
    assert result == [(1, 'test1'), (2, 'test2'), (3, 'test3')], str(result)
    return result
adam444555 commented 5 months ago

is this correct use of the api though, setting check_same_thread=False and then accessing the same connection from multiple threads simultaneously? The issue goes away with the following change:

LK = threading.Lock()
def execute_query():
    cursor = KB.cursor()
    with LK:
        cursor.execute("SELECT * FROM test_table")
        result = cursor.fetchall()
    assert result == [(1, 'test1'), (2, 'test2'), (3, 'test3')], str(result)
    return result

I am not sure. But based on the description from the document, for serialized threading mode, "Threads may share the module, connections and cursors", and no lock needed for the old version (3.11 and older), I assume lock is not necessary.

erlend-aasland commented 5 months ago

I can confirm that the issue appears in Python 3.12 and Python 3.13, but not in earlier versions (I tested back to Python 3.8).

I also tried to revert the sqlite3 extension code back to its 3.11 state (making only a few needed C API changes), just to make sure the issue exists in the sqlite3 extension and not in the runtime. I can confirm the issue is in the sqlite3 extension. I'll do a bisect as soon as possible.

erlend-aasland commented 5 months ago

I bisected this back to f5c85aa3eea1adf0c61089583e2251282a316ec1, which seems plausible. I'll also try to come up with a smaller repro so we can add a regression test for this.

erlend-aasland commented 5 months ago

However, I'm not completely convinced this is the actual offending commit; If I check out f5c85aa3eea1adf0c61089583e2251282a316ec1~, I get intermittent segfaults[^1] for about 5% of the runs.

[^1]: bad memory access in _pysqlite_query_execute