grantjenks / python-diskcache

Python disk-backed cache (Django-compatible). Faster than Redis and Memcached. Pure-Python.
http://www.grantjenks.com/docs/diskcache/

Cache access fails after forking if multiple `Cache` instances are backed by the same database #325

Open randomir opened 4 months ago

randomir commented 4 months ago

Running:

import os
import diskcache

a = diskcache.Cache(directory='/tmp/cache')
b = diskcache.Cache(directory='/tmp/cache')

os.fork()

a.get('key')

on a macOS machine, fails with:

Traceback (most recent call last):
  File "/Users/distiller/project/fork.py", line 9, in <module>
    a.get('key')
  File "/Users/distiller/project/env/lib/python3.12/site-packages/diskcache/core.py", line 1165, in get
    rows = self._sql(select, (db_key, raw, time.time())).fetchall()
           ^^^^^^^^^
  File "/Users/distiller/project/env/lib/python3.12/site-packages/diskcache/core.py", line 648, in _sql
    return self._con.execute
           ^^^^^^^^^
  File "/Users/distiller/project/env/lib/python3.12/site-packages/diskcache/core.py", line 623, in _con
    con = self._local.con = sqlite3.connect(
                            ^^^^^^^^^^^^^^^^
sqlite3.OperationalError: disk I/O error

(tested on CircleCI M1 medium instance)

AFAICT, all of the following conditions have to be met:

If any of the above is removed, the snippet works as expected.

SQLite threading mode (sqlite3.threadsafety) is set to multi-thread ("Threads may share the module, but not connections"), so I don't think that's causing this because diskcache reconnects on forking already.

$ python
Python 3.12.4 (main, Jul 18 2024, 14:14:06) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> sqlite3.threadsafety
1
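For readers unfamiliar with the "reconnect on forking" idea mentioned above, here is a minimal sketch of the general pattern using only the standard library. It is not diskcache's actual code: the class name `PidAwareConnection` and its layout are hypothetical, chosen for illustration. The helper remembers which process opened the connection and opens a new one when the PID changes.

```python
import os
import sqlite3
import threading


class PidAwareConnection:
    """Hypothetical helper: one SQLite connection per process and thread."""

    def __init__(self, path):
        self._path = path
        # Thread-local storage, so threads never share a connection.
        self._local = threading.local()

    @property
    def con(self):
        pid = os.getpid()
        if getattr(self._local, "pid", None) != pid:
            # First use in this process or thread (for example, right after
            # a fork): discard any inherited connection before reconnecting.
            stale = getattr(self._local, "con", None)
            if stale is not None:
                stale.close()
            self._local.con = None
            self._local.pid = pid
        if self._local.con is None:
            self._local.con = sqlite3.connect(self._path)
        return self._local.con


if __name__ == "__main__":
    holder = PidAwareConnection(":memory:")
    print(sqlite3.threadsafety)      # DB-API threading level of the module
    print(holder.con is holder.con)  # same process, so the connection is reused
```

Under this pattern, a child process never issues queries on a connection object created by its parent, which is why the threading mode alone seems unlikely to explain the error.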

Possibly related to https://github.com/grantjenks/python-diskcache/issues/266.

ddorian commented 3 months ago

I tested your code on Ubuntu 22.04 with Python 3.12 on x86 and it worked fine. This may be related to how forking works underneath in Python, though I forced the same start method (fork):

import multiprocessing

multiprocessing.set_start_method("fork", force=True)

print(multiprocessing.get_start_method())
import os

import diskcache

a = diskcache.Cache(directory="/tmp/cache")
b = diskcache.Cache(directory="/tmp/cache")

os.fork()

a.get("key")

randomir commented 3 months ago

@ddorian, exactly, this works perfectly on Linux (as everything does, right?). Maybe I wasn't clear enough above, but macOS is a necessary condition for reproduction.