Open mattip opened 4 days ago
This is because the _cleanup
gets run during the call to checkcache. You can probably get this to fail on CPython by introducing a reference cycle:
func = gen_func(n)
func.func = func
del func
I think this is a bug in linecache, because it is not threadsafe (or GC callback safe) and probably should be
I agree that changing the code in linecache is the easiest thing. Can PyPy just do that or does it need to be fixed upstream?
I can't really see how else what we can do about it in SymPy. We want to clear the linecache when the generated functions get deleted to avoid a memory leak, but if PyPy's gc runs at random times it's hard to prevent problems. The best I could come up with was to clear the entry instead of deleting it, but that still gives a memory leak (although a less serious one) https://github.com/sympy/sympy/pull/27232. Or we could monkeypatch the standard library, but I'm not particularly keen on doing that.
You can probably get this to fail on CPython
Adding a reference cycle did not cause an error on CPython. We can change the stdlib that is shipped with PyPy, but we should first find a reproducer on CPython and file an upstream issue.
Note that the existing code in checkcache
has some defenses against this kind of thing happening already, by using cache.pop(filename, None)
. The default argument means that no KeyError
is raised if the key doesn't exist. So I'd suggest we just do this sort of fix:
diff --git a/lib-python/3/linecache.py b/lib-python/3/linecache.py
index b854425831d..6ab914e6f01 100644
--- a/lib-python/3/linecache.py
+++ b/lib-python/3/linecache.py
@@ -61,9 +61,10 @@ def checkcache(filename=None):
return
for filename in filenames:
- entry = cache[filename]
- if len(entry) == 1:
- # lazy cache entry, leave it lazy.
+ # PyPy modification: be careful about disappearing cache keys
+ entry = cache.get(filename)
+ if entry is None or len(entry) == 1:
+ # deleted or lazy cache entry, leave it alone.
continue
size, mtime, lines, fullname = entry
if mtime is None:
I agree that it's clearly a problem for cpython too. Not sure it's doable to get a crash via the GC, but with multi-threading it's definitely broken. Somebody should report that to CPython, but I don't think we want to wait for an official fix (which is going to be potentially much more heavyweight)
Probably also wants to produce the keys with filenames = cache.copy().keys()
rather than list(cache.keys())
you mean because python code is being run between the return of the keys
call and the call to list
, and collect could happen there?
That's very subtle if that's the case. I don't think the Python world is ready for these sorts of thread safety issues in a free-threaded world.
One patch the let's the reproducer in this issue keep going with latest pypy on my machine is to patch checkcache
in linecache.py
to read:
def checkcache(filename=None):
gc.disable()
try:
... # original function body
finally:
gc.enable()
I tried to use threading.Lock
instead, but I don't know if PyPy's gc runs in a thread which respects it? (either way I couldn't get it to work, deadlocking with Lock
and no change with RLock
).
In the patch you just need to catch the KeyError and continue, no need to disable the GC. It's not just susceptible to GC issues it's also susceptible to threading issues
Fixed in 60bec6ab74b by applying the patch, adding the test, and tweaking the iterator
I actually got around to running the demo on cpython 3.12.3 and 3.13.0 and it crashes with a KeyError pretty quickly every time I've run it:
✘ graingert@conscientious ~/projects/weakref-func-cycle-never-gc main python demo.py
Traceback (most recent call last):
File "/home/graingert/projects/weakref-func-cycle-never-gc/demo.py", line 38, in <module>
main()
File "/home/graingert/projects/weakref-func-cycle-never-gc/demo.py", line 32, in main
linecache.checkcache()
File "/usr/lib/python3.12/linecache.py", line 64, in checkcache
entry = cache[filename]
~~~~~^^^^^^^^^^
KeyError: '<generated-147>'
✘ graingert@conscientious ~/projects/weakref-func-cycle-never-gc main phyt
✘ graingert@conscientious ~/projects/weakref-func-cycle-never-gc main python3.13 demo.py
Traceback (most recent call last):
File "/home/graingert/projects/weakref-func-cycle-never-gc/demo.py", line 38, in <module>
main()
~~~~^^
File "/home/graingert/projects/weakref-func-cycle-never-gc/demo.py", line 32, in main
linecache.checkcache()
~~~~~~~~~~~~~~~~~~~~^^
File "/usr/lib/python3.13/linecache.py", line 59, in checkcache
entry = cache[filename]
~~~~~^^^^^^^^^^
KeyError: '<generated-2637>'
which is supposed to be impossible! The function should be deleted immediately by the reference counter
Over at SymPy, they found this code that raises on PyPy but works fine on CPython:
On PyPy I get
Not sure if this is a bug or expected, they are wondering how to correctly write this code. cc @asmeurer