samuelcolvin opened this issue 2 years ago
From what I can gather, there is no actual leak strictly speaking, but it seems like it might be a corner case where the garbage collector isn't running as often as it perhaps should for this scenario. Running something like the script below shows some strange artifacts: the "before collection" case has widely variable numbers of `weakref` objects lingering around, and widely variable performance. However, calling `gc.collect()` consistently takes the number back down. Does manually adding `collect()` calls fix your use case?
```python
from abc import ABCMeta
from gc import collect, get_objects
from collections import Counter
from pprint import pprint


class MyMetaclass(ABCMeta):
    pass


class MyClass(metaclass=MyMetaclass):
    pass


def main():
    class Sub(MyClass):
        pass

    assert issubclass(Sub, MyClass)

    class Other:
        pass

    assert not issubclass(Other, MyClass)


def print_object_types(n=5):
    # Show the n most common object types currently tracked by the interpreter.
    types = Counter(map(type, get_objects())).most_common(n)
    pprint(types, underscore_numbers=True)


N = 1000
for _ in range(1000):
    N += N // 10  # grow the batch size ~10% per round
    for j in range(N):
        main()
    print(N)
    print("before collection:")
    print_object_types()
    while collect():  # collect repeatedly until nothing more is freed
        pass
    print("after collection:")
    print_object_types()
    print("\n\n")
```
@pablogsal any ideas?
I will investigate later, but you can also try adjusting the GC thresholds down to force collections more often. Another thing is to set the GC debug flags to `gc.DEBUG_SAVEALL` so that collected cycles are saved, letting us inspect which objects are responsible for the memory and why they are forming cycles.
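For reference, a minimal sketch of both suggestions (the threshold values here are arbitrary, just picked to force frequent collections):

```python
import gc

# Lower the collection thresholds from the defaults (700, 10, 10) so the
# cyclic collector runs far more often.
gc.set_threshold(100, 5, 5)

# Keep everything the collector finds in gc.garbage instead of freeing it,
# so the objects forming cycles can be inspected after a run.
gc.set_debug(gc.DEBUG_SAVEALL)

# ... run the reproducer above ...

gc.collect()
for obj in gc.garbage[:20]:
    print(type(obj), repr(obj)[:80])
```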
Rather than the GC, and since the problem seems to be more severe when the checks return false, I thought this might be `_abc_negative_cache` growing indefinitely?
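One way to test that hypothesis: when the C `_abc` accelerator is in use, the `abc` module re-exports a private `_get_dump` helper that returns copies of a class's registry, cache, negative cache, and negative cache version. A sketch using that private, CPython-specific API (it may change between versions):

```python
from abc import ABCMeta, _get_dump  # _get_dump is private and CPython-specific

class Base(metaclass=ABCMeta):
    pass

# Every failed check against a distinct class adds a weakref to Base's
# negative cache.
for i in range(10_000):
    issubclass(type(f"Other{i}", (), {}), Base)

registry, cache, negative_cache, version = _get_dump(Base)
print(len(negative_cache))  # grows with every distinct miss
```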
It's possible, but we need some analysis, and unfortunately I am writing this from my phone while hiking, so I cannot test until I am back home :sweat_smile:
I would be surprised if this is a GC problem, to be honest, because that would involve circular references, and unless I am missing something there should not be growing isolated cycles here.
My educated guess is that it is likely the combination of these:
https://github.com/python/cpython/blob/36d42e701f8a4040e7daf70138217038bfde3c78/Modules/_abc.c#L201
https://github.com/python/cpython/blob/36d42e701f8a4040e7daf70138217038bfde3c78/Modules/_abc.c#L195
https://github.com/python/cpython/blob/36d42e701f8a4040e7daf70138217038bfde3c78/Modules/_abc.c#L190
I notice that the pure-Python version uses a `WeakSet` where the C version uses a set of weakrefs.
https://github.com/python/cpython/blob/953ab0795243900ccccaaca069d932730a86fc20/Lib/_py_abc.py#L50
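The practical difference, as a minimal illustration: a `WeakSet` removes an entry via callback as soon as its referent dies, while a plain set of `weakref.ref` objects keeps the dead refs themselves alive until something explicitly purges them:

```python
import gc
import weakref

class C:
    pass

obj = C()
weak_set = weakref.WeakSet([obj])  # what _py_abc uses for its caches
plain_set = {weakref.ref(obj)}     # roughly what the C _abc module does

del obj
gc.collect()

print(len(weak_set))   # 0 -- the WeakSet's callback removed the dead entry
print(len(plain_set))  # 1 -- the dead weakref object itself still lingers
```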
I asked a colleague to run the memray flamegraphs with native introspection while I travel home. They show clearly that my guess was right: the ABC cache is the one responsible for this problem.
Note: this uses the latest development branch of memray, so you may see something slightly different.
This shows the memory that is still live at the end of execution (flamegraph image):
This shows where the chunks of memory that make up the heap at its peak were allocated (flamegraph image):
You can see in the last plot how the cache is periodically cleaned up (perhaps by GC passes?), but something survives.
So I messed around with the Python version of this a bit. Just removing the negative cache from `ABCMeta` made it around 30x faster and eliminated the memory leak, taking 2 seconds to do 10,000 iterations. With the cache in place, even manually calling `gc.collect()` seems unable to free up all the memory: it grows from 11 MB at startup to 68 MB by the end.
I went further and swapped the `WeakSet` for a regular `set()`. This still leaked memory, but the leak was actually about half the size of the original, only going up to 33 MB.
VERY VERY VERY strangely, adding the snippet below to `__subclasscheck__` right before setting the cache actually speeds the program up about 3x:
```python
if cls.__name__ == "Foobar":
    pass
else:
    print(f"{cls.__name__} has {len(cls.__subclasses__())} subclasses")
```
My best guess is that the issue is just one of pure volume.

Because you are recursively running the instance check and recursively caching the values, it just creates a ton of weakrefs. Bumping the original code up to two child classes of MyClass grinds to an absolute halt. The real issue is that if there is ever a parent class with many children (like every single model in pydantic), you are going to perform subclass checking on 100+ different child classes and cache values in all 100+ of them. Because the GC can't keep up, MyClass ends up with a few thousand subclasses still attached to it, each of which is recursively walked; a small sketch of this follows below. Manually calling the GC cleared the weak references reported by the `__subclasses__` method, but the leaked memory does not recover.
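A small sketch of that effect (with the automatic collector disabled so the numbers are deterministic):

```python
import gc
from abc import ABCMeta

class Base(metaclass=ABCMeta):
    pass

gc.disable()  # keep the cyclic collector out of the way to make the effect visible

for _ in range(1000):
    # Each iteration rebinds Sub and drops the previous class, but classes
    # sit in reference cycles, so they linger until a cyclic GC pass.
    class Sub(Base):
        pass

print(len(Base.__subclasses__()))  # ~1000: dead classes still attached to Base

gc.enable()
gc.collect()
print(len(Base.__subclasses__()))  # back to 1: only the last Sub is still referenced
```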
Further, it is actually bugged and doesn't work properly, which is what stops this from melting down all the time. If a class has already been checked with `isinstance`, then when it is checked a second time it never walks through all the subclasses: the cache entry is never invalidated, so the check is short-circuited at the top. Frankly, that bug probably prevented a lot of major issues.
The cache is never invalidated on subclassing, only on registrations.
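That behaviour is visible through the public `abc.get_cache_token()`, which only changes when `register()` is called:

```python
import abc

class Base(metaclass=abc.ABCMeta):
    pass

token = abc.get_cache_token()

class Child(Base):  # ordinary subclassing
    pass

print(abc.get_cache_token() == token)  # True -- token unchanged, caches stay valid

class Unrelated:
    pass

Base.register(Unrelated)               # explicit virtual registration
print(abc.get_cache_token() == token)  # False -- token bumped, negative caches invalidated
```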
@samuelcolvin For pydantic I would just override `__subclasscheck__` on the main pydantic metaclass. This should clear up the bulk of the memory leak, as well as likely increase the actual speed quite dramatically, since I doubt it needs to check every single child class on a miss. I would presume it in fact never needs to, because all child classes descend directly from the base class.
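A hypothetical sketch of that workaround (`ModelMetaclass` and `BaseModel` are stand-in names, not pydantic's real classes; note this bypasses the ABC registry, so `register()`-based virtual subclasses would stop being recognised):

```python
from abc import ABCMeta

class ModelMetaclass(ABCMeta):
    def __subclasscheck__(cls, subclass):
        # Delegate to the plain type check (a simple MRO walk), skipping
        # ABCMeta's registry lookup and per-class weakref caches entirely.
        return type.__subclasscheck__(cls, subclass)

class BaseModel(metaclass=ModelMetaclass):
    pass

class User(BaseModel):
    pass

print(issubclass(User, BaseModel))  # True, without touching the ABC caches
```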
The cache hit/miss ratio looks super low for this scenario; can we switch the caches to an LRU cache?
I honestly don't think it will fix the core issue. The main problem is the way it walks down the class tree, which makes any sufficiently large class hierarchy impossibly slow and causes massive memory consumption. Switching to an LRU cache only solves the memory part; the example above of 10k comparisons still takes 2 minutes to run on my MacBook because it causes 10k lookups per `issubclass` call.
If you are generating classes, the bulk of which never get compared in any natural way, then having them dragged into their parents' `issubclass` calls is unintuitive at best. Frankly, I am not sure I fully understand the intent of it. I presume it's for `register()` calls, but if it is not, and the intent is instead to allow subclasses to specialize how they are checked, then it should just be considered bugged.
Bug report

I've been hunting a memory leak in pydantic and I keep seeing `_abc_subclasscheck` popping up in the memray flamegraph. Locally I've seen memray report that `_abc_subclasscheck` is using 2.8GB of memory in some cases!

I can't get anything that bad in a minimal example, but I have the following:

A few things to note:
- the `issubclass` calls are necessary; certainly I don't see a rise in memory with just the last one
- `issubclass` on an ABC is very poor indeed: this script takes 14 seconds, but if I switch `ABCMeta` to `type` it takes 121ms!

Your environment