Performance hit when classifying many dicts (non-GC-tracked and with no owner)

svenil commented 5 years ago

This occured in commit https://github.com/zhuyifei1999/guppy3/commit/6373998001465cf21c21661cfa975f50b19b2c7b which was apparently after V3.0.6 was released.

In hv_cli_dictof_update_new_method in src/hv_cli_dictof.c, the call to hv_get_objects was replaced with a call to gc_get_objects. This was to be faster because of an easier way to find objects and they were also fewer. However, we didn't get non-gc-collected objects such as simple dicts with non-gc-collected members. This made the hv_cli_dictof_update method be called each time we tried to classify such dicts, from hv_cli_dictof_classify, because we didn't store any information about those dicts in the nodegraph self->owners. The classifier thought it needed more information each time.

Code classifying an idset() of a list of 5000 empty dicts went from about 0.3 seconds (with the change reverted) to 9 seconds in Python 3 with the modification mentioned.

# dictofno.py
from guppy import hpy
h=hpy()
d=[{} for i in range(5000)]
print (h.idset(d))

Checking in the debugger revealed that the update function was called many times.

It has been suggested (by the author of Guppy 3, YiFei Zhu) that the hv_cli_dictof_classify function should call the update method only once.

I presume this means only once during the lifetime of the classifier. I think this would work if the classifier is allocated together with the set it classifies, because those sets are immutable so the classification would not change. I think this is the current situation.

However, if we would turn to have memoized classifiers in the Heapy guppy.hpy() structure, (to speed things up) then there might be the case that new dicts are created that should require to update the dict ownership graph.

Then it would require many calls to update during the lifetime of the classifier. I don't know if this will be a reality but it may be something to consider.

svenil commented 5 years ago

Hmm, it occured to me and looking in the code (Classifiers.py, View.py) that the dict ownership graph is already memoized and passed to the classifier. So it would need to be updated some time. But maybe not in the lifetime of the classifier? Or maybe it would. Have to consider these things and try it out.

zhuyifei1999 commented 5 years ago

~~I mean, a dict cannot acquire a new owner, and an object cannot acquire a new dict, a right?~~ Oh wait, it can...

However, while hv_cli_dictof_classify is running, there should not be a way for a the dict to acquire a new owner, or objects tp acquire a new dict, right? So for the duration of hv_cli_dictof_classify, we could call hv_cli_dictof_update_new_method at most once. If it is not found after the run then so be it; it is not owned.

svenil commented 5 years ago

But what I can see, already for the duration of hv_cli_dictof_classify the hv_cli_dictof_update_new_method is called at most once. It is repeated calls to the mentioned classify method that may call the update method several times.

zhuyifei1999 commented 5 years ago

Hmm, you're right. I thought hv_cli_dictof_classify was calling hv_cli_dictof_update_new_method in a loop.

zhuyifei1999 commented 5 years ago

>>> import gc
>>> a = [{}]
>>> gc.get_referrers({})
[]
>>> gc.get_referrers(a[0])
[[{}]]

$ python -m timeit -s 'import gc; a = [{}]' 'gc.get_referrers(a[0])'
2000 loops, best of 5: 162 usec per loop

Wait, but this wouldn't work if that dict is in a non-garbage-collected container like another dict that doesn't contain any gc-tracked stuffs right?

>>> import gc
>>> a = [{1:{}}]
>>> gc.get_referrers(a[0][1])
[{1: {}}]
>>> gc.is_tracked(a[0][1])
False
>>> gc.is_tracked(a[0])
True

No. Dicts become GC tracked if it could potentially contain a GC tracked item, using the MAINTAIN_TRACKING macro. Similarly, anything sane would be GC tracked if it refers to a dict, because that dict might create a reference cycle by containing that 'anything'.

So to fix that, we could just iterate on the referees of every GC-tracked object, right?

svenil commented 5 years ago

Sounds like something we could try. I'm off to work now.

zhuyifei1999 commented 5 years ago

That said, there are plenty of insane things that are not GC-tracked and refers to a dict, such as the C extension importing mechanism and dicts of interpreter states, but they are not PyObjects (so a reference cycle can't be created) and can't be accessed normally anyways. Well, yes, we expose them by rootstate, but I don't imagine someone somehow creates 5000 such non-GC-tracked and not-directedly-referred-to-by-GC-object dicts and asking us to classify them.

Besides, it would be very unusual even for these extreme dicts to be not GC-tracked:

>>> for name in dir(hp.Root):
...     if isinstance(getattr(hp.Root, name), dict):
...         if not gc.is_tracked(getattr(hp.Root, name)):
...             print(name)
... 
>>>

And even if you iso some dict, there will be a ImmNodeSet declaring the existence of this dict that is being classified:

>>> a = {}
>>> b = hp.iso(a)
>>> [type(x) for x in gc.get_referrers(a)]
[<class 'guppy.sets.setsc.ImmNodeSet'>, <class 'dict'>]

You'd have to intentionally call hv_cli_dictof_classify with a fresh dict that isn't referred to by anything 5000 times, at which point, it's not traversable from root anyways so who cares. Traversing the heap isn't gonna save it.

Though, if some C extensions badly leaks memory with dicts heapg/heapu might find a lot of untraversable-from-root dicts and try to classify them, but that uses the GC mechanism, so these leakages would not be found by hv_get_objects, but gc_get_objects would. I'd argue gc_get_objects is an improvement in this case.

zhuyifei1999 commented 5 years ago

After this patch:

Removed build tracker '/tmp/pip-req-tracker-0d7qarcz'
Partition of a set of 5000 objects. Total size = 1240000 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0   5000 100  1240000 100   1240000 100 dict (no owner)

real    0m0.077s
user    0m0.068s
sys 0m0.009s

At c1fb662:

Partition of a set of 5000 objects. Total size = 1240000 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0   5000 100  1240000 100   1240000 100 dict (no owner)

real    0m3.858s
user    0m3.851s
sys 0m0.007s

At 1b9f5d3:

Partition of a set of 5000 objects. Total size = 1240000 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0   5000 100  1240000 100   1240000 100 dict (no owner)

real    0m0.099s
user    0m0.089s
sys 0m0.010s

Light speed

svenil commented 5 years ago

Cool. I can confirm that it is fast here too but your system seems to be faster, whatever the reason is. Removing the initial import of apport priming in View.py I get: 0.124s user time vs your 0.068s which is 1.82 times slower.

I have an Intel® Core™ i5-3210M CPU @ 2.50GHz × 4 OS: 4.4.0-154-generic #181-Ubuntu SMP (32 bit)

Thanks for the fix! I will incorporate it in guppy-pe although I don't know if cherry picking would work, I don't know how that works so I may just copy the file.

zhuyifei1999 commented 5 years ago

I have an Intel® Core™ i5-3210M CPU @ 2.50GHz × 4

Not sure how that works, but I have: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz x 8 Linux 5.2.13 #7 SMP x86_64

Thanks for the fix! I will incorporate it in guppy-pe although I don't know if cherry picking would work, I don't know how that works so I may just copy the file.

We kinda have different indentation styles ;) I could send you a patch, if you would like to keep the mix of tabs and spaces. (Though, Python 3 doesn't support it for Python code)

zhuyifei1999 commented 5 years ago

I just tested on a server that is

Intel(R) Xeon(R) Silver 4208 CPU @ 2.10GHz x 32 Linux 4.15.0-55-generic #60-Ubuntu SMP x86_64

$ time python dictofno.py 
Partition of a set of 5000 objects. Total size = 1200000 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0   5000 100  1200000 100   1200000 100 dict (no owner)

real    0m0.075s
user    0m0.063s
sys 0m0.012s

zhuyifei1999 / guppy3

Performance hit when classifying many dicts (non-GC-tracked and with no owner) #8