mgedmin / objgraph

Visually explore Python object graphs
http://mg.pov.lt/objgraph/
MIT License

Why are new dictionaries not detected by show_growth()? #41

Closed greatvovan closed 5 years ago

greatvovan commented 5 years ago

I am trying basically:

>>> import objgraph as o
>>> o.show_growth()
...
>>> d = {1: 2}
>>> o.show_growth()
>>>

I have learned from the documentation that references to primitive types are not tracked, but I wonder why a reference to a dictionary (which is not a primitive type?) is not counted. For example, if I define a list with only a primitive element, it is tracked:

>>> l = [2]
>>> o.show_growth()
list      356        +1

but the same with a dictionary does not work?

mgedmin commented 5 years ago

I'm not entirely sure why, but

$ python
Python 2.7.15+ (default, Oct  2 2018, 22:12:08) 
[GCC 8.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import gc
>>> gc.is_tracked([2])
True
>>> gc.is_tracked({1: 2})
False

(For some data types Python has the optimization where it decides whether to track a specific object or not depending on its contents. How it decides what to track or not is a great question, but not one I'm able to answer.)

mgedmin commented 5 years ago

Actually, I can guess: Python's GC needs to track objects if and only if they can participate in reference cycles.

Objects that have no references to other objects (such as ints or strings) are the most obvious example of things that can be untracked.

Objects such as dictionaries that refer only to untracked objects can also be untracked.

>>> d = {1: 2}
>>> d
{1: 2}
>>> gc.is_tracked(d)
False

If you modify the dictionary to add a reference to a tracked object, Python will flip the tracked bit at runtime:

>>> d[1] = 3
>>> gc.is_tracked(d)
False
>>> d[1] = [2]
>>> gc.is_tracked(d)
True
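
To connect this back to the original question: as far as I know, objgraph builds its counts from gc.get_objects(), which returns only GC-tracked objects. The sketch below checks that directly. visible_to_gc is a hypothetical helper name, not part of objgraph or gc, and whether a given dict is tracked depends on the Python version, so the dict line may print differently from the 2.7 session above.

```python
import gc

def visible_to_gc(obj):
    """Return True if obj appears in gc.get_objects(), i.e. it is GC-tracked."""
    return any(candidate is obj for candidate in gc.get_objects())

new_list = [2]      # lists are tracked even when they hold only ints
new_dict = {1: 2}   # may be untracked, depending on the Python version

for name, obj in (("list", new_list), ("dict", new_dict), ("int", 2)):
    print(name, "tracked:", gc.is_tracked(obj),
          "in get_objects():", visible_to_gc(obj))
```

An object that never shows up in gc.get_objects() cannot contribute to a count derived from that list, which would explain the empty show_growth() output.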

greatvovan commented 5 years ago

OK, thank you. Suppose my program has a list or another data structure that (unintentionally) holds references to untracked dictionaries (or other untracked types); this prevents these referenced objects from collection by GC. Is there a way to find such references and detect growth in the number of these references/objects?

mgedmin commented 5 years ago

> this prevents these referenced objects from collection by GC.

That is not precisely accurate. Python has two ways to collect garbage: reference counting and a cyclic garbage collector.

Pure reference counting has trouble with data structures that contain reference cycles, for which the cyclic garbage collector was added back in Python 2.0 or so. The data structure you've described has no cycles, so it'll be collected as soon as the last reference to it goes away and Python decrements the corresponding reference counter.
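
A minimal sketch of that difference, assuming CPython (other implementations may free objects later). Payload and both function names are hypothetical, made up for this illustration. The cyclic collector is switched off while the checks run so that only reference counting is at work until gc.collect() is called explicitly.

```python
import gc
import weakref

class Payload(object):
    """Placeholder object that supports weak references."""

def acyclic_freed_immediately():
    # A list holding a dict holding an object: no cycles anywhere.
    holder = [{"payload": Payload()}]
    ref = weakref.ref(holder[0]["payload"])
    del holder               # last reference gone, refcounts drop to zero
    return ref() is None     # True if the object is already freed

def cycle_needs_collector():
    node = Payload()
    node.me = node           # the object refers to itself: a reference cycle
    ref = weakref.ref(node)
    del node                 # refcount stays above zero because of the cycle
    alive_after_del = ref() is not None
    gc.collect()             # the cyclic collector finds and frees it
    return alive_after_del, ref() is None

gc.disable()                 # rule out automatic cyclic collection during the demo
try:
    acyclic_ok = acyclic_freed_immediately()
    cycle_result = cycle_needs_collector()
finally:
    gc.enable()

print(acyclic_ok, cycle_result)
```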

greatvovan commented 5 years ago

I understand it will be collected immediately. But is there a way to do something like "collect all new objects since the previous snapshot" (both tracked and not tracked by GC)? I see this is out of the scope of your library, but maybe you have some suggestions?

mgedmin commented 5 years ago

Are you looking for https://docs.python.org/3/library/gc.html#gc.collect? Because if not, then I've no idea what to tell you ;)
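
For the earlier question about spotting untracked objects that a tracked container keeps alive, one possible approach, not proposed in the thread, is to walk one hop outward from every tracked object with gc.get_referents() and count the untracked objects found. This is only a sketch: untracked_referent_counts is a hypothetical helper, it sees only direct referents of tracked objects, and whether the dicts below count as untracked depends on the Python version.

```python
import gc
from collections import Counter

def untracked_referent_counts():
    """Count untracked objects directly referenced by GC-tracked objects, by type name."""
    seen = set()
    counts = Counter()
    for container in gc.get_objects():
        for referent in gc.get_referents(container):
            if gc.is_tracked(referent) or id(referent) in seen:
                continue
            seen.add(id(referent))
            counts[type(referent).__name__] += 1
    return counts

before = untracked_referent_counts()
held = [{i: i} for i in range(1000)]   # a tracked list holding small dicts
after = untracked_referent_counts()

# Counter subtraction keeps only the types whose count went up.
growth = after - before
print(growth.most_common(5))
```

Comparing two such snapshots gives a rough growth report for objects that gc.get_objects() alone would miss.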