mgedmin / objgraph

Visually explore Python object graphs
http://mg.pov.lt/objgraph/
MIT License

objgraph.show_growth() doesn't notice numpy arrays (and other non-GC-tracked types) #25

Open MilesQLi opened 8 years ago

MilesQLi commented 8 years ago

See this code:

import objgraph
import numpy as np

objgraph.show_growth()  # baseline: the first call reports every tracked type
j = 20
y = []
for i in range(5):
    # Append j new two-element arrays on each pass; y keeps them all alive
    for _ in range(j):
        y.append(np.array([np.random.randint(500), np.random.randint(500)]))
    print('i: %d' % i)
    objgraph.show_growth()
    print('___')
    # objgraph.show_most_common_types(limit=100)
    j += 1

The result is:

function                       2973     +2973
wrapper_descriptor             1584     +1584
builtin_function_or_method      873      +873
dict                            867      +867
method_descriptor               823      +823
weakref                         622      +622
tuple                           518      +518
getset_descriptor               514      +514
list                            422      +422
member_descriptor               223      +223
i: 0
wrapper_descriptor     1593        +9
getset_descriptor       518        +4
member_descriptor       226        +3
list                    424        +2
weakref                 624        +2
dict                    869        +2
listiterator              1        +1
___
i: 1
wrapper_descriptor     1596        +3
weakref                 625        +1
dict                    870        +1
method_descriptor       824        +1
___
i: 2
___
i: 3
___
i: 4
___

For iterations 2, 3 and 4 it shows nothing growing, but it should show that the number of numpy arrays is growing.

mgedmin commented 8 years ago

Unfortunately not all objects are tracked by the Python garbage collector. Specifically, "simple" objects that cannot contain references to other objects are never tracked -- these include things like str/int instances, as well as, apparently, numpy arrays.

This is actually documented, but the documentation is a bit vague: show_growth says

The caveats documented in typestats() apply.

and typestats says

Count the number of instances for each type tracked by the GC.

Note that the GC does not track simple objects like int or str.
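
To make that caveat concrete, here is a minimal sketch of per-type counting over gc.get_objects(), which is roughly the kind of counting the typestats() docstring describes. The helper name count_tracked_types is hypothetical and not part of objgraph; only the standard library is used. The point is that new lists show up as growth, while plain str instances never appear at all.

```python
import gc
from collections import Counter


def count_tracked_types():
    """Count instances per type name among GC-tracked objects only.

    Hypothetical helper for illustration; not objgraph's implementation.
    """
    return Counter(type(obj).__name__ for obj in gc.get_objects())


before = count_tracked_types()

# Keep references so the new objects stay alive while we count again.
more_lists = [[i] for i in range(500)]             # 500 inner lists + 1 outer
more_strings = ["more-%d" % i for i in range(500)]  # 500 strs + 1 list

after = count_tracked_types()

# The list count grows by at least the 501 lists created above...
print("list growth:", after["list"] - before["list"])
# ...but plain str instances are never returned by gc.get_objects().
print("plain str seen by GC:", any(type(o) is str for o in gc.get_objects()))
```

The same reasoning would explain the report above: if numpy arrays are not GC-tracked, appending them to y changes nothing that this kind of count can see.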

Do you have any suggestions for improving the documentation so this isn't that big of a surprise?


It may be possible to write code to count these objects with some extra effort (see issue #2), but it's non-trivial (I'd need to eliminate duplicates without artificially inflating the number of dicts/strings used while eliminating duplicates), so I cannot promise anything within a reasonable amount of time.
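
For illustration only, one way such counting could look is to start from the GC-tracked objects and follow gc.get_referents(), de-duplicating by id(). This is a rough sketch under my own assumptions, not what objgraph does: count_reachable_types is a hypothetical name, objects that no tracked object refers to are still missed, and the bookkeeping below allocates ints of its own, which is exactly the kind of inflation mentioned above.

```python
import gc
from collections import Counter


def count_reachable_types():
    """Count instances per type name, including untracked objects that are
    referenced by GC-tracked ones. Hypothetical sketch, not objgraph's API.
    """
    seen = set()        # ids of objects already counted (the de-duplication)
    counts = Counter()
    stack = gc.get_objects()
    # Do not count this function's own bookkeeping containers.
    seen.update((id(seen), id(counts), id(stack)))
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        counts[type(obj).__name__] += 1
        # Follow references so untracked objects (str, int, ...) are reached.
        stack.extend(gc.get_referents(obj))
    return counts


# 1000 distinct strings: invisible to gc.get_objects(), but reachable
# through the (tracked) list that holds them.
payload = ["payload-%d" % i for i in range(1000)]
counts = count_reachable_types()
print("str instances reached:", counts["str"])
print("list instances reached:", counts["list"])
```

In principle the same walk would reach arrays stored in a list such as y in the report, but the cost is a full traversal and a set with one entry per object, so the numbers are approximate at best.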

MilesQLi commented 8 years ago

I think it is clear enough. You can't enumerate all the types that the GC doesn't track. If you want you can add a function for users to determine whether a type is tracked, but I think that is more the user's own job.

mgedmin commented 8 years ago

"If you want you can add a function for users to determine whether a type is tracked,"

There's gc.is_tracked(), in Python 2.7 and 3.1+. Note that it checks a particular object, not a type. Note also the interesting fine print:

However, some type-specific optimizations can be present in order to suppress the garbage collector footprint of simple instances (e.g. dicts containing only atomic keys and values)
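
A quick way to try this out is sketched below. gc.is_tracked() is the real standard-library call; untracked_type_names is a hypothetical helper added only for the example. Since the check applies to individual objects, the helper takes sample instances rather than types.

```python
import gc


def untracked_type_names(objects):
    """Return the sorted type names of the given instances that the GC does
    not track. Hypothetical helper built on the real gc.is_tracked().
    """
    return sorted({type(o).__name__ for o in objects if not gc.is_tracked(o)})


samples = [0, "a", 3.14, b"bytes", [], [1, 2]]
print(untracked_type_names(samples))  # ['bytes', 'float', 'int', 'str']

# Dicts are where the fine print applies: whether a dict is tracked can
# depend on its contents and on the Python version, so no expected output
# is given for these.
for d in ({}, {"a": 1}, {"a": []}):
    print(d, gc.is_tracked(d))
```

Because of that fine print, a per-type answer can be misleading for containers; checking a representative instance of the type you care about is probably the safer habit.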