pygobject / pgi

[Unmaintained: Use PyGObject instead] GTK+ / GObject Introspection Bindings for PyPy.
GNU Lesser General Public License v2.1

Memory leak #8

Open pwaller opened 10 years ago

pwaller commented 10 years ago

This demo has a memory usage which scales with N:

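import gc
from pgi.repository import Poppler
# `url` must be a file:// URI pointing at a PDF.
doc = Poppler.Document.new_from_file(url, '')
N = 100000
for i in range(N):
    if i % 10000 == 0:
        # Number of objects currently tracked by the garbage collector
        print(len(gc.get_objects()))
    p = doc.get_page(0)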

Calling g_free on p._obj causes a double free, so the problem is on the Python side. The number of live objects grows by 4 per iteration.

lazka commented 10 years ago

Two problems:

pwaller commented 10 years ago

Is this something you're attacking or shall I have a go?

lazka commented 10 years ago

Go ahead.

I'd guess using weakrefs instead of __del__ should fix it.

Like cffi's "gc(cdata, destructor)" https://bitbucket.org/cffi/cffi/src/af4e381b5e99c27c466377145a84eeece6e5c199/cffi/gc_weakref.py?at=default
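As a rough illustration of the weakref idea, here is a minimal sketch that uses the standard library's weakref.finalize (Python 3.4+) rather than cffi's helper. The names Wrapper, fake_unref and track are hypothetical stand-ins and not pgi's actual classes; the real code would call the native unref on the wrapped pointer.

```python
import gc
import weakref

released = []  # records which fake handles were "unreffed"


def fake_unref(handle):
    # Hypothetical stand-in for the native unref call on a GObject pointer.
    released.append(handle)


class Wrapper(object):
    """Hypothetical stand-in for a Python-side wrapper around a pointer."""

    def __init__(self, handle):
        self._obj = handle


def track(wrapper):
    # Pass only the raw handle to the callback. If the callback held a
    # reference to `wrapper` itself, the wrapper could never be collected.
    weakref.finalize(wrapper, fake_unref, wrapper._obj)
    return wrapper


p = track(Wrapper(0x1234))
del p
gc.collect()  # needed on PyPy; CPython usually frees the object on `del`
```

Unlike __del__, the finalizer lives outside the wrapper object, so it does not interfere with collecting reference cycles that include the wrapper.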

I gave you commit rights btw, so you should be able to push directly.

pwaller commented 10 years ago

In addition, it appears that .unref() isn't being called. I'm not sure whether this is a bug introduced with #10; I'm trying to understand why.

pwaller commented 10 years ago

I'm looking at the unpack_return code for Object.

It calls object.__new__(Poppler.Page) and sets its _ref, but I don't see any sign of garbage tracking.

Should we add an UnrefFinalizer.track() on the resulting object?

pwaller commented 10 years ago

@lazka -- this is the sort of thing I've done which seems to do the right thing.

However, I get the impression that you intended for this to work already, so I don't know if my solution is in the spirit of your other code. I've not made a pull request for the linked commit yet; it perhaps belongs on top of #10 if you are happy with the approach.

pwaller commented 10 years ago

This is the code for track_and_unref which is called by the code in the above link.

pwaller commented 9 years ago

Tidying up my personal issues list, so closing this. Please create a new issue if you're still interested in tracking it.

pwaller commented 9 years ago

I'm still hitting this problem. Not sure what a clean solution is, advice welcomed!

This demonstrates the problem:

from pgi.repository import Poppler as poppler
doc = poppler.Document.new_from_file("file://test.pdf", "")
for i in range(doc.get_n_pages()):
    p = doc.get_page(i)
    # p.unref()
# doc.unref()

If I call .unref(), the problem goes away.

So I'd like to determine whether the unref() can be automated or, if it is supposed to be automatic, why it currently isn't.
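Until the automatic path works, one possible stopgap is to scope the explicit call with a context manager. This is only a sketch: unref_on_exit is a hypothetical helper, FakePage is a stand-in so the snippet runs without pgi, and the only thing assumed about the real objects is the .unref() method shown above.

```python
from contextlib import contextmanager


@contextmanager
def unref_on_exit(obj):
    """Yield obj, then call obj.unref() even if the body raises."""
    try:
        yield obj
    finally:
        obj.unref()


class FakePage(object):
    """Stand-in for a pgi-wrapped object; it only models unref()."""

    def __init__(self):
        self.unref_calls = 0

    def unref(self):
        self.unref_calls += 1


page = FakePage()
with unref_on_exit(page) as p:
    pass  # work with p here
```

With pgi this would presumably read `with unref_on_exit(doc.get_page(i)) as p:`. If the bindings ever start unreffing on their own, the explicit call would have to be removed to avoid releasing the object twice.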

pwaller commented 8 years ago

Ping. I'd like to close this (preferably with a resolution), any advice?

AmitANetskope commented 2 years ago

I'm observing the same issue while using Gsf; it leads to a file descriptor leak:

Python 3.8.7 (default, Dec 21 2020, 21:23:03)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pgi
>>> pgi.install_as_gi()
>>> from gi.repository import Gsf
<stdin>:1: PyGIWarning: Gsf was imported without specifying a version first. Use gi.require_version('Gsf', '1') before import to ensure that the right version gets loaded.
>>> _gsf_inputstdio = Gsf.InputStdio.new("test_file")
>>> _gsf_infilemsole = Gsf.InfileMSOle.new(_gsf_inputstdio)
>>>
>>> _gsf_infilemsole.unref()
>>> _gsf_inputstdio.unref()

Calling unref() releases the file descriptor; deleting the objects doesn't help.
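A descriptor leak like this can be checked from inside the process by counting the entries in /proc/self/fd before and after the suspect calls. The sketch below is Linux-only; open_fd_count is a hypothetical helper, and os.open on os.devnull stands in for the descriptor that Gsf would hold.

```python
import os


def open_fd_count():
    """Number of file descriptors open in this process (Linux only)."""
    return len(os.listdir("/proc/self/fd"))


before = open_fd_count()
fd = os.open(os.devnull, os.O_RDONLY)  # stand-in for a leaked descriptor
during = open_fd_count()
os.close(fd)  # plays the role of the explicit unref() call
after = open_fd_count()
```

Listing the directory briefly opens one descriptor of its own, but that affects every measurement equally, so only the differences between counts are meaningful.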