openzim / python-libzim

Libzim binding for Python: read/write ZIM files in Python
https://pypi.org/project/libzim/
GNU General Public License v3.0
62 stars 20 forks source link

ZimBlob must keep a reference on the content. #12

Closed mgautierfr closed 4 years ago

mgautierfr commented 4 years ago

See https://github.com/openzim/python-libzim/pull/3#discussion_r410445830

jdcaballerov commented 4 years ago

I added this commit https://github.com/openzim/python-libzim/pull/3/commits/906876188bc542b49e5ab41ace2a50ff6f0fd8d3

Do you think it's still not enough to avoid gc ?

mgautierfr commented 4 years ago

You added a reference to ZimBlob in the article. But you also need to keep a reference to the content itself in the ZimBlob.

jdcaballerov commented 4 years ago

Do you agree ?

cdef class ZimBlob:
    cdef Blob* c_blob
    cdef object ref_content

    def __cinit__(self, content):

        if isinstance(content, str):
            self.ref_content = content.encode('UTF-8')
        else:
            self.ref_content = content

        self.c_blob = new Blob(<char *> self.ref_content, len(self.ref_content))

    def __dealloc__(self):
        if self.c_blob != NULL:
            del self.c_blob

Or should I retain only the address of that content. If the content is encoded I guess a copy will be made.

mgautierfr commented 4 years ago

Or should I retain only the address of that content. If the content is encoded I guess a copy will be made.

What need to be done in incrementing the reference on ref_content to be sure that gc will not delete it. "Storing" ref_content in self is incrementing the reference. (Without copy, you are making a new reference).

But it is fix with my commit https://github.com/openzim/python-libzim/pull/22/commits/dee124f878a78c6961722f03a5d8593943780c0d of the PR #22 (The same way of your commit https://github.com/openzim/python-libzim/commit/1133c846c79f45749088123f925958b8f4fb02c3 but not merged because on branch cython-blend)