jeffdaily / parasail-python

Python bindings for the parasail C library.
Other
90 stars 17 forks source link

prevent premature deletion #64

Open maxbachmann opened 2 years ago

maxbachmann commented 2 years ago

The readme mentions:

Be careful using the attributes of the Result object - especially on Result instances constructed on the fly. For example, calling parasail.sw_trace("asdf", "asdf", 11, 1, parasail.blosum62).cigar.seq returns a numpy.ndarray that wraps a pointer to memory that is invalid because the Cigar is deallocated before the seq statement. You can avoid this problem by assigning Result instances to variables as in the example above.

This is really bad, since this is generally a very unexpected behaviour. The behavior can be prevented by adding another level of abstraction around pointers. Currently you have e.g. the following implementation:

class Result:
    def __init__(self, pointer, ...):
        self.pointer = pointer
    def __del__(self):
        _lib.parasail_result_free(self.pointer)
   def test(self):
       return object_depending_on_pointer(self.pointer)

this should be changed to something like this:

class _ResultPointer:
    def __init__(self, pointer, ...):
        self.pointer = pointer

    def __del__(self):
        _lib.parasail_result_free(self.pointer)

class Result:
    def __init__(self, pointer, ...):
        self.pointer = _ResultPointer(pointer)

   def test(self):
       return object_depending_on_pointer(self.pointer)

this abstraction around the pointer ensures that the memory will not be freed to early.

jeffdaily commented 2 years ago

Do you have a link to some documentation/gist/blog/stackoverflow for something like this? The current behavior is unfortunate and I agree, unexpected.

maxbachmann commented 2 years ago

Python destroys objects at some point after they are no longer referenced. In your case this means that when calling

parasail.sw_trace("asdf", "asdf", 11, 1, parasail.blosum62).cigar.seq

the object is destroyed after the call to

parasail.sw_trace("asdf", "asdf", 11, 1, parasail.blosum62).cigar

since the Result object returned by sw_trace is no longer referenced. In your case this means __del__ is called and you delete the memory even though you still need it inside the Cigar class.

To prevent this you need to make sure the object is still referenced in there, so __del__ is not called while the memory is still in use. This could be achieved by storing the object in the Cigar object:

class Cigar:
    def __init__(self, result):
        self._result = result

class Result:
    def __init__(self, pointer, ...):
        self.pointer = pointer
    def __del__(self):
        _lib.parasail_result_free(self.pointer)
   def test(self):
       return Cigar(self)

or if you do not want to pass the whole object around it can be achieved using an additional object which wraps the memory (essentially this is similar to something like a numpy array)

class Cigar:
    def __init__(self, pointer):
        self._pointer = pointer

class _ResultPointer:
    def __init__(self, pointer, ...):
        self.pointer = pointer

    def __del__(self):
        _lib.parasail_result_free(self.pointer)

class Result:
    def __init__(self, pointer, ...):
        self.pointer = _ResultPointer(pointer)

   def test(self):
       return Cigar(self.pointer)