olokelo / jxlpy

Cython bindings and Pillow plugin for JPEG XL
MIT License
45 stars 12 forks source link

Memory leak in decoding #12

Closed phiresky closed 2 years ago

phiresky commented 2 years ago

The following code causes the python process to use unbounded memory:

for i in range(10000):PIL.Image.open(BytesIO(img))

where img is a random JXL image.

I think the issue is that img._decoder.close() is never called.

for i in range(10000):
    img = PIL.Image.open(BytesIO(img))
    img._decoder.close()

works fine (barely any memory use)

phiresky commented 2 years ago

The following fixes the leak, but makes it segfault if you call .close() manually. I'm not sure if a close() method is necessary, seems like it should just always deallocate exactly when the python object is deallocated?

diff --git a/_jxlpy/_jxl.pyx b/_jxlpy/_jxl.pyx
index e4cf66f..bf19aec 100644
--- a/_jxlpy/_jxl.pyx
+++ b/_jxlpy/_jxl.pyx
@@ -1004,6 +1004,11 @@ cdef class JXLPyDecoder(object):

         return data_out.data()[:data_out.size()]

+    def __dealloc__(self):
+        if self.decoder != NULL:
+            JxlDecoderDestroy(self.decoder)
+        if self.runner != NULL:
+            JxlThreadParallelRunnerDestroy(self.runner)

     def close(self):
olokelo commented 2 years ago

Hi!

Thank you for reporting the issue and proposing the solution. I've tested your code to cause expected leak, but in my case it won't appear. I wrote a simple script to test it, could you please run it in your environment? Also could you tell me the version of your libjxl, Python, Pillow and what OS you're using?

from PIL import Image
from jxlpy import JXLImagePlugin
from io import BytesIO
import os, psutil
import sys

iters = int(sys.argv[1])

with open('img.jpg', 'rb') as f:
    img = f.read()

for i in range(iters):
    im = Image.open(BytesIO(img))
    #img._decoder.close()

process = psutil.Process(os.getpid())
print(process.memory_info().rss)
$ python3 test.py 1
21762048
$ python3 test.py 5
22163456
$ python3 test.py 50
22110208
$ python3 test.py 500
22163456
$ python3 test.py 5000
22061056
phiresky commented 2 years ago

Did you use a jpg or a jxl file for your test?

Using this test image: https://w2w-images.sftcdn.net/image/upload/v1620662834/Ghacks/IMG_20200308_194050.jxl With your test script:

❯ python x.py 1000
38481920

❯ python x.py 10000
156274688

❯ python x.py 100000
1334693888

❯ python x.py 500000
6572531712

After adding the im._decoder.close() call:

❯ python x.py 500000
25337856

System info:

$ python --version
Python 3.10.5
$ pacman -Qi libjxl
Name            : libjxl
Version         : 0.6.1-3
$ pip show pillow
Name: Pillow
Version: 9.1.1
$ lsb_release -a
LSB Version:    n/a
Distributor ID: Arch
Description:    Arch Linux
Release:        rolling
$ pip show jxlpy
Name: jxlpy
Version: 0.9.1
olokelo commented 2 years ago

Oh, yes. Now I see your point. The .close() function of decoder was meant to be called manually, but with __dealloc__ it seems to be working well. I will do some more tests and probably release a new version :)