python-pillow / Pillow

Python Imaging Library (Fork)
https://python-pillow.org

Memory not being freed with Image.fromarray #8549

Open jmspereira opened 1 week ago

jmspereira commented 1 week ago

What did you do?

Hey everyone, I have an application that uses Pillow to encode NumPy arrays as JPEGs, but I am seeing strange behavior in that application's memory usage.

What did you expect to happen?

All allocated memory to be freed.

What actually happened?

Some memory is not freed.

What are your OS, Python and Pillow versions?

--------------------------------------------------------------------
Pillow 11.0.0
Python 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0]
--------------------------------------------------------------------
--- PIL CORE support ok, compiled for 11.0.0
--- TKINTER support ok, loaded 8.6
--- FREETYPE2 support ok, loaded 2.13.2
--- LITTLECMS2 support ok, loaded 2.16
--- WEBP support ok, loaded 1.4.0
--- JPEG support ok, compiled for libjpeg-turbo 3.0.4
--- OPENJPEG (JPEG2000) support ok, loaded 2.5.2
--- ZLIB (PNG/ZIP) support ok, loaded 1.2.11
--- LIBTIFF support ok, loaded 4.6.0
--- RAQM (Bidirectional Text) support ok, loaded 0.10.1, fribidi 1.0.8, harfbuzz 10.0.1
*** LIBIMAGEQUANT (Quantization method) support not installed
--- XCB (X protocol) support ok
--------------------------------------------------------------------

Code that reproduces the problem:

import time
import numpy as np
from io import BytesIO
from PIL import Image

def open_pillow_image():
    random_image = (np.random.rand(720, 1280, 3) * 255).astype(np.uint8)

    with BytesIO() as output, Image.fromarray(random_image) as pillow_image:
        pillow_image.save(output, format="jpeg")

def main():
    print("before")
    # Memory here is around 60 MB...
    time.sleep(10)
    open_pillow_image()

    # Memory here is around 65 MB...
    print("after")
    time.sleep(1000)

if __name__ == '__main__':
    main()

Yay295 commented 1 week ago

Does anything change if you add

import gc
gc.collect()

after open_pillow_image()?

radarhere commented 1 week ago

https://github.com/python-pillow/Pillow/issues/7935#issuecomment-2031804237

Pillow's memory allocator doesn't necessarily release the memory in the pool back as soon as an image is destroyed, as it uses that memory pool for future allocations. See Storage.c (https://github.com/python-pillow/Pillow/blob/main/src/libImaging/Storage.c#L310) for the implementation.

jmspereira commented 1 week ago

@Yay295, calling the garbage collector explicitly does not make any difference.

@radarhere according to the documentation:

"There is now a memory pool to contain a supply of recently freed blocks, which can then be reused without going back to the OS for a fresh allocation. This caching of free blocks is currently disabled by default (...)" (https://pillow.readthedocs.io/en/stable/reference/block_allocator.html)

So caching of free blocks should be disabled by default, and tweaking PILLOW_BLOCKS_MAX, as mentioned in the issue you referenced, does not make any difference.
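
To confirm what the allocator is actually configured with, the settings can be read back at runtime. This is a minimal sketch, assuming the internal `Image.core.get_blocks_max()`, `get_block_size()`, `get_alignment()` and `get_stats()` accessors are available in your build; they are internal helpers rather than documented public API, and `read_allocator_settings` is a hypothetical name.

```python
def read_allocator_settings():
    """Return Pillow's block allocator settings, or None if Pillow is missing."""
    try:
        from PIL import Image
    except ImportError:
        return None

    # Assumption: these accessors live on the internal C module (Image.core).
    core = Image.core
    return {
        "blocks_max": core.get_blocks_max(),   # number of freed blocks kept for reuse
        "block_size": core.get_block_size(),
        "alignment": core.get_alignment(),
        "stats": core.get_stats(),             # allocator counters since startup
    }


if __name__ == "__main__":
    print(read_allocator_settings())
```

Note that PILLOW_BLOCKS_MAX is read when PIL.Image is imported, so it has to be in the environment before that import for it to take effect.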

radarhere commented 1 week ago

I see, "caching of free blocks" refers to https://github.com/python-pillow/Pillow/blob/5bff2f3b2894ec6923c590d0c37b18177d0634bd/src/libImaging/Storage.c#L315-L338

By default, the following is used instead. https://github.com/python-pillow/Pillow/blob/5bff2f3b2894ec6923c590d0c37b18177d0634bd/src/libImaging/Storage.c#L339-L349
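
The difference between the two paths can be illustrated with a toy model. This is not a translation of Storage.c: `ToyArena` and its fields are hypothetical names, block sizes are ignored on reuse, and "freed" only means the block is handed back to the underlying allocator, which may or may not return it to the OS.

```python
class ToyArena:
    """Toy model of the two ways a returned block can be handled."""

    def __init__(self, blocks_max=0):
        self.blocks_max = blocks_max  # 0 models the default: nothing is cached
        self.pool = []                # returned blocks kept for reuse
        self.freed = 0                # blocks handed back to the allocator
        self.reused = 0               # allocations served from the pool

    def get_block(self, size):
        # Prefer a cached block over a fresh allocation.
        if self.pool:
            self.reused += 1
            return self.pool.pop()
        return bytearray(size)

    def return_block(self, block):
        if len(self.pool) < self.blocks_max:
            self.pool.append(block)   # caching path
        else:
            self.freed += 1           # default path: released immediately


if __name__ == "__main__":
    default = ToyArena()
    default.return_block(default.get_block(16))
    print(default.freed, len(default.pool))
```

With `blocks_max=0` every returned block takes the second branch, which matches the documentation's statement that caching is disabled by default.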

Testing further, I think the issue occurs not when loading the array, but when saving.

radarhere commented 1 week ago

If I suggest that calling JpegImagePlugin directly improves the situation, do you agree?

from PIL import JpegImagePlugin
with BytesIO() as output, Image.fromarray(random_image) as pillow_image:
    pillow_image.encoderinfo = {}
    JpegImagePlugin._save(pillow_image, output, "filename")

jmspereira commented 1 week ago

Hmm, it doesn't seem to make any difference.

radarhere commented 1 week ago

Do you agree that saving is the problem? As in, I think this code should be fine.

with BytesIO() as output, Image.fromarray(random_image) as pillow_image:
    pass

jmspereira commented 1 week ago

Hmm, I do not think so. If I run this:

import time
from io import BytesIO

import numpy as np
from PIL import Image

def open_pillow_image():
    random_image = (np.random.rand(720, 1280, 3) * 255).astype(np.uint8)

    with BytesIO() as output, Image.fromarray(random_image) as pillow_image:
        pass

def main():
    print("before")
    time.sleep(10)
    open_pillow_image()
    print("after")
    time.sleep(1000)

if __name__ == '__main__':
    main()

The memory used by the script is larger after opening the image.
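
A single before/after comparison cannot distinguish a leak from one-off growth, such as memory the C allocator keeps around for reuse. A sketch that repeats the call and samples memory after each iteration is below. It assumes Linux (it reads `/proc/self/statm`), and `rss_bytes` and `rss_per_iteration` are hypothetical helper names, not part of Pillow.

```python
import gc
import os


def rss_bytes():
    """Return the resident set size in bytes, or None where /proc is unavailable."""
    try:
        with open("/proc/self/statm") as statm:
            resident_pages = int(statm.read().split()[1])
    except (OSError, IndexError, ValueError):
        return None
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def rss_per_iteration(workload, iterations=20):
    """Call workload repeatedly and record the RSS after each call."""
    samples = []
    for _ in range(iterations):
        workload()
        gc.collect()  # rule out uncollected reference cycles
        samples.append(rss_bytes())
    return samples


if __name__ == "__main__":
    # Stand-in workload; replace with open_pillow_image from the script above.
    print(rss_per_iteration(lambda: bytearray(1024 * 1024), iterations=5))
```

If the samples keep climbing with every iteration, that points to a leak. If they level off after the first call or two, the extra memory is more likely being held for reuse.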

radarhere commented 1 week ago

Just to be sure, if you remove Pillow, does the problem go away?

import time
from io import BytesIO

import numpy as np

def open_pillow_image():
    random_image = (np.random.rand(720, 1280, 3) * 255).astype(np.uint8)

    with BytesIO() as output:
        pass

def main():
    print("before")
    time.sleep(10)
    open_pillow_image()
    print("after")
    time.sleep(1000)

if __name__ == '__main__':
    main()

jmspereira commented 1 week ago

Yes, the problem does not exist without Pillow.