python-pillow / Pillow

Python Imaging Library (Fork)
https://python-pillow.org
Other
12.1k stars 2.21k forks source link

Memory leaks in Image.close() after saving as pdf or png #6448

Closed mansuf closed 2 years ago

mansuf commented 2 years ago

What did you do?

I'm converting 16 PNG images (with resolution 3000x4000) to PDF file and since Pillow didn't close the images after converting, it can cause huge memory. So i modified it's code from https://github.com/python-pillow/Pillow/blob/9.2.0/src/PIL/PdfImagePlugin.py#L218-L227, it looks like this

            existing_pdf.write_obj(contents_refs[page_number], stream=page_contents)

            page_number += 1

        # Close image after converting to save some memory
        im_sequence.close() # <--- This is the modified code

    #
    # trailer
    existing_pdf.write_xref_and_trailer()
    if hasattr(fp, "flush"):
        fp.flush()
    existing_pdf.close()

What did you expect to happen?

The memory stay low during converting

What actually happened?

Usually it worked for Pillow 9.0.1 and the memory stays between 20-60 MB. But after installing version 9.2.0, 9.1.0, and 9.1.1 it increased up to 700+ MB

Take a look at this footages

Pillow 9.0.1 https://user-images.githubusercontent.com/43638783/179394112-0aa70a49-b7e1-42e5-8dda-0306adc6d180.mp4
Pillow 9.1.0 https://user-images.githubusercontent.com/43638783/179394412-abc0d24f-f09a-414e-8b5c-f542c558ba75.mp4
Pillow 9.1.1 https://user-images.githubusercontent.com/43638783/179394649-84cdaaa4-85a8-41c5-ad15-c2c0198fe045.mp4
Pillow 9.2.0 https://user-images.githubusercontent.com/43638783/179395101-d9a77ae8-dc18-4e23-a7f0-b14064f85f2b.mp4

What are your OS, Python and Pillow versions?

import time
from pathlib import Path
from glob import glob
from PIL import Image

base_path = Path('./test_images')
pdf_file = Path('test.pdf')

if pdf_file.exists():
    pdf_file.unlink()

raw_files = glob('*', root_dir=base_path)

files = [Image.open(base_path / i) for i in raw_files]

im = files.pop(0)

print('Converting')
im.save(pdf_file, save_all=True, append_images=files)
print('Done converting')

time.sleep(9000)
radarhere commented 2 years ago

Could we have copies of the images? It would be helpful to remove as much ambiguity as possible from your example.

You changed the title to "Memory leaks in PIL.PngImagePlugin.PngImageFile.close". So you believe that this problem can be replicated without saving as PDFs and without using your Pillow modification?

radarhere commented 2 years ago

If you're concerned about memory when saving PDFs, you might also be interested in this method that uses less memory. If that works, it would also remove the need for your Pillow modification.

mansuf commented 2 years ago

Could we have copies of the images? It would be helpful to remove as much ambiguity as possible from your example.

Unfortunately, i cannot send you the images in this issue publicly, because of copyright reason. Is there a way that i can send images privately ?

You changed the title to "Memory leaks in PIL.PngImagePlugin.PngImageFile.close". So you believe that this problem can be replicated without saving as PDFs and without using your Pillow modification?

I assume that variable im_sequence is PIL.PngImagePlugin.PngImageFile. And no, the problem only happened in low-level function _save() in `PdfImagePlugin.py.

mansuf commented 2 years ago

If you're concerned about memory when saving PDFs, you might also be interested in https://github.com/python-pillow/Pillow/issues/4067#issuecomment-531517647 If that works, it would also remove the need for your Pillow modification.

I have tried this before. In my case, converting thousand of images to PDF file in single run can cause slow performance. I have looked at Pillow module and found that https://github.com/python-pillow/Pillow/blob/9.2.0/src/PIL/PdfImagePlugin.py#L112 is causing slow, because the PDF plugin is trying to load the whole contents of PDF file. So modifying Pillow module is best option so far.

mansuf commented 2 years ago

I cannot send you the original images that i used for tests, but this should do.

Source: https://commons.wikimedia.org/wiki/File:Lattinaichnusa.png

mansuf commented 2 years ago

After testing further with every extensions available (PIL.Image.EXTENSION). This is the result i found.

Image i used ![image](https://user-images.githubusercontent.com/43638783/180258829-86c65978-7f53-4946-bd98-f1daa746772a.png)

Not affected

Failed

Affected

radarhere commented 2 years ago

Thanks for the test image. Testing opening that image 16 times with your modification, I'm able to replicate the problem. The change in behaviour comes from https://github.com/python-pillow/Pillow/pull/6096/commits/bb9338e34d705501b87876ebe6a8212b88b96acb

I've created PR #6456 to resolve this. If you could test it and confirm, that would be helpful.

mansuf commented 2 years ago

Really confident that the PR resolve the problem, thank you very much 👍