python-pillow / Pillow

Python Imaging Library (Fork)
https://python-pillow.org

Memory Leak, possibly in thumbnail #5180

Closed Jwink3101 closed 2 years ago

Jwink3101 commented 3 years ago

What did you do?

Make a lot of thumbnails in a long-running web app

What did you expect to happen?

Memory to be released if not immediately, upon an explicit gc.collect() call.

What actually happened?

...it was not

What are your OS, Python and Pillow versions?

For the record, I did see the note about uploading an image, but this happens with every JPEG I've tried, so I am not uploading one. If it'll really help, I can do it. (See first comment.)

import PIL,sys
print('PIL.__version__',PIL.__version__)
print('sys.version',sys.version)
from PIL import Image
import os,subprocess
import gc

TEST = 'test.jpg'
size = 1000,1000

def rss():
    cmd =['ps', '-p', f'{os.getpid()}', '-o', 'rss']
    mem = subprocess.check_output(cmd).decode().split('\n')[1].split()[0]
    return int(mem)

print('Init',rss())

N = 10
for ii in range(N):
    # Just in case an existing one matters
    try:
        os.unlink('thumb.jpg')
    except OSError:
        pass

    with Image.open(TEST) as img:
        img.thumbnail(size)
        img.save('thumb.jpg',quality=60)

    # img.close() -- I would assume this comes with the context manager. Tried both commented and uncommented.
    del img # just to make sure

print('post',rss())

gc.collect()
print('gc',rss())

Output varies a bit by each run but in general:

PIL.__version__ 8.1.0
sys.version 3.8.3 (default, Jul  2 2020, 11:26:31)
[Clang 10.0.0 ]
Init 11016
post 67268
gc 67268

Possibly Related

Thanks!

Jwink3101 commented 3 years ago

I just saw I can attach an image:

[attached image: test]

radarhere commented 3 years ago

Hi. If I run this modified version of your code on my system,

from PIL import Image
import os, gc, subprocess

TEST = 'memorytest.jpg'
size = 1000, 1000

def rss():
    cmd =['ps', '-p', f'{os.getpid()}', '-o', 'rss']
    mem = subprocess.check_output(cmd).decode().split('\n')[1].split()[0]
    return int(mem)

N = 100
for ii in range(N):
    with Image.open(TEST) as img:
        img.thumbnail(size)
        img.save('thumb.jpg',quality=60)
    if ii == 0:
        print('after 1 loop',rss())

print('after 99 more loops',rss())

I get

after 1 loop 44420
after 99 more loops 44396

So while this doesn't address your question about why 'post' and 'gc' are greater than 'Init', I would at least conclude that memory is not leaking with each loop.
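
One way to put a number on "not leaking with each loop" is to sample RSS after every iteration and look at the average change once the first iteration has warmed things up. The sketch below is only an illustration: `rss_kib`, `rss_samples` and `growth_per_iteration` are hypothetical helper names, not part of Pillow, and the `ps` call assumes a Linux or macOS system like the ones used above.

```python
import os
import subprocess


def rss_kib():
    """Resident set size of this process in KiB, as reported by `ps`."""
    # `rss=` asks ps to omit the header line, so the output is just the number.
    out = subprocess.check_output(["ps", "-p", str(os.getpid()), "-o", "rss="])
    return int(out.split()[0])


def rss_samples(work, n, measure=rss_kib):
    """Call work() n times and record one measurement after each call."""
    samples = []
    for _ in range(n):
        work()
        samples.append(measure())
    return samples


def growth_per_iteration(samples, warmup=1):
    """Average change per iteration, ignoring the first `warmup` samples."""
    steady = samples[warmup:]
    if len(steady) < 2:
        raise ValueError("need at least two samples after warmup")
    return (steady[-1] - steady[0]) / (len(steady) - 1)
```

Passing the thumbnail loop body as `work` with, say, `n=100` would give a KiB-per-iteration figure. A value near zero after warmup is consistent with a one-time allocation rather than a per-call leak.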

Jwink3101 commented 3 years ago

Interesting. I guess this example isn't showing it, but in my full app, continued calls absolutely make the RSS climb considerably. (It's part of a loop that processes uploaded files in a Bottle application.) The usage goes crazy.

As a dirty hack, I call a subprocess python executable with the three lines compressed. Much slower and dirtier but it's what I'm doing for now.

wiredfool commented 3 years ago

I'd be pretty surprised if there are old, large memory leaks in Pillow. Small, yes, new, could be. Especially for the basic stuff, we’ve run this in valgrind and stamped most of them out.

So some things to look at:

Also note that we use a block allocator that hangs on to large chunks of allocated memory, so that for successive images we don’t have to keep going to the OS for fresh memory. This shouldn’t affect what you’re seeing, but it does mean that Pillow will hold on to some allocated image memory after the last image is freed. But critically, it will not change the peak usage.

Jwink3101 commented 3 years ago

> When did this change? Is it new, or did you just notice it?

I've used something like this before, but it was in a multiprocessing loop, so it probably cleared that way. Due to how the fork method was deprecated on macOS, I moved away from this for the new version. So I don't know when it changed.

> What is the rate of loss? In bytes per iteration or fractions of the image size per iteration.

I will have to get back to you, but every time I upload images to my app, it grows a good bit. I may (and almost certainly do) have other memory leaks, but when I do something as simple as taking out the thumbnail generation, nothing grows much, if at all.

> Try running your app in valgrind/massif in test.

I know very little about this but I'll look into that. With that said, is the ~50mb growth as seen in the simple test expected?

mrabarnett commented 3 years ago

I let it generate the thumbnail >1000 times and saw no sustained increase in memory usage. (Windows 10)

radarhere commented 2 years ago

> Interesting. I guess this example isn't showing it, but in my full app, continued calls absolutely make the RSS climb considerably.

So does the code you've posted, run by itself in Python, actually show the problem on your machine?

radarhere commented 2 years ago

> With that said, is the ~50mb growth as seen in the simple test expected?

Your image is 4032 pixels wide and 3024 pixels high. For RGB, that is stored with a pixelsize of 4.

4032 * 3024 * 4 / 1024 / 1024 ~= 46.5

So about 50mb.
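
The same arithmetic as a small helper, in case it is useful for other image sizes. This is only a back-of-the-envelope estimate: `decoded_size_mib` is a hypothetical name, and the default of 4 bytes per pixel is the RGB storage size mentioned above.

```python
def decoded_size_mib(width, height, bytes_per_pixel=4):
    """Approximate size of the decoded pixel data in MiB."""
    return width * height * bytes_per_pixel / 1024 / 1024


print(round(decoded_size_mib(4032, 3024), 1))  # 46.5
```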

radarhere commented 2 years ago

Closing, as questions have been answered, and the example provided may not even reproduce the problem.

This can be re-opened if the problem can be demonstrated using just Pillow.