matthewwithanm / django-imagekit

Automated image processing for Django. Currently v4.0
http://django-imagekit.rtfd.org/
BSD 3-Clause "New" or "Revised" License
2.26k stars 276 forks source link

Leaking open files #429

Closed tolomea closed 6 years ago

tolomea commented 7 years ago

Python 3.6.1 Django==1.11.3 django-imagekit==4.0.1

When using FileSystemStorage Imagekit leaks open file handles. In specs/init.py:ImageSpec.generate the open_image call inside the try may result in an open call on the underlying file system and that open is never closed. My experimenting shows it can leak multiple file handles from here and the leak persists across requests, I don't know if they are ever cleaned up.

Here's a chunk of stack showing the actual open if that helps any:

File "C:\Users\tolomea\venv\lib\site-packages\imagekit\specs\__init__.py", line 150, in generate
  img = open_image(self.source)
File "C:\Users\tolomea\venv\lib\site-packages\pilkit\utils.py", line 21, in open_image
  target.seek(0)
File "C:\Users\tolomea\venv\lib\site-packages\django\core\files\utils.py", line 20, in <lambda>
  seek = property(lambda self: self.file.seek)
File "C:\Users\tolomea\venv\lib\site-packages\django\db\models\fields\files.py", line 51, in _get_file
  self._file = self.storage.open(self.name, 'rb')
File "C:\Users\tolomea\venv\lib\site-packages\django\core\files\storage.py", line 38, in open
  return self._open(name, mode)
File "C:\Users\tolomea\venv\lib\site-packages\django\core\files\storage.py", line 300, in _open
  return File(open(self.path(name), mode))

As you can see this was found on a Windows system, I've no reason to think it's specific to the Windows system, but it is more noticeable there as Windows is fussy about not letting you delete(etc) open files.

This patch fixes it for my current usecase but I don't know if it breaks anything else:

patchy.patch(imagekit.specs.ImageSpec.generate, """\
    @@ -3,7 +3,6 @@
             raise MissingSource("The spec '%s' has no source file associated"
                                 " with it." % self)

    -    file_opened_locally = False
         # TODO: Move into a generator base class
         # TODO: Factor out a generate_image function so you can create a generator and only override the PIL.Image creating part. (The tricky part is how to deal with original_format since generator base class won't have one.)
         try:
    @@ -12,12 +11,10 @@

             # Re-open the file -- https://code.djangoproject.com/ticket/13750
             self.source.open()
    -        file_opened_locally = True
             img = open_image(self.source)

         new_image =  process_image(img, processors=self.processors,
                                    format=self.format, autoconvert=self.autoconvert,
                                    options=self.options)
    -    if file_opened_locally:
    -        self.source.close()
    +    self.source.close()
         return new_image
""")

In terms of reproduction I've been using this monkey patch to keep track of open files in the system and where they came from

from weakref import WeakKeyDictionary
import builtins
import traceback
open_files = WeakKeyDictionary()
old_open = builtins.open

def new_open(*args, **kwargs):
    res = old_open(*args, **kwargs)
    open_files[res] = traceback.format_stack()
    return res

builtins.open = new_open

def print_open_files():
    print("PRINTING OPEN <<<")
    for k, v in open_files.items():
        if not k.closed:
            print(id(k), k)
            print("".join(v[-10:]))
            print("=" * 10)
    print(">>>")

I have a model with an image field and a couple of specs:

original_image = models.ImageField(null=True)

thumbnail_image = ImageSpecField(
    source='original_image',
    processors=[SmartResize(200, 200)],
    format='JPEG',
    options={'quality': 60},
)

small_image = ImageSpecField(
    source='original_image',
    processors=[SmartResize(310, 310)],
    format='JPEG',
    options={'quality': 90},
)

large_image = ImageSpecField(
    source='original_image',
    processors=[SmartResize(550, 550)],
    format='JPEG',
    options={'quality': 90},
)

One of which is displayed in the admin:

_thumbnail_image = AdminThumbnail(image_field='thumbnail_image')

Uploading successive files to it via the admin causes print_open_files to show files from previous requests still being open.

And applying the patch above fixes it.

vstoykov commented 7 years ago

Thanks for reporting the issue and investigation you made. I also saw that there are some situations where files handlers can leak but didn't investigated further yet, because of lack of time.

Proposed approach can lead to errors like Operation on closed file or something like that.

If you have time I will be really glad if you can try to come up with proper fix and prepare a pull request.

P.S. the issue is more serious on Python 3 because this will lead to memory leak because of ResourceWarning which is fired every time when Python garbage collect unclosed files (which will happen after some time). In Python 2 there is no such warning and after few requests the files are garbage collected and there is no leak. More info here https://bugs.python.org/issue27535