Closed duhaime closed 4 years ago
Thanks for your quick followup @rragundez! I had thought the same, but it seems closing the filehandler before returning the object forbids a number of operations on the returned object. Minimal example:
import numpy as np
from PIL import Image as pil_image
im = pil_image.open('cats.jpg')
im.close()
im.getdata()
That will throw ValueError: Operation on closed image
.
For what it's worth, the approach here is straight from the Pillow docs on file handling. That said, I'm happy to make any changes you see fit!
Thanks for your explanations. After having a closer look I think the root of the problem is the inconsistency load_img
can return only a pointer to the file if no data augmentation is done (keeps file open) and an actual image in memory if data augmentation is done (no open file). I think this inconsistency is gone by simply changing img = pil_image.open(path)
to img = pil_image.load(path)
in
https://github.com/keras-team/keras-preprocessing/blob/master/keras_preprocessing/image/utils.py#L112
The drawback is of course that for no augmentation the image object occupies more memory but since this function is used in the Iterator class in a loop where it gets immediately converted to an array then is OK.
I would do the change above and simply add a unit-test that load_img
is actually returning an image in memory and not a pointer to the file, with and without augmentation. What do you think?
@rragundez Your approach (with pil_image.load(path) as img
) is much cleaner, nice research. I just replaced the io.BytesIO
business with that approach, and added a test to validate that load_img
does indeed return an image in memory.
How does the updated PR look?
@rragundez ah I was testing against a cached version of the module--it appears the load method is missing from PIL.Image--where did you see that method?
It appears it acts over the Image
class objects
https://pillow.readthedocs.io/en/3.1.x/reference/Image.html?highlight=load#PIL.Image.Image.load
then something like pil_image.open().load()
I think is appropriate.
Hmm, these two return different objects:
pil_image.open('a.png') # returns `PIL.PngImagePlugin.PngImageFile` object
pil_image.open('a.png').load() # returns `PixelAccess` object
I understand the Pillow docs on the open
method to indicate that PIL.Image.open()
opens the file (and never closes it) but waits to read the pixel data into RAM. load()
evidently is a method that one can call to manually force the PIL.Image
object to yield up its pixel data.
That said, combining open and load don't seem to close the file handlers. Returning PixelAccess data rather than a PIL Image object also breaks some methods in keras_preprocessing
; e.g. you can't call img_to_array
on the PixelAccess data:
>>> from keras.preprocessing.image import img_to_array
Using TensorFlow backend.
>>> from PIL import Image as pil_image
>>> a = pil_image.open('a.png').load()
>>> img_to_array(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/doug/anaconda/envs/3.5/lib/python3.5/site-packages/keras/preprocessing/image.py", line 75, in img_to_array
return image.img_to_array(img, data_format=data_format, dtype=dtype)
File "/Users/doug/anaconda/envs/3.5/lib/python3.5/site-packages/keras_preprocessing/image/utils.py", line 299, in img_to_array
x = np.asarray(img, dtype=dtype)
File "/Users/doug/anaconda/envs/3.5/lib/python3.5/site-packages/numpy/core/numeric.py", line 538, in asarray
return array(a, dtype, copy=False, order=order)
TypeError: float() argument must be a string or a number, not 'PixelAccess'
It's entirely possible I misunderstood you or am missing something! From my perspective, using the io stream method outlined in the Pillow docs seems the best way forward...
@rragundez sounds good! I just submitted some updates with the changes you requested. The flake8 checks are of course going to fail because of the undefined unicode
in the text module. I factored the changes that rectify that problem into a separate PR per your request above.
I think is good now, I'll have a closer look tomorrow :)
thanks for your help with this!
No problem, I don't understand why I still see changes in keras_preprocessing/text.py
thanks for the work @duhaime!
Summary
The current
load_img
function opens files but never closes them, which can causeOSError: [Errno 24] Too many open files
if one reads in too many images in a process.Related Issues
https://github.com/keras-team/keras-preprocessing/issues/260
PR Overview