mdbloice / Augmentor

Image augmentation library in Python for machine learning.
https://augmentor.readthedocs.io/en/stable
MIT License

bug in this line to convert RGB images into a 2D numpy matrix #202

Open yesufeng opened 5 years ago

yesufeng commented 5 years ago

https://github.com/mdbloice/Augmentor/blob/d7832c604f90fd34e300b97acad36625859d1c36/Augmentor/Pipeline.py#L521

This line has a bug: np.asarray(Image.open(...)) converts the PngImageFile (a PIL type) into a 2D array instead of a 3D one. For example, if the original PNG is (224, 224, 3), this converts it to (224, 224), dropping the channel dimension. This happens because the PIL mode is 'P', which needs to be converted to 'RGB' before the array conversion for RGB images. As a result, keras_generator does not work with colour images.
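The shape change described above can be reproduced without Augmentor. The following is a minimal sketch, assuming only Pillow and NumPy are installed; the in-memory palette PNG is a stand-in for the reporter's file, which is not available in this thread.

```python
import io

import numpy as np
from PIL import Image

# Build a palette-mode ('P') PNG in memory as a stand-in for the reporter's file.
buffer = io.BytesIO()
Image.new("RGB", (224, 224), (200, 30, 30)).convert("P").save(buffer, format="PNG")
buffer.seek(0)

image = Image.open(buffer)
palette_array = np.asarray(image)             # one palette index per pixel
rgb_array = np.asarray(image.convert("RGB"))  # palette expanded to three channels

print(image.mode, palette_array.shape, rgb_array.shape)
```

In 'P' mode each pixel stores an index into a colour palette, so the array has no channel axis until the image is converted to 'RGB'.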

mdbloice commented 5 years ago

Hi @yesufeng, thanks for pointing this out. However, I cannot seem to recreate this when I test it. PIL doesn't seem to drop channels by default, and if I explicitly set a 'P' mode, I receive an error:

Image.open('./ISIC_0000000.jpg', mode='P')

which returns:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-76b00bb38e69> in <module>()
----> 1 Image.open('./ISIC_0000000.jpg', mode='P')

/usr/local/lib/python3.6/dist-packages/PIL/Image.py in open(fp, mode)
   2518 
   2519     if mode != "r":
-> 2520         raise ValueError("bad mode %r" % mode)
   2521 
   2522     exclusive_fp = False

ValueError: bad mode 'P'

If I do not set the mode argument, then it opens with 3 channels as expected.

NumPy also seems to read all three channels as expected:

np.shape(np.asarray(Image.open('./ISIC_0000000.jpg')))

outputs:

(767, 1022, 3)
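As the traceback above shows, Image.open rejects any mode argument other than "r". That argument is the file-open mode; the pixel mode ('RGB', 'P', 'L', ...) is determined by the file itself and is read from the image's mode attribute. A minimal sketch, assuming Pillow is installed, with an in-memory JPEG standing in for ISIC_0000000.jpg:

```python
import io

from PIL import Image

# In-memory JPEG used as a stand-in for a file on disk.
buffer = io.BytesIO()
Image.new("RGB", (64, 48), (10, 120, 200)).save(buffer, format="JPEG")

# The `mode` argument of Image.open must be "r"; anything else raises ValueError.
buffer.seek(0)
try:
    Image.open(buffer, mode="P")
    open_error = None
except ValueError as error:
    open_error = str(error)

# The pixel mode comes from the decoded file, not from an argument to open().
buffer.seek(0)
jpeg_mode = Image.open(buffer).mode

print(open_error, jpeg_mode)
```

Because Pillow does not decode JPEG files to 'P' mode, a JPEG test image is unlikely to reproduce a report about a palette PNG; checking the mode attribute of the reporter's PNG would be the more direct test.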

Are you experiencing this issue with any data you use?

M.

yesufeng commented 5 years ago

Hey @mdbloice,

Yes. Here is what I did to pinpoint the problem. I first initialized a pipeline by pointing it to a directory, then followed the logic in the code: pipeline.augmentor_images stores all the image paths, and self._executor then calls Image.open to read each image, as shown below. This is essentially what line 521 in keras_generator does:

aug_im = pipeline.augmentor_images[10]
processed_image = np.asarray(Image.open(aug_im.image_path))

If I then look at processed_image, its shape is (448, 448) instead of the (448, 448, 3) I expect. I have verified the same input PNG with matplotlib.image, which reads it as (448, 448, 3) with no problem.
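One possible workaround on the caller's side, assuming the images are meant to be three-channel, is to convert to 'RGB' before handing the image to NumPy. The helper name load_rgb_array is hypothetical and not part of Augmentor; whether a similar change belongs at line 521 of Pipeline.py is a decision for the maintainer.

```python
import io

import numpy as np
from PIL import Image


def load_rgb_array(source):
    """Hypothetical helper (not part of Augmentor): return an (H, W, 3) uint8 array."""
    with Image.open(source) as image:
        # convert("RGB") expands palette or grayscale pixels to three channels.
        return np.asarray(image.convert("RGB"))


# Usage with an in-memory palette PNG standing in for aug_im.image_path.
buffer = io.BytesIO()
Image.new("RGB", (448, 448), (0, 128, 255)).convert("P").save(buffer, format="PNG")
buffer.seek(0)

processed_image = load_rgb_array(buffer)
print(processed_image.shape)
```

Note that converting to 'RGB' discards any alpha channel, so this sketch only suits pipelines that expect plain three-channel input.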

yimjinkyu1 commented 4 years ago

Has this problem been solved? I ran into the same problem, using JPEG ImageNet data.
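JPEG inputs can show the same symptom for a different reason: a single-channel grayscale JPEG opens in mode 'L', and np.asarray then also returns a 2D array. Whether that is what happened here is not confirmed in this thread. As a rough diagnostic sketch, assuming Pillow is installed, the hypothetical helper count_image_modes (not part of Augmentor) tallies the PIL modes found in a directory:

```python
import tempfile
from collections import Counter
from pathlib import Path

from PIL import Image


def count_image_modes(directory):
    """Hypothetical diagnostic (not part of Augmentor): tally PIL modes of JPEG/PNG files."""
    modes = Counter()
    for path in sorted(Path(directory).iterdir()):
        if path.suffix.lower() in {".jpg", ".jpeg", ".png"}:
            with Image.open(path) as image:
                modes[image.mode] += 1
    return modes


# Usage on a throwaway directory holding one colour and one grayscale JPEG.
with tempfile.TemporaryDirectory() as tmp:
    Image.new("RGB", (32, 32)).save(Path(tmp) / "colour.jpg")
    Image.new("L", (32, 32)).save(Path(tmp) / "gray.jpg")
    mode_counts = count_image_modes(tmp)

print(dict(mode_counts))
```

Any mode other than 'RGB' in the tally points at files that would not yield a three-channel array when passed straight to np.asarray.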