python-pillow / Pillow

Python Imaging Library (Fork)
https://python-pillow.org

The code works fine on interpreter, but not as part of a class #2112

Closed debuggerpk closed 8 years ago

debuggerpk commented 8 years ago

What did you do?

Called Image.open() inside a class method; the same piece of code works when run in the interpreter.

What did you expect to happen?

It should give me the size of the image.

What actually happened?

It raised an error; see below.

What versions of Pillow and Python are you using?

Python 2.7, Pillow 3.3.1

The problem statement

The function below, part of a larger class, works fine on every image except this one:

http://www.worldbank.org/content/dam/wbr/About/Pres/jyk-hs-offical.png

    def _fetch_image_size(self, image_url):
        size = None
        if '.svg' not in image_url:
            response = requests.get(image_url, headers=self._headers)
            if response.status_code == 200:
                response.raw.decode_content = True
                try:
                    image = Image.open(io.BytesIO(response.content))
                    size = image.size
                except (IOError, OSError) as error:
                    print error
                    print image_url
            response.close()
        return size

When the function above is called as a method of the class, it fails with this error:

cannot identify image file <_io.BytesIO object at 0x1062e1b30>
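
Pillow reports this message when it cannot recognize the bytes it was given as any image format, so a useful first step is to look at what the buffer actually starts with. A minimal sketch, written for Python 3 and using only the standard library; `sniff_payload` is a hypothetical helper, not part of Pillow or requests:

```python
# Hypothetical helper: guess what kind of payload was downloaded by
# looking at its leading bytes, before handing it to Image.open().

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # fixed 8-byte header of every PNG file
JPEG_PREFIX = b"\xff\xd8\xff"          # JPEG start-of-image marker and next marker byte


def sniff_payload(data):
    """Return a short label describing what `data` appears to contain."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_PREFIX):
        return "jpeg"
    # Markup such as an HTML error page begins with optional whitespace and '<'.
    if data.lstrip()[:1] == b"<":
        return "markup (possibly an HTML error page)"
    return "unknown"


print(sniff_payload(PNG_SIGNATURE + b"rest of file"))
print(sniff_payload(b"<!DOCTYPE html><html>Not found</html>"))
```

In `_fetch_image_size`, something like this could be called on `response.content` in the `except` branch to log what was actually received.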

However, when I run this in the command-line interpreter:

    import io, requests
    from PIL import Image

    response = requests.get('http://www.worldbank.org/content/dam/wbr/About/Pres/jyk-hs-offical.png')
    response.raw.decode_content = True
    image = Image.open(io.BytesIO(response.content))
    print image.size

the output is:

(220, 220)

I am unable to figure out why this is happening. A screenshot is attached.

[Screenshot: pillow error]

I am not sure whether this should be raised here or somewhere else.

hugovk commented 8 years ago

The problem's probably with your class.

It works for me (OS X, Python 2.7.11, Pillow 3.3.1) when calling this:

    import io, requests
    from PIL import Image

    response = requests.get('http://www.worldbank.org/content/dam/wbr/About/Pres/jyk-hs-offical.png')
    response.raw.decode_content = True
    image = Image.open(io.BytesIO(response.content))
    print image.size

And when calling the def:

    def _fetch_image_size(image_url):
        size = None
        if '.svg' not in image_url:
            response = requests.get(image_url)
            if response.status_code == 200:
                response.raw.decode_content = True
                try:
                    image = Image.open(io.BytesIO(response.content))
                    size = image.size
                except (IOError, OSError) as error:
                    print error
                    print image_url
            response.close()
        return size

    _fetch_image_size("http://www.worldbank.org/content/dam/wbr/About/Pres/jyk-hs-offical.png")

But note that's with headers=self._headers removed, and I don't know what you've got in there, and that could be causing problems.

Does it work for you with this def? What's your full class? Strip it down as far as possible to resemble your simpler calls, and see at which point the problem occurs. It'll be easier if you put each in a script and run them from there.

debuggerpk commented 8 years ago

Thank you for your response. I have diagnosed the problem. Here is the problematic code from my class:

    def _extract_image_urls(self, soup):
        """
        extracts all the <img src=''> tags

        Args:
            soup (obj): the BeautifulSoup object

        Returns:
            url (str): string for url
        """
        for img in soup.findAll("img", src=True):
            yield urlparse.urljoin(self._url, img["src"])

The code above gets me all the URLs I need to pass on to my _fetch_image_size() function.

I modified my _fetch_image_size function to include this:

    .....
    response = requests.get(image_url, headers=self._headers)
    print 'Request URL: {url}'.format(url=image_url)
    print 'Response URL: {url}'.format(url=response.url)
    .....

and here is the output:

    s = LinkScraper('http://www.worldbank.org/en/about/president/about-the-office/bio')
    Request URL: http://www.worldbank.org/content/dam/wbr/img/mobile-menu-lines.png
    Response URL: http://www.worldbank.org/content/dam/wbr/img/mobile-menu-lines.png

    Request URL: http://www.worldbank.org/etc/designs/wbr/clientlibs/img/icon-search-black.png
    Response URL: http://www.worldbank.org/etc/designs/wbr/clientlibs/img/icon-search-black.png

    Request URL: http://www.worldbank.org/content/dam/wbr/About/Pres/jyk-hs-offical.png
    Response URL: http://www.worldbank.org/404_response.htm
    Error: cannot identify image file <_io.BytesIO object at 0x112414830>
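
In the last log entry the response URL differs from the request URL, which means the server redirected the request before answering. A minimal sketch of how that could be flagged before the body reaches Image.open, assuming Python 3; `redirect_problem` is a hypothetical helper, and the `SimpleNamespace` objects merely stand in for `requests.Response` objects, which expose the same `url` and `history` attributes:

```python
from types import SimpleNamespace


def redirect_problem(requested_url, response):
    """Return a description if the response was reached via redirects, else None.

    `response` only needs `.url` and `.history`; requests fills `history`
    with the intermediate responses whenever it follows a redirect.
    """
    if response.history:
        return "requested {0} but got {1} after {2} redirect(s)".format(
            requested_url, response.url, len(response.history))
    return None


# Stand-in objects so the sketch runs without network access.
ok = SimpleNamespace(url="http://example.com/a.png", history=[])
moved = SimpleNamespace(url="http://example.com/404_response.htm",
                        history=[object()])

print(redirect_problem("http://example.com/a.png", ok))
print(redirect_problem("http://example.com/a.png", moved))
```

Another option is to pass `allow_redirects=False` to `requests.get`, in which case the redirect itself shows up as a 3xx status code and the existing `status_code == 200` check rejects it.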

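The data being passed into PIL's Image.open is not the image at all: the request for the PNG ended up at http://www.worldbank.org/404_response.htm, so the bytes are the body of an error page. Nothing is wrong with PIL here. I probably need to sanitize my URLs, maybe by checking for blank spaces.

Thank you, @hugovk, for the response.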
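
One way the URL clean-up could look is sketched below. It assumes stray whitespace around the `src` values is the culprit, which this thread does not confirm. `clean_image_urls` is a hypothetical helper written for Python 3 that takes plain strings, so it runs without BeautifulSoup:

```python
from urllib.parse import urljoin  # `urlparse.urljoin` on Python 2


def clean_image_urls(base_url, raw_srcs):
    """Yield an absolute URL for each src value, ignoring blank entries."""
    for src in raw_srcs:
        src = src.strip()      # drop leading/trailing spaces, tabs, newlines
        if not src:
            continue           # an empty src would just resolve to the page itself
        yield urljoin(base_url, src)


base = "http://www.worldbank.org/en/about/president/about-the-office/bio"
srcs = ["  /content/dam/wbr/img/mobile-menu-lines.png\n", "", "   "]
print(list(clean_image_urls(base, srcs)))
```

In the class, the same `strip()` and emptiness check would go inside `_extract_image_urls`, applied to `img["src"]` before the `urljoin` call.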