sybrenstuvel / flickrapi

Python Flickr API implementation
https://stuvel.eu/flickrapi

502 error printout breaks console interface #116

Open newpro opened 6 years ago

newpro commented 6 years ago

Hey @sybrenstuvel Thanks so much for the repo! It really saves me a lot of time in computer vision research.

The Flickr server sometimes returns a 502 error, even though it is very rare. My current strategy is to catch the error and do an exponential backoff while waiting for the Flickr server to recover. The strategy works very well. However, in some cases the library prints out the 502 error message payload, which is the 502 web page, and this breaks the console, most likely because special characters cause the program to go into memory space that should not be accessed. The program still seems to be running and collecting data, but it cannot print further messages to monitor the progress. I attached a screenshot of the symptoms for reference.
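The catch-and-back-off strategy described above could look roughly like the following. This is a minimal sketch, not code from the thread: `call_with_backoff` and all of its parameters are hypothetical names, and which exception to retry on depends on what the caller actually sees for a 502.

```python
import time


def call_with_backoff(func, retry_on=(Exception,), max_attempts=5,
                      base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call func(), retrying with exponentially growing delays on failure.

    All names here are illustrative; pass the exception type(s) your
    application sees for a 502 response as `retry_on`.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the last error
            # Delay doubles each time (base, 2*base, 4*base, ...), capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.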

If I may offer some recommendations to fix the issue, the easy way would be to disable printing the payload, or specifically to filter out the special characters that cause the console to break.
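The second suggestion can also be applied on the application side rather than in the library. The sketch below assumes the troublesome characters are non-printable control characters, which the thread does not confirm; `escape_unprintable` and `EscapeUnprintableFilter` are hypothetical names, and only the standard `logging` module is used.

```python
import logging


def escape_unprintable(text):
    """Replace non-printable characters (except newline and tab) with escapes."""
    return "".join(
        ch if ch.isprintable() or ch in "\n\t"
        else ch.encode("unicode_escape").decode("ascii")
        for ch in text
    )


class EscapeUnprintableFilter(logging.Filter):
    """Logging filter that escapes control characters before a record is emitted."""

    def filter(self, record):
        # Render the final message once, then store the escaped version so
        # that later formatting does not re-apply the original arguments.
        record.msg = escape_unprintable(record.getMessage())
        record.args = ()
        return True
```

Attaching the filter to a handler with `handler.addFilter(EscapeUnprintableFilter())` leaves printable text, including Korean, untouched while turning control characters into visible escapes such as `\x98`.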

Thanks again!

Head of the message: screenshot from 2018-08-10 13-36-34

Tail of the message: screenshot from 2018-08-10 13-22-20

sybrenstuvel commented 6 years ago

> special characters cause the program to go into memory space that should not be accessed

There is no such thing as "special characters". If you're dealing with text, your software should know the encoding it is in and handle that properly. Just assuming it's a single-byte encoding is a bad idea, especially since the Flickr API documentation pretty much screams that everything is UTF-8. Ignoring character encoding will always turn around to bite you.

> If I may offer some recommendations to fix the issue, the easy way would be to disable printing the payload, or specifically to filter out the special characters that cause the console to break.

AFAIK the library doesn't print() anything. All logging output goes via Python's logging module, which you can configure in your application. You can make it completely quiet, log to automatically rotated logfiles, and more.
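As a rough illustration of the configuration options mentioned above, the following sketch silences a library's loggers or sends them to a rotating log file. It assumes the library's loggers live under the name `flickrapi` (the usual `logging.getLogger(__name__)` convention, not confirmed in this thread); `configure_library_logging` and its parameters are hypothetical.

```python
import logging
import logging.handlers


def configure_library_logging(name="flickrapi", logfile=None,
                              level=logging.WARNING):
    """Silence a library logger, or route it to a rotating UTF-8 log file.

    The default logger name is an assumption; check the library source.
    """
    logger = logging.getLogger(name)
    for old in list(logger.handlers):
        logger.removeHandler(old)  # make repeated calls idempotent
    logger.setLevel(level)
    logger.propagate = False  # keep records away from the root/console handler

    if logfile is None:
        handler = logging.NullHandler()  # completely quiet
    else:
        handler = logging.handlers.RotatingFileHandler(
            logfile, maxBytes=1000000, backupCount=3, encoding="utf-8")
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling `configure_library_logging()` with no arguments keeps the library's output off the console entirely; passing `logfile="flickrapi.log"` keeps it in a file that rotates at about 1 MB.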

newpro commented 6 years ago

Hey @sybrenstuvel Thanks for the quick reply! I really appreciate it.

I did some further digging, and I still think there is a logging problem in the repo code, specifically in this line. First, let me say that I did not feed non-UTF text into the API interface. The issue is in the response payload from Flickr. With that in mind, I believe the error may lie not in logging but in "urllib_parse.unquote". Let me explain with a fun experiment:

Here is the experiment: screenshot from 2018-08-13 14-59-23

Cheers!

sybrenstuvel commented 6 years ago

Please don't screenshot your code. Just use Markdown to format it properly. That will allow me to copy-paste whatever you did and try it myself, instead of having to type everything myself.

Your use of the urlparse module indicates you're indeed using Python 2. What is your reason to stick to that ancient version? It's horrible when it comes to character encoding, and as a result I see mistakes even in your latest experiment (you're talking about u'\xc3' and '\xc3' as the same thing; they aren't).
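The distinction drawn above between u'\xc3' and '\xc3' can be restated in Python 3 terms, where text is `str` and raw data is `bytes`. This is a small illustrative sketch, not code from the thread; the variable and function names are made up for the example.

```python
text = "\xc3"   # a str holding one code point, U+00C3 (the letter "Ã")
data = b"\xc3"  # a bytes object holding the single byte 0xC3

# Encoding the one-character text as UTF-8 yields two bytes, not the byte 0xC3.
utf8_of_text = text.encode("utf-8")  # b'\xc3\x83'

# Only under Latin-1 does the byte 0xC3 map back to U+00C3.
latin1_of_data = data.decode("latin-1")  # 'Ã'


def is_valid_utf8(raw):
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# On its own, 0xC3 is only the first half of a two-byte UTF-8 sequence,
# so is_valid_utf8(data) is False while is_valid_utf8(b"\xc3\xa1") is True.
```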

newpro commented 6 years ago

hey @sybrenstuvel

Yeah, you are right. This is an issue related to Python 2. However, my original screenshot came from a run under Python 3.5. I was doing a quick test on my laptop on my way out when I submitted the last post, so the issue is still there; I just did not capture the right one.

I dug a bit further and tried to replicate the issue. The problem is in displaying this page. However, when I searched Google for the specific HTML of this page so that I could load it again, I failed to find any. And because the server issue is rare, I cannot replicate it by sending requests.

However, I looked into it, and I believe the output breaks while displaying Korean text. So I downloaded the HTML source of the official Korea Tourism website to get a Korean byte string. Now we can successfully locate the issue:

    import logging
    from urllib import parse as urllib_parse

    # The following line should freeze your console or Python interpreter; if not, let me know.
    logging.error(urllib_parse.unquote('\xeb\xac\xb4\xeb\x8b\xa8\xec\x88\x98\xec\xa7\x91\xea\xb1\xb0\xeb\xb6\x80</a></li>\n'))

oPromessa commented 6 years ago

Some info that I hope helps...

  1. On occasion I get the 'bad panda' 502 error... mostly under heavy load. I have logging enabled to file and console and have not noticed this console locking issue you mention. I use both python 2.7 and python 3.6 with unicode.
  2. I've quickly tried your sample code on Windows bash with python 3.4 (will try it on Linux later on) and console seems not to lock.
    $ python3.4
    Python 3.4.3 (default, Nov 17 2016, 01:08:31)
    [GCC 4.8.4] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import logging
    >>> from urllib import parse as urllib_parse
    >>> logging.error(urllib_parse.unquote('\xeb\xac\xb4\xeb\x8b\xa8\xec\x88\x98\xec\xa7\x91\xea\xb1\xb0\xeb\xb6\x80</a></li>\n'))
    ERROR:root:무단수집거부</a></li>

    >>> print('still here')
    still here
    >>>

hope it helps

newpro commented 6 years ago

@oPromessa

I am using Linux 16.04 LTS with Python 3.6. I guess the behavior may depend on the current program stack in memory, and on the OS's ability to stop a program from reading from, or streaming out to, invalid memory. The code breaks on my machine; screenshot: screenshot from 2018-08-14 17-42-26

The issue can be resolved on my system by decoding the bytes as UTF-8 before passing them to unquote, e.g.,

    logging.error(urllib_parse.unquote(b'\xeb\xac\xb4\xeb\x8b\xa8\xec\x88\x98\xec\xa7\x91\xea\xb1\xb0\xeb\xb6\x80</a></li>\n'.decode("utf-8", "strict")))

Observe: image
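The difference between the str-literal snippet and the decoded-bytes snippet can be checked without printing anything to the console. In Python 3 the str literal with \x escapes holds 18 separate code points in the Latin-1 range, some of them C1 control characters, whereas the bytes literal decoded as UTF-8 holds six Hangul syllables. Whether those control characters are what upsets a particular terminal is an assumption here, not something the thread establishes. The names below are illustrative.

```python
RAW = (b"\xeb\xac\xb4\xeb\x8b\xa8\xec\x88\x98"
       b"\xec\xa7\x91\xea\xb1\xb0\xeb\xb6\x80")

# What the str literal in the failing snippet actually contains:
# each \xNN escape becomes the code point U+00NN, i.e. a Latin-1 reading.
AS_STR_LITERAL = RAW.decode("latin-1")

# What the working snippet passes along: the bytes read as UTF-8.
AS_UTF8_TEXT = RAW.decode("utf-8")


def c1_controls(text):
    """Return the C1 control characters (U+0080 to U+009F) found in text."""
    return [ch for ch in text if 0x80 <= ord(ch) <= 0x9F]


def repair_mojibake(text):
    """Undo a Latin-1 misreading of UTF-8 bytes; raises if text is not one."""
    return text.encode("latin-1").decode("utf-8")
```

Here `len(AS_STR_LITERAL)` is 18 and `len(AS_UTF8_TEXT)` is 6; `c1_controls(AS_STR_LITERAL)` is non-empty, while `c1_controls(AS_UTF8_TEXT)` is empty.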

sybrenstuvel commented 6 years ago

Why are you unquoting a string that clearly isn't URL-encoded at all?
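For reference, `urllib.parse.unquote` only acts on percent-escapes, so it has nothing to do on an HTML fragment. A short sketch of the difference, using the Korean text from the earlier console output (variable names are illustrative):

```python
from urllib.parse import quote, unquote

korean = "무단수집거부"

# URL-encoding turns each UTF-8 byte into a %XX escape.
url_encoded = quote(korean)

# unquote reverses exactly that, decoding the escapes as UTF-8 by default.
round_tripped = unquote(url_encoded)

# A string without percent-escapes passes through unquote unchanged.
html_fragment = "</a></li>\n"
untouched = unquote(html_fragment)
```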

newpro commented 6 years ago

hey @sybrenstuvel

I got confused about that part too. If the error is generated at this line, then it is a mistake to use the unquote function there. The function should parse a URL string, not request text.

oPromessa commented 6 years ago

@newpro just trying to help out. Would you mind going back to the beginning? I have a wild guess that the console/shell might not have the appropriate locale settings and may be getting confused!

  1. Can you share the environment variables on the shell which launches your app? I'm guessing some LANG/Collation related settings may be the cause of the conflict.
  2. Could you link to your code where you set the logging and where you get this situation.
    • Side note 1:
    • In the shell that launches my app, I was forced to set things like this to cover my bases.
      # I've used this setting to allow support for international characters in
      # folders and file names
      export LC_ALL=en_US.utf8
      export LANG=en_US.utf8
    • Side note 2:
    • My train of thought is that with an incorrect locale you get different output...
      $ echo $LANG
      en_US.UTF-8
      $ find . -type d
      .
      ./Test Photo Library/Várias Pics
      $ LANG=en_US find . -type d
      .
      ./Test Photo Library/V??rias Pics
      $
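The same locale check can be made from inside the Python process that does the logging. The sketch below is only a diagnostic aid under the assumption that a locale mismatch is involved; `describe_text_environment` is a hypothetical helper built on the standard `os`, `sys` and `locale` modules.

```python
import locale
import os
import sys


def describe_text_environment():
    """Collect the settings that decide how Python encodes console output."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "preferred_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "stderr_encoding": getattr(sys.stderr, "encoding", None),
    }


if __name__ == "__main__":
    for key, value in describe_text_environment().items():
        print("{}: {}".format(key, value))
```

If the stdout or stderr encoding reported here is not UTF-8 while the terminal itself expects UTF-8, the two disagree about how text is turned into bytes, which is the kind of conflict suggested above.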