rmmh / skybot

Python IRC bot
https://github.com/rmmh/skybot/wiki
The Unlicense

Weather plugin not happy with unicode #136

Closed: andyeff closed this issue 5 years ago

andyeff commented 9 years ago

I'm having problems figuring out how best to address the issue of Wunderground returning data that triggers a UnicodeDecodeError. The Swedish town of 'umea' seems to be a good test case.

".we umea" triggers the following traceback:

Unhandled exception in thread started by <function run at 0x7f56a5a21b90>
Traceback (most recent call last):
  File "core/main.py", line 78, in run
    out = func(input.inp, **kw)
  File "plugins/weather.py", line 71, in weather
    parsed_json = http.get_json(url)
  File "plugins/util/http.py", line 42, in get_json
    return json.loads(get(*args, **kwargs))
  File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe5 in position 3: unexpected end of data

Unicode still confuses me and makes me question my purpose in the universe, so I don't know whether this is an issue with the JSON that Wunderground returns, or whether the get_json function just doesn't like dealing with non-ASCII. I wouldn't mind trying to fix it myself and putting in a pull request, but I'm really unsure about how to edit rmmh's existing URL-parsing functions without breaking everything. Has anyone got a suggestion for where I should start?

sklnd commented 8 years ago

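The issue here is that the Wunderground API is returning some Latin-1 encoded characters in the data. Python's json parser doesn't expect that: in Python 2, json.loads assumes a byte string is UTF-8 unless told otherwise, and the byte 0xe5 from the traceback is "å" in Latin-1 but an incomplete sequence in UTF-8.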

I put in a bit of a hack to try to handle that by converting the response from Latin-1 to unicode.
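
For anyone hitting the same error, the general idea can be sketched roughly like this. It is only an illustration, not the code from the actual fix: the helper name loads_with_latin1_fallback and the sample payload are made up, and it assumes the raw response body is available as bytes before it reaches json.loads.

```python
import json


def loads_with_latin1_fallback(raw):
    """Parse JSON from raw bytes, trying UTF-8 first and Latin-1 second.

    Hypothetical helper for illustration; not part of skybot's http module.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this decode cannot fail.
        text = raw.decode("latin-1")
    return json.loads(text)


# Made-up payload containing a Latin-1 encoded "å" (byte 0xe5), not real API output.
sample = b'{"city": "Ume\xe5"}'
parsed = loads_with_latin1_fallback(sample)
print(repr(parsed["city"]))
```

Trying UTF-8 first means well-formed responses are left alone. The trade-off is that the Latin-1 fallback never raises, so data in some other encoding would come out as wrong characters instead of an error.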