markubiak / wallpaper-reddit

Downloads and sets wallpapers pulled from reddit.com
GNU General Public License v3.0
113 stars, 42 forks

UnicodeEncodeError when encoding/decoding the image title #12

Closed: clotifoth closed this issue 8 years ago

clotifoth commented 8 years ago
    (wily)clotifoth@localhost:~/Scripts$ wallpaper-reddit --random -f
    searching for valid images...
    downloading http://i.imgur.com/srVjTsG.jpg
    Traceback (most recent call last):
      File "/usr/local/bin/wallpaper-reddit", line 9, in <module>
        load_entry_point('wallpaper-reddit==3.0.0', 'console_scripts', 'wallpaper-reddit')()
      File "/usr/local/lib/python3.4/dist-packages/wallpaper_reddit-3.0.0-py3.4.egg/wpreddit/main.py", line 29, in run
      File "/usr/local/lib/python3.4/dist-packages/wallpaper_reddit-3.0.0-py3.4.egg/wpreddit/download.py", line 73, in save_info
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 55-56: ordinal not in range(128)

This looks like it comes back to this call in download.py (line 69, def save_info):

    title = title.encode('utf-8').decode('unicode-escape')

The image still downloaded and was set as the wallpaper correctly, but title.txt was completely empty.
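For reference, here is a short Python 3 sketch of what the encode/decode round trip quoted above does to text that is already a str. The × sign is only an illustrative stand-in; the actual offending character in the reported title is not known. Outside escape sequences, the unicode-escape codec reads each byte as a Latin-1 character, so a two-byte UTF-8 character turns into two separate characters. That would be consistent with the two-position span (55-56) in the traceback, though this is an inference, not something confirmed in the thread.

```python
# "\u00d7" (the × sign) is only a stand-in; the real offending character is unknown.
original = "4000\u00d73000"

# The round trip quoted from save_info: str -> UTF-8 bytes -> unicode-escape.
mangled = original.encode("utf-8").decode("unicode-escape")

# unicode-escape reads the UTF-8 bytes 0xC3 0x97 as two Latin-1 characters,
# so the single "×" becomes "Ã" followed by the control character U+0097.
print(repr(mangled))                # '4000Ã\x973000'
print(len(original), len(mangled))  # 9 10

# Encoding the mangled text as ASCII fails on a two-character span,
# the same shape as "position 55-56" in the traceback.
try:
    mangled.encode("ascii")
except UnicodeEncodeError as err:
    print(err)  # 'ascii' codec can't encode characters in position 4-5: ordinal not in range(128)
```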

It looks like this was the image in question: https://www.reddit.com/r/EarthPorn/duplicates/458b9a/strokkur_geyser_iceland_the_moment_of_eruption_oc/

I figured it might be an Icelandic character getting dropped, but the title here doesn't appear to contain any of those...
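Characters that break an ASCII codec don't always look foreign: a no-break space, curly quotes, or the × sign sometimes used in resolution tags like [3000×2000] all render as ordinary text in a browser. One quick way to find them is to list every non-ASCII character with its Unicode name. The helper non_ascii_chars and the sample title below are hypothetical; the real title from the linked post isn't reproduced here.

```python
import unicodedata


def non_ascii_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, character, Unicode name) for every non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# Hypothetical title: a no-break space before "[OC]" and a × in the resolution.
title = "Sunrise over a geyser\u00a0[OC] [3000\u00d72000]"
for index, ch, name in non_ascii_chars(title):
    print(index, repr(ch), name)
```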

markubiak commented 8 years ago

This is very interesting. Reddit's encoding has been problematic in the past. I'll try to see if I can reproduce this.

markubiak commented 8 years ago

I reproduced the bug, and I can confirm that, at least for that image, removing the line you listed fixes the issue. I originally added that line because many titles had encoding problems in the past. I'm going to test it without the re-decoding for a couple of days to make sure it holds up.

clotifoth commented 8 years ago

I'm not quite sure what issue the line was fixing in the first place. What originally happened? Maybe there's a better way to fix it.
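The thread doesn't record exactly what the line was fixing, but for context: Python's unicode-escape codec is meant to turn backslash escape sequences written out literally in bytes (such as \u00e9) into the characters they stand for. Applied to text that is already decoded, it has side effects beyond mangling non-ASCII characters. Backslashes in a title get reinterpreted, and a title ending in a lone backslash can't be decoded at all. The sample strings below are illustrative only.

```python
# Intended use: decode escape sequences written out literally in bytes.
escaped = b"caf\\u00e9"  # the 9 bytes c a f \ u 0 0 e 9
print(escaped.decode("unicode-escape"))  # café

# Side effect 1: a literal backslash in already-decoded text is reinterpreted.
title = "C:\\new folder"  # contains a backslash followed by "n"
roundtrip = title.encode("utf-8").decode("unicode-escape")
print(repr(roundtrip))  # the backslash and "n" are now a single newline

# Side effect 2: a title ending in a single backslash raises instead.
try:
    "ends with \\".encode("utf-8").decode("unicode-escape")
    trailing_ok = True
except UnicodeDecodeError as err:
    trailing_ok = False
    print("UnicodeDecodeError:", err.reason)
```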

markubiak commented 8 years ago

I removed the line in the latest commit. I think I originally added it because ImageMagick (the library I used for image processing before) didn't like reddit's text encoding at all. Now that I've migrated to PIL, that line should no longer be needed.
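Separately from the removed line, the 'ascii' codec named in the traceback suggests the title was being encoded with an ASCII default when it was written. One common way that happens is a plain open() call on Python 3 releases before 3.7 running under a C/POSIX locale, where the default file encoding is ASCII. That is an assumption about save_info, whose body isn't shown in the thread. A minimal defensive sketch, using a hypothetical save_title helper rather than wpreddit's actual code, passes the encoding explicitly so the write no longer depends on the locale:

```python
import os
import tempfile


def save_title(title: str, directory: str, encoding: str = "utf-8") -> str:
    """Write a post title to <directory>/title.txt and return the path.

    Hypothetical helper, not wpreddit's save_info. The explicit encoding keeps
    the write independent of the locale's default (ASCII under LANG=C on
    Python 3 releases before 3.7).
    """
    path = os.path.join(directory, "title.txt")
    with open(path, "w", encoding=encoding) as f:
        f.write(title)
    return path


# Illustrative title containing a non-ASCII character (U+00D7, the × sign).
title = "Geyser at the moment of eruption [3000\u00d72000]"

with tempfile.TemporaryDirectory() as tmp:
    path = save_title(title, tmp)
    with open(path, encoding="utf-8") as f:
        saved = f.read()

    # Simulate an ASCII default by forcing encoding="ascii": open() truncates
    # the file, then write() fails, leaving title.txt empty.
    try:
        save_title(title, tmp, encoding="ascii")
    except UnicodeEncodeError:
        pass
    with open(path, encoding="utf-8") as f:
        ascii_attempt = f.read()

print(saved == title)       # True
print(repr(ascii_attempt))  # ''
```

The empty file left by the failed ASCII write is consistent with the empty title.txt in the original report, although the thread doesn't confirm that this is how save_info writes the file.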