mfherbst / down-frab-videos

Download videos and lecture attachments from CCC events
GNU General Public License v3.0

'ascii' codec can't decode byte #1

Closed JanX2 closed 7 years ago

JanX2 commented 7 years ago

These 33c3 IDs result in the following error:

8018
8037
Traceback (most recent call last):
  File "/Users/jan/Development/Projects/down-frab-videos/down-frab-videos.py", line 657, in <module>
    downloader.download(talkid)
  File "/Users/jan/Development/Projects/down-frab-videos/down-frab-videos.py", line 384, in download
    f.write( self.info_text(talkid).encode('utf8'))
  File "/Users/jan/Development/Projects/down-frab-videos/down-frab-videos.py", line 357, in info_text
    ret += ("  - {0:" + str(maxlength) + "s}   {1}\n").format(link['title'],link['url'])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 7: ordinal not in range(128)

I tried to resolve this, but apparently my Python foo is not strong enough.
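Going by the traceback, the exception is most likely raised inside info_text() itself, before the .encode('utf8') on line 384 ever runs. In Python 2, calling .format() on a byte-string template with a unicode argument (here a title containing U+2019, a right single quotation mark) makes Python coerce the formatted result back to str through the ASCII codec, which fails. A minimal sketch of one way around this, assuming a hypothetical helper format_link_line that mirrors the expression on line 357, is to make the template itself unicode, so formatting stays in unicode until the explicit encode at write time:

```python
def format_link_line(title, url, maxlength):
    """Format one '  - <title>   <url>' line of the info page.

    Hypothetical helper mirroring the expression from the traceback.
    The u"" prefixes make the template a unicode string on Python 2
    as well (they are harmless on Python 3.3+), so str.format() never
    needs to fall back to the ASCII codec.
    """
    template = u"  - {0:" + str(maxlength) + u"s}   {1}\n"
    return template.format(title, url)


# Usage: a title with a non-ASCII apostrophe, padded to 12 columns.
line = format_link_line(u"Let\u2019s talk", u"https://example.org/slides.pdf", 12)
encoded = line.encode("utf-8")  # bytes, suitable for a file opened in "wb" mode
```

The encode step is kept separate from formatting so that the info text stays a unicode string throughout and is converted to bytes exactly once.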

mfherbst commented 7 years ago

Hey, I'm on the train now, so I can't run many tests (slow connection). But one thing I noticed:

It may also be an issue with the encoding of the JSON that I download from the Fahrplan using requests. I'll look into that now.

JanX2 commented 7 years ago

The encoding of the JSON was the first thing I wanted to check, but I couldn’t trace exactly what was going on due to my lack of Python foo.

I added the .encode('utf8') call to fix errors when info_text() results contain non-ASCII characters (again "ordinal not in range(128)").

For debugging purposes, you can try just downloading --format subtitles when in Edgeland. BTW: Apparently, the Internet is actually usable on ICE now… as long as you are not passing through a tunnel. ;)

mfherbst commented 7 years ago

So requests seems fine: it automatically guesses an encoding and (at least on my machine) produces a proper unicode string, which is used throughout until we write it to disk. I have now changed the write command to

# write info page:
with open(folder + "/info_" + str(talkid) + ".txt", "w", encoding="utf-8") as f:
    f.write(self.info_text(talkid))

which should enforce UTF-8 and hence get around the issue. I have never been able to reproduce your problem, so I'm not sure this fixes it. If not, I guess we'll have to debug it in person on your Mac at some point.

JanX2 commented 7 years ago

Looks like OS X 10.11.6 is using Python 2.7.12, whose built-in open() doesn’t accept an encoding argument. Sad panda.
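For reference, Python 2.7 does ship io.open(), which accepts encoding= and is the very same function as the built-in open() on Python 3. A sketch, assuming a hypothetical write_info_page helper and that the text passed in is already a unicode string (on Python 2, info_text() would need unicode templates for that), could look like this:

```python
import io
import os
import tempfile


def write_info_page(folder, talkid, text):
    """Write the info page as UTF-8 text (hypothetical helper).

    io.open() accepts encoding= on Python 2.6+ and is the built-in
    open() on Python 3. On Python 2, `text` must be a unicode object:
    writing a byte str to a text-mode io.open() file raises TypeError.
    """
    path = os.path.join(folder, "info_" + str(talkid) + ".txt")
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path


# Usage: write a line containing U+2019 into a throwaway directory.
path = write_info_page(tempfile.mkdtemp(), 8018,
                       u"  - Let\u2019s talk   https://example.org\n")
```

Compared with the "wb" plus .encode('utf8') approach, this keeps the encoding in one place (the open call) and behaves the same on Python 2.7 and 3.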

mfherbst commented 7 years ago

Ouch. I have pretty much no experience with Python 2.7, but since you can call .encode() explicitly, we might get away with

# write info page:
with open(folder + "/info_" + str(talkid) + ".txt", "wb") as f:
    f.write(self.info_text(talkid).encode('utf8'))

(Note the "wb" mode, which opens the file for writing in binary mode.) Let me know if it works.

mfherbst commented 7 years ago

Closed because Python < 3.5 is not officially supported.