ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

"TypeError: must be unicode, not str" when using --write-description #7178

Closed sebma closed 8 years ago

sebma commented 8 years ago

Hi, I get the following error:

"TypeError: must be unicode, not str"

The command I ran, with its full output:

youtube-dl --verbose --ignore-config --write-description http://www.veoh.com/watch/v46093745wbEGkakh
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'--verbose', u'--ignore-config', u'--write-description', u'http://www.veoh.com/watch/v46093745wbEGkakh']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2015.10.13
[debug] Python version 2.7.8 - Linux-3.16.0-44-generic-x86_64-with-Ubuntu-14.10-utopic
[debug] exe versions: avconv 11.2-6, avprobe 11.2-6, ffmpeg 2.6.2, ffprobe 2.6.2, rtmpdump 2.4
[debug] Proxy map: {}
[Veoh] v46093745wbEGkakh: Downloading video XML
[info] Writing video description to: Pirates Of Silicon Valley-46093745.description
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/youtube_dl/__init__.py", line 410, in main
    _real_main(argv)
  File "/usr/local/lib/python2.7/dist-packages/youtube_dl/__init__.py", line 400, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/local/lib/python2.7/dist-packages/youtube_dl/YoutubeDL.py", line 1665, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "/usr/local/lib/python2.7/dist-packages/youtube_dl/YoutubeDL.py", line 671, in extract_info
    return self.process_ie_result(ie_result, download, extra_info)
  File "/usr/local/lib/python2.7/dist-packages/youtube_dl/YoutubeDL.py", line 717, in process_ie_result
    return self.process_video_result(ie_result, download=download)
  File "/usr/local/lib/python2.7/dist-packages/youtube_dl/YoutubeDL.py", line 1335, in process_video_result
    self.process_info(new_info)
  File "/usr/local/lib/python2.7/dist-packages/youtube_dl/YoutubeDL.py", line 1465, in process_info
    descfile.write(info_dict['description'])
TypeError: must be unicode, not str

Any ideas?

Seb.

gaming-hacker commented 8 years ago

There are probably some strange non-ASCII characters in the string that are not being recognized.

sebma commented 8 years ago

OK. Is there an option to strip or ignore these non-ASCII characters?

jaimeMF commented 8 years ago

The problem is that on Python 2.x, xml.etree.ElementTree uses str instead of unicode (as Python 3.x does) for the attribute values of XML nodes. The issue is also present in other extractors (like niconico), and other fields are also not unicode (username, title, ...). Probably the simplest fix is to make --write-description accept non-unicode values, but ideally we should make sure that all fields in the info_dict are unicode.
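
To illustrate the "simplest fix" idea, here is a minimal sketch written for Python 3, where the analogous mismatch is a byte string written to a text stream. The helper names `to_text` and `write_description` are hypothetical and are not youtube-dl functions; UTF-8 is an assumed encoding.

```python
import io


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings (hypothetical helper)."""
    if isinstance(value, bytes):
        # Assumption: the bytes are UTF-8; undecodable bytes are replaced.
        return value.decode(encoding, errors="replace")
    return value


def write_description(stream, description):
    """Write a description to a text stream, tolerating byte strings."""
    stream.write(to_text(description))


buf = io.StringIO()
error_message = None
try:
    # A text stream rejects byte strings, much like the traceback above.
    buf.write(b"Pirates Of Silicon Valley")
except TypeError as exc:
    error_message = str(exc)

# The tolerant writer accepts the same value.
write_description(buf, b"Pirates Of Silicon Valley")
```

This only hides the symptom at one call site; other fields that reach text-only APIs would still need the same treatment.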

dstftw commented 8 years ago

What if we just add recursive auto-decoding of all bytestrings in info_dict in YoutubeDL.process_video_result?
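
A rough sketch of what such recursive decoding could look like, written for Python 3. The function name `decode_bytestrings` is hypothetical, UTF-8 is assumed, and this is not code from YoutubeDL.process_video_result.

```python
def decode_bytestrings(obj, encoding="utf-8"):
    """Return a copy of obj with every byte string decoded to text.

    Hypothetical helper: walks dicts, lists and tuples recursively and
    leaves every other type untouched.
    """
    if isinstance(obj, bytes):
        return obj.decode(encoding, errors="replace")
    if isinstance(obj, dict):
        return {
            decode_bytestrings(key, encoding): decode_bytestrings(value, encoding)
            for key, value in obj.items()
        }
    if isinstance(obj, (list, tuple)):
        return type(obj)(decode_bytestrings(item, encoding) for item in obj)
    return obj


# Made-up info_dict with a mix of text and byte strings.
info = {
    "title": "Pirates Of Silicon Valley",
    "description": b"A film about early personal computing",
    "formats": [{"url": b"http://example.com/video.mp4", "height": 480}],
}
clean = decode_bytestrings(info)
```

The trade-off is that a guessed encoding is applied everywhere, so a field that is not UTF-8 would be decoded with replacement characters rather than fixed at its source.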

yan12125 commented 8 years ago

I prefer what @jaimeMF said, that is, requiring all string fields in info_dict be unicode. Changing things in extractors is simpler and less prone to bugs. I guess XML issues can be resolved in compat.py or util.py.

By the way, for VeohIE, seems v.* videos are handled via XML and yapi-.* videos are delegated to YoutubeIE. The JSON part is never reached, is it?

@sebma For now, the simplest workaround is to run youtube-dl with Python 3. For example:

python3 /path/to/youtube-dl --verbose --ignore-config --write-description "http://www.veoh.com/watch/v46093745wbEGkakh"

dstftw commented 8 years ago

Conventionally we require it, but don't check it anywhere. Fixing this in extractors will involve fixing almost every extractor that uses xml.etree.ElementTree directly.
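
As an illustration of what "checking it" might involve, the sketch below reports where byte strings occur in a nested dictionary. It is written for Python 3; `find_bytestrings` is a hypothetical name and no such check is known to exist in youtube-dl.

```python
def find_bytestrings(obj, path="info_dict"):
    """Yield the path of every byte string inside a nested structure."""
    if isinstance(obj, bytes):
        yield path
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_bytestrings(value, "%s[%r]" % (path, key))
    elif isinstance(obj, (list, tuple)):
        for index, item in enumerate(obj):
            yield from find_bytestrings(item, "%s[%d]" % (path, index))


# Made-up extractor result with two offending fields.
sample = {
    "title": "ok",
    "description": b"raw",
    "formats": [{"url": b"http://example.com/v.mp4"}],
}
offenders = sorted(find_bytestrings(sample))
```

A report like this could be used in tests to locate extractors that return byte strings, without changing their output.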

jaimeMF commented 8 years ago

In #7296 I have wrapped every call to xml.etree.ElementTree.fromstring with compat_etree_fromstring, which converts the attributes to unicode objects.
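
The general shape of such a wrapper can be sketched as follows. This is not the implementation from #7296: the names `etree_fromstring_text` and `_as_text` are hypothetical, UTF-8 is assumed, and on Python 3 the conversion is effectively a no-op because ElementTree already returns text there.

```python
import xml.etree.ElementTree as ET


def _as_text(value, encoding="utf-8"):
    """Decode byte strings; pass text and None through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def etree_fromstring_text(xml_text):
    """Parse XML and make sure text, tails and attribute values are text."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.text = _as_text(element.text)
        element.tail = _as_text(element.tail)
        # Iterate over a copy because attribute values are reassigned.
        for name, value in list(element.attrib.items()):
            element.attrib[name] = _as_text(value)
    return root


doc = etree_fromstring_text(
    '<video title="Pirates Of Silicon Valley" description="A film"/>'
)
description = doc.get("description")
```

Centralizing the conversion in one parsing helper means individual extractors do not each have to decode the values they read.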

jaimeMF commented 8 years ago

> By the way, for VeohIE, seems v.* videos are handled via XML and yapi-.* videos are delegated to YoutubeIE. The JSON part is never reached, is it?

I don't know whether there are more video types. Since @dstftw made commit b540697a8a10bc742d6a241881196429eadd63ca, he may know better.

gaming-hacker commented 8 years ago

Why can't you force UTF-8/16 when grabbing the file? This is what I do with wget to get rid of non-ISO-8859-1 characters.

yan12125 commented 8 years ago

Closing since #7296 landed. This functionality will work on both Python 2 and Python 3 in the next version.