ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

UnicodeDecodeError: 'utf8' codec can't decode byte 0xb6 in position 147: invalid start byte #8070

Closed matiasw closed 8 years ago

matiasw commented 8 years ago

Thanks for helping create youtube-dl.

There seems to be a problem with Unicode handling. Running youtube-dl [any video], or building the latest git master HEAD (d5f6429de87da4bffa0be7703d774681393f1ffb) with ./setup.py build, gives:

    Traceback (most recent call last):
      File "/usr/local/bin/youtube-dl", line 5, in <module>
        from pkg_resources import load_entry_point
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3138, in <module>
        @_call_aside
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3124, in _call_aside
        f(*args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3151, in _initialize_master_working_set
        working_set = WorkingSet._build_master()
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 652, in _build_master
        ws = cls()
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 645, in __init__
        self.add_entry(entry)
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 701, in add_entry
        for dist in find_distributions(entry, True):
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2139, in find_on_path
        path_item, entry, metadata, precedence=DEVELOP_DIST
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2521, in from_location
        py_version=py_version, platform=platform, **kw
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2835, in _reload_version
        md_version = _version_from_file(self._get_metadata(self.PKG_INFO))
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2486, in _version_from_file
        line = next(iter(version_lines), '')
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2654, in _get_metadata
        for line in self.get_metadata_lines(name):
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2030, in get_metadata_lines
        return yield_lines(self.get_metadata(name))
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2025, in get_metadata
        metadata = f.read()
      File "/usr/lib/python2.7/codecs.py", line 314, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xb6 in position 147: invalid start byte

The line in question is: from pkg_resources import load_entry_point

The installed youtube-dl version (/usr/local/bin/youtube-dl) is 2015.12.18. To my understanding, 0xb6 is indeed invalid as a UTF-8 start byte, but where it comes from is beyond me at the moment. As a second byte, it appears in e.g. ¶ (U+00B6, encoded as 0xc2 0xb6) and ö (U+00F6, encoded as 0xc3 0xb6). So far, I've found this presentation: http://nedbatchelder.com/text/unipain.html And this poor fellow, with roughly the same problem: http://www.gossamer-threads.com/lists/engine?do=post_view_flat;post=1070061;page=1;mh=-1;list=python;sb=post_latest_reply;so=ASC
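
The point about 0xb6 can be checked directly. The following is a minimal sketch, written for Python 3 rather than the Python 2.7 used in this report; `classify_utf8_byte` is a hypothetical helper name introduced here for illustration, not something from youtube-dl or pkg_resources. It shows that 0xb6 falls in the continuation-byte range, decodes fine after a 0xc2 or 0xc3 lead byte, and fails on its own.

```python
def classify_utf8_byte(byte):
    """Return the role a single byte value (0-255) can play in UTF-8.

    Hypothetical helper for illustration only.
    """
    if byte < 0x80:
        return "ascii"          # 0xxxxxxx
    if byte < 0xC0:
        return "continuation"   # 10xxxxxx, never valid as a first byte
    if byte < 0xC2:
        return "invalid"        # 0xC0 and 0xC1 would only form overlong encodings
    if byte < 0xE0:
        return "lead-2"         # starts a 2-byte sequence
    if byte < 0xF0:
        return "lead-3"         # starts a 3-byte sequence
    if byte < 0xF5:
        return "lead-4"         # starts a 4-byte sequence
    return "invalid"            # 0xF5..0xFF never appear in UTF-8


print(classify_utf8_byte(0xB6))

# As a second byte, 0xb6 is fine:
print(b"\xc2\xb6".decode("utf-8"))  # U+00B6
print(b"\xc3\xb6".decode("utf-8"))  # U+00F6

# On its own it cannot start a character:
try:
    b"\xb6".decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.reason)
```

A lone 0xb6 therefore usually means the file was written in a single-byte encoding (in Latin-1, 0xb6 is ¶) or was truncated or corrupted, not that the UTF-8 decoder is at fault.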

yan12125 commented 8 years ago

The installed youtube-dl version (/usr/local/bin/youtube-dl) is 2015.12.18.

FYI: The latest version of youtube-dl is 2015.12.29. You may want to check whether there are multiple versions on your device.

yan12125 commented 8 years ago

Can you paste the output of the following command?

xxd /usr/lib/python2.7/dist-packages/youtube_dl-2015.12.29-py2.7.egg/EGG-INFO/PKG-INFO

The actual path may be different.

matiasw commented 8 years ago

Ok, there was indeed another version installed on my machine, 2015.11.27.1-1. I apt removed that, but, being unable to build the current version, was left to install v. 2015.11.27.1 from the repositories again. Here's the xxd output of /usr/lib/python2.7/dist-packages/youtube_dl-2015.11.27.1.egg-info/PKG-INFO:

    00000000: 4d65 7461 6461 7461 2d56 6572 7369 6f6e  Metadata-Version
    00000010: 3a20 312e 310a 4e61 6d65 3a20 796f 7574  : 1.1.Name: yout
    00000020: 7562 652d 646c 0a56 6572 7369 6f6e 3a20  ube-dl.Version: 
    00000030: 3230 3135 2e31 312e 3237 2e31 0a53 756d  2015.11.27.1.Sum
    00000040: 6d61 7279 3a20 596f 7554 7562 6520 7669  mary: YouTube vi
    00000050: 6465 6f20 646f 776e 6c6f 6164 6572 0a48  deo downloader.H
    00000060: 6f6d 652d 7061 6765 3a20 6874 7470 733a  ome-page: https:
    00000070: 2f2f 6769 7468 7562 2e63 6f6d 2f72 6733  //github.com/rg3
    00000080: 2f79 6f75 7475 6265 2d64 6c0a 4175 7468  /youtube-dl.Auth
    00000090: 6f72 3a20 5068 696c 6970 7020 4861 6765  or: Philipp Hage
    000000a0: 6d65 6973 7465 720a 4175 7468 6f72 2d65  meister.Author-e
    000000b0: 6d61 696c 3a20 7068 6968 6167 4070 6869  mail: phihag@phi
    000000c0: 6861 672e 6465 0a4c 6963 656e 7365 3a20  hag.de.License: 
    000000d0: 554e 4b4e 4f57 4e0a 4465 7363 7269 7074  UNKNOWN.Descript
    000000e0: 696f 6e3a 2053 6d61 6c6c 2063 6f6d 6d61  ion: Small comma
    000000f0: 6e64 2d6c 696e 6520 7072 6f67 7261 6d20  nd-line program 
    00000100: 746f 2064 6f77 6e6c 6f61 6420 7669 6465  to download vide
    00000110: 6f73 2066 726f 6d20 596f 7554 7562 652e  os from YouTube.
    00000120: 636f 6d20 616e 6420 6f74 6865 7220 7669  com and other vi
    00000130: 6465 6f20 7369 7465 732e 0a50 6c61 7466  deo sites..Platf
    00000140: 6f72 6d3a 2055 4e4b 4e4f 574e 0a43 6c61  orm: UNKNOWN.Cla
    00000150: 7373 6966 6965 723a 2054 6f70 6963 203a  ssifier: Topic :
    00000160: 3a20 4d75 6c74 696d 6564 6961 203a 3a20  : Multimedia :: 
    00000170: 5669 6465 6f0a 436c 6173 7369 6669 6572  Video.Classifier
    00000180: 3a20 4465 7665 6c6f 706d 656e 7420 5374  : Development St
    00000190: 6174 7573 203a 3a20 3520 2d20 5072 6f64  atus :: 5 - Prod
    000001a0: 7563 7469 6f6e 2f53 7461 626c 650a 436c  uction/Stable.Cl
    000001b0: 6173 7369 6669 6572 3a20 456e 7669 726f  assifier: Enviro
    000001c0: 6e6d 656e 7420 3a3a 2043 6f6e 736f 6c65  nment :: Console
    000001d0: 0a43 6c61 7373 6966 6965 723a 204c 6963  .Classifier: Lic
    000001e0: 656e 7365 203a 3a20 5075 626c 6963 2044  ense :: Public D
    000001f0: 6f6d 6169 6e0a 436c 6173 7369 6669 6572  omain.Classifier
    00000200: 3a20 5072 6f67 7261 6d6d 696e 6720 4c61  : Programming La
    00000210: 6e67 7561 6765 203a 3a20 5079 7468 6f6e  nguage :: Python
    00000220: 203a 3a20 322e 360a 436c 6173 7369 6669   :: 2.6.Classifi
    00000230: 6572 3a20 5072 6f67 7261 6d6d 696e 6720  er: Programming 
    00000240: 4c61 6e67 7561 6765 203a 3a20 5079 7468  Language :: Pyth
    00000250: 6f6e 203a 3a20 322e 370a 436c 6173 7369  on :: 2.7.Classi
    00000260: 6669 6572 3a20 5072 6f67 7261 6d6d 696e  fier: Programmin
    00000270: 6720 4c61 6e67 7561 6765 203a 3a20 5079  g Language :: Py
    00000280: 7468 6f6e 203a 3a20 330a 436c 6173 7369  thon :: 3.Classi
    00000290: 6669 6572 3a20 5072 6f67 7261 6d6d 696e  fier: Programmin
    000002a0: 6720 4c61 6e67 7561 6765 203a 3a20 5079  g Language :: Py
    000002b0: 7468 6f6e 203a 3a20 332e 320a 436c 6173  thon :: 3.2.Clas
    000002c0: 7369 6669 6572 3a20 5072 6f67 7261 6d6d  sifier: Programm
    000002d0: 696e 6720 4c61 6e67 7561 6765 203a 3a20  ing Language :: 
    000002e0: 5079 7468 6f6e 203a 3a20 332e 330a 436c  Python :: 3.3.Cl
    000002f0: 6173 7369 6669 6572 3a20 5072 6f67 7261  assifier: Progra
    00000300: 6d6d 696e 6720 4c61 6e67 7561 6765 203a  mming Language :
    00000310: 3a20 5079 7468 6f6e 203a 3a20 332e 340a  : Python :: 3.4.
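
Instead of reading a hex dump by eye, a file's bytes can be tested for UTF-8 validity in a couple of lines. This is a rough sketch in Python 3; `first_invalid_utf8_offset` is a hypothetical name, and the byte strings below are made-up samples, not the contents of any file from this report. The offset it returns corresponds to the "position" in the UnicodeDecodeError message when the whole file is decoded at once.

```python
def first_invalid_utf8_offset(data):
    """Return the offset of the first byte that breaks UTF-8 decoding, or None.

    Hypothetical helper for illustration; `data` is a bytes object.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first undecodable byte.
        return exc.start
    return None


# Made-up samples: pure ASCII metadata, and the same with a stray 0xb6 byte.
clean = b"Metadata-Version: 1.1\nName: youtube-dl\n"
broken = b"Name: x\xb6y\n"

print(first_invalid_utf8_offset(clean))
print(first_invalid_utf8_offset(broken))
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) and pass the result to the function.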

yan12125 commented 8 years ago

What's the output if you remove all existing versions, then build and run the latest version? Also, what is the content of /usr/lib/python2.7/dist-packages/youtube_dl-2015.12.29-py2.7.egg/EGG-INFO/PKG-INFO after you've built and installed the latest version?

matiasw commented 8 years ago

Sorry, but I am unable to build the latest version. I get the same error as above. Personally, I find this strange.

yan12125 commented 8 years ago

Could you provide more details about "unable to build the latest version", including the commands you used to build and the error messages you got?

matiasw commented 8 years ago

    ./setup.py build
    Traceback (most recent call last):
      File "./setup.py", line 11, in <module>
        from setuptools import setup
      File "/usr/lib/python2.7/dist-packages/setuptools/__init__.py", line 12, in <module>
        from setuptools.extension import Extension
      File "/usr/lib/python2.7/dist-packages/setuptools/extension.py", line 8, in <module>
        from .dist import _get_unpatched
      File "/usr/lib/python2.7/dist-packages/setuptools/dist.py", line 19, in <module>
        import pkg_resources
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3138, in <module>
        @_call_aside
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3124, in _call_aside
        f(*args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3151, in _initialize_master_working_set
        working_set = WorkingSet._build_master()
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 652, in _build_master
        ws = cls()
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 645, in __init__
        self.add_entry(entry)
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 701, in add_entry
        for dist in find_distributions(entry, True):
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2139, in find_on_path
        path_item, entry, metadata, precedence=DEVELOP_DIST
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2521, in from_location
        py_version=py_version, platform=platform, **kw
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2835, in _reload_version
        md_version = _version_from_file(self._get_metadata(self.PKG_INFO))
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2486, in _version_from_file
        line = next(iter(version_lines), '')
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2654, in _get_metadata
        for line in self.get_metadata_lines(name):
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2030, in get_metadata_lines
        return yield_lines(self.get_metadata(name))
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2025, in get_metadata
        metadata = f.read()
      File "/usr/lib/python2.7/codecs.py", line 314, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xb6 in position 147: invalid start byte

yan12125 commented 8 years ago

In /usr/lib/python2.7/dist-packages/pkg_resources/__init__.py around line 2022:

    def get_metadata(self, name):
        if name=='PKG-INFO':
            with io.open(self.path, encoding='utf-8') as f:
                metadata = f.read()
            return metadata
        raise KeyError("No metadata except PKG-INFO is available")

Could you add a line print(self.path) before io.open(self.path, ...)? Like this:

    def get_metadata(self, name):
        if name=='PKG-INFO':
            print(self.path)
            with io.open(self.path, encoding='utf-8') as f:
                metadata = f.read()
            return metadata
        raise KeyError("No metadata except PKG-INFO is available")

Then run the build command again. WARNING: Changing system files is risky. Please back up every file you're going to change.

matiasw commented 8 years ago

    ./setup.py build
    Traceback (most recent call last):
      File "./setup.py", line 11, in <module>
        from setuptools import setup
      File "/usr/lib/python2.7/dist-packages/setuptools/__init__.py", line 12, in <module>
        from setuptools.extension import Extension
      File "/usr/lib/python2.7/dist-packages/setuptools/extension.py", line 8, in <module>
        from .dist import _get_unpatched
      File "/usr/lib/python2.7/dist-packages/setuptools/dist.py", line 19, in <module>
        import pkg_resources
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2004, in <module>
        class FileMetadata(EmptyProvider):
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2022, in FileMetadata
        print(self.path)
    NameError: name 'self' is not defined

yan12125 commented 8 years ago

Could you paste contents around line 2000~2050 of the original /usr/lib/python2.7/dist-packages/pkg_resources/__init__.py?

matiasw commented 8 years ago

    register_loader_type(zipimport.zipimporter, ZipProvider)


    class FileMetadata(EmptyProvider):
        """Metadata handler for standalone PKG-INFO files

        Usage::

            metadata = FileMetadata("/path/to/PKG-INFO")

        This provider rejects all data and metadata requests except for PKG-INFO,
        which is treated as existing, and will be the contents of the file at
        the provided location.
        """

        def __init__(self, path):
            self.path = path

        def has_metadata(self, name):
            return name=='PKG-INFO' and os.path.isfile(self.path)

        def get_metadata(self, name):
            if name=='PKG-INFO':
                with io.open(self.path, encoding='utf-8') as f:
                    metadata = f.read()
                return metadata
            raise KeyError("No metadata except PKG-INFO is available")

        def get_metadata_lines(self, name):
            return yield_lines(self.get_metadata(name))


    class PathMetadata(DefaultProvider):
        """Metadata provider for egg directories

        Usage::

            # Development eggs:

            egg_info = "/path/to/PackageName.egg-info"
            base_dir = os.path.dirname(egg_info)
            metadata = PathMetadata(base_dir, egg_info)
            dist_name = os.path.splitext(os.path.basename(egg_info))[0]
            dist = Distribution(basedir, project_name=dist_name, metadata=metadata)

            # Unpacked egg directories:

            egg_path = "/path/to/PackageName-ver-pyver-etc.egg"
            metadata = PathMetadata(egg_path, os.path.join(egg_path,'EGG-INFO'))
            dist = Distribution.from_filename(egg_path, metadata=metadata)
        """

yan12125 commented 8 years ago

Seems you've added it to the wrong line. Originally it's

    def get_metadata(self, name):
        if name=='PKG-INFO':
            with io.open(self.path, encoding='utf-8') as f:
                metadata = f.read()
            return metadata
        raise KeyError("No metadata except PKG-INFO is available")

Change it to:

    def get_metadata(self, name):
        if name=='PKG-INFO':
            print(self.path)
            with io.open(self.path, encoding='utf-8') as f:
                metadata = f.read()
            return metadata
        raise KeyError("No metadata except PKG-INFO is available")

Note that the print line goes immediately before the with io.open(...) line, inside the if block.

matiasw commented 8 years ago

Oops, sorry. Here's the printout with that line:

    ./setup.py build
    /usr/lib/python2.7/argparse.egg-info
    /usr/lib/python2.7/wsgiref.egg-info
    /usr/lib/python2.7/lib-dynload/Python-2.7.egg-info
    /usr/local/lib/python2.7/dist-packages/youtube_upload-0.8.0.egg-info
    /usr/local/lib/python2.7/dist-packages/escpos-1.0.7.egg-info
    /usr/local/lib/python2.7/dist-packages/calendar_indicator-0.3.1.egg-info
    /usr/lib/python2.7/dist-packages/pyxdg-0.25.egg-info
    /usr/lib/python2.7/dist-packages/yum_metadata_parser-1.1.4.egg-info
    /usr/lib/python2.7/dist-packages/mercurial-3.5.2.egg-info
    /usr/lib/python2.7/dist-packages/vboxapi-1.0.egg-info
    /usr/lib/python2.7/dist-packages/wxPython_common-3.0.2.0.egg-info
    /usr/lib/python2.7/dist-packages/bzr-2.7.0dev1.egg-info
    /usr/lib/python2.7/dist-packages/pycurl-7.19.5.3.egg-info
    /usr/lib/python2.7/dist-packages/lhafile-0.1.0fs4.egg-info
    /usr/lib/python2.7/dist-packages/arandr-0.1.8.egg-info
    /usr/lib/python2.7/dist-packages/rpm_python-4.12.0.1.egg-info
    /usr/lib/python2.7/dist-packages/python_debianbts-2.6.0.egg-info
    /usr/lib/python2.7/dist-packages/qbzr-0.23.1.egg-info
    /usr/lib/python2.7/dist-packages/SecretStorage-2.1.2.egg-info
    /usr/lib/python2.7/dist-packages/ecdsa-0.13.egg-info
    /usr/lib/python2.7/dist-packages/docutils-0.12.egg-info
    /usr/lib/python2.7/dist-packages/simplejson-3.7.3.egg-info
    /usr/lib/python2.7/dist-packages/python_apt-1.1.0.b1.egg-info
    /usr/lib/python2.7/dist-packages/scapy-2.2.0.egg-info
    /usr/lib/python2.7/dist-packages/httplib2-0.9.1.egg-info
    /usr/lib/python2.7/dist-packages/BzrTools-2.6.0.egg-info
    /usr/lib/python2.7/dist-packages/gdata-2.0.18.egg-info
    /usr/lib/python2.7/dist-packages/zenmap-7.00.egg-info
    /usr/lib/python2.7/dist-packages/pycrypto-2.6.1.egg-info
    /usr/lib/python2.7/dist-packages/pygobject-3.18.2.egg-info
    /usr/lib/python2.7/dist-packages/pysqlite-1.0.1.egg-info
    /usr/lib/python2.7/dist-packages/pyrit-0.4.0.egg-info
    /usr/lib/python2.7/dist-packages/python_xlib-0.14.egg-info
    /usr/lib/python2.7/dist-packages/numpy-1.9.2.egg-info
    /usr/lib/python2.7/dist-packages/apt_xapian_index-0.47.egg-info
    /usr/lib/python2.7/dist-packages/pygpgme-0.3.egg-info
    /usr/lib/python2.7/dist-packages/pygame-1.9.1release.egg-info
    /usr/lib/python2.7/dist-packages/bzr_builddeb-2.8.6.egg-info
    /usr/lib/python2.7/dist-packages/roman-2.0.0.egg-info
    /usr/lib/pymodules/python2.7/rpl-1.5.5.egg-info
    Traceback (most recent call last):
      File "./setup.py", line 11, in <module>
        from setuptools import setup
      File "/usr/lib/python2.7/dist-packages/setuptools/__init__.py", line 12, in <module>
        from setuptools.extension import Extension
      File "/usr/lib/python2.7/dist-packages/setuptools/extension.py", line 8, in <module>
        from .dist import _get_unpatched
      File "/usr/lib/python2.7/dist-packages/setuptools/dist.py", line 19, in <module>
        import pkg_resources
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3139, in <module>
        @_call_aside
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3125, in _call_aside
        f(*args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3152, in _initialize_master_working_set
        working_set = WorkingSet._build_master()
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 652, in _build_master
        ws = cls()
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 645, in __init__
        self.add_entry(entry)
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 701, in add_entry
        for dist in find_distributions(entry, True):
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2140, in find_on_path
        path_item, entry, metadata, precedence=DEVELOP_DIST
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2522, in from_location
        py_version=py_version, platform=platform, **kw
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2836, in _reload_version
        md_version = _version_from_file(self._get_metadata(self.PKG_INFO))
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2487, in _version_from_file
        line = next(iter(version_lines), '')
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2655, in _get_metadata
        for line in self.get_metadata_lines(name):
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2031, in get_metadata_lines
        return yield_lines(self.get_metadata(name))
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2026, in get_metadata
        metadata = f.read()
      File "/usr/lib/python2.7/codecs.py", line 314, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xb6 in position 147: invalid start byte

yan12125 commented 8 years ago

Thanks for helping debug this. The problem is /usr/lib/pymodules/python2.7/rpl-1.5.5.egg-info, not youtube-dl: it is the last path printed before the traceback, so it is the file pkg_resources failed to decode. As far as I can see from http://sourceforge.net/projects/rpl/files/rpl/rpl-1.5.5/, the author of rpl uses non-UTF-8 characters in its setup.py. You may want to uninstall it and contact its author. Also, don't forget to restore the original /usr/lib/python2.7/dist-packages/pkg_resources/__init__.py.
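
For anyone hitting the same error, the offending metadata file can probably be found without patching pkg_resources at all. The sketch below is an assumption-laden stand-in for the temporary print statement used in this thread, written for Python 3: `find_undecodable_metadata` is a hypothetical helper, it only looks at `*.egg-info` entries directly inside each search directory, and it does not reproduce everything pkg_resources scans.

```python
import os
import sys


def find_undecodable_metadata(search_paths):
    """Yield (path, offset) for egg-info metadata files that are not valid UTF-8.

    Hypothetical helper for illustration; it only inspects *.egg-info
    entries found directly inside each directory of `search_paths`.
    """
    for base in search_paths:
        if not os.path.isdir(base):
            continue
        for entry in sorted(os.listdir(base)):
            if not entry.endswith(".egg-info"):
                continue
            candidate = os.path.join(base, entry)
            # An .egg-info entry is either a plain metadata file or a
            # directory that holds a PKG-INFO file.
            if os.path.isdir(candidate):
                candidate = os.path.join(candidate, "PKG-INFO")
            if not os.path.isfile(candidate):
                continue
            with open(candidate, "rb") as handle:
                data = handle.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError as exc:
                yield candidate, exc.start


if __name__ == "__main__":
    for path, offset in find_undecodable_metadata(sys.path):
        print("%s: invalid UTF-8 at byte %d" % (path, offset))
```

Because the script only reads files, nothing needs to be backed up or restored afterwards. Directories that are not on `sys.path` of the interpreter running it would have to be passed in explicitly.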