ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

ard UnicodeDecodeError #15023

Closed: mcspa closed this issue 6 years ago

mcspa commented 6 years ago

Please follow the guide below


Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2017.12.14. If it's not, read this FAQ entry and update. Issues with outdated version will be rejected.

Before submitting an issue make sure you have:

What is the purpose of your issue?


The following sections concretize particular purposed issues, you can erase any section (the contents between triple ---) not applicable to your issue


If the purpose of this issue is a bug report, site support request or you are not completely sure provide the full verbose output as follows:

Add the -v flag to your command line you run youtube-dl with (youtube-dl -v <your command line>), copy the whole output and insert it here. It should look similar to one below (replace it with your log inserted between triple ```):

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'http://www.ardmediathek.de/tv/Tatort/Narben-H?rfassung/Das-Erste/Video?bcastId=602916&documentId=48459644']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.12.14
[debug] Python version 2.7.6 - Linux-4.4.0-101-generic-x86_64-with-Ubuntu-14.04-trusty
[debug] exe versions: avconv 9.20-6, avprobe 9.20-6, rtmpdump 2.4
[debug] Proxy map: {}
[ARD:mediathek] 48459644: Downloading webpage
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/bin/youtube-dl/__main__.py", line 19, in <module>
  File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 465, in main
  File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 455, in _real_main
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 1986, in download
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 784, in extract_info
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 437, in extract
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/ard.py", line 180, in _real_extract
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 630, in _download_webpage
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 535, in _download_webpage_handle
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 506, in _request_webpage
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2196, in urlopen
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/local/bin/youtube-dl/youtube_dl/utils.py", line 1008, in http_response
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 45: invalid start byte
...
<end of log>

Description of your issue, suggested solution and other information

Explanation of your issue in arbitrary form goes here. Please make sure the description is worded well enough to be understood. Provide as much context and examples as possible. If work on your issue requires account credentials please provide them or explain how one can obtain them.

sleske commented 6 years ago

I just tested the URL you gave, and it works correctly for me. Note that the right URL is http://www.ardmediathek.de/tv/Tatort/Narben-H%C3%B6rfassung/Das-Erste/Video?bcastId=602916&documentId=48459644 - it contains a non-ASCII character, namely the ö (o umlaut).

Please double-check that you did not somehow mess up the URL when you copy-pasted it. From your logs, it looks like the URL is wrong.

At least on my system (Linux, with Firefox or Google Chrome), if I copy and paste a URL with non-ASCII characters like this from the browser address bar, I get the percent-encoded version (as you can see above).
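To illustrate the difference, here is a minimal Python 3 sketch (the log above is from Python 2.7, so this is only an illustration, not what youtube-dl does internally). It assumes the broken URL carried the ö as a single Latin-1 byte; that is a guess that happens to match the `0xf6` in the error message, not something the log proves. The variable names are made up for the example.

```python
from urllib.parse import quote, unquote

# Path segment from the URL in question, containing a non-ASCII "ö".
title = "Narben-Hörfassung"

# Percent-encoding the UTF-8 bytes gives the form the browser copies out.
utf8_encoded = quote(title, encoding="utf-8")

# Percent-encoding the Latin-1 bytes gives a different, single-byte escape.
latin1_encoded = quote(title, encoding="latin-1")

# The UTF-8 form round-trips cleanly back to the original text.
round_tripped = unquote(utf8_encoded, encoding="utf-8")

# Assumption: the mangled URL contained the raw Latin-1 byte for "ö".
# Decoding those bytes as UTF-8 fails, because 0xf6 cannot start a
# valid UTF-8 sequence.
latin1_bytes = title.encode("latin-1")
try:
    latin1_bytes.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(utf8_encoded)
print(latin1_encoded)
print(error_message)
```

If the printed UTF-8 form matches the URL you are passing to youtube-dl, the URL is encoded the way the site expects. Quoting the URL in the shell also helps, since it contains `?` and `&`.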

Please try this:

Without more feedback, this ticket can only be closed as unreproducible, I'm afraid.

mcspa commented 6 years ago

You are right. The encoding went wrong. Thanks for your help.