ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

[Youtube] HTTP Error 404: Not Found (caused by HTTPError()) on Python 2 #8918

Open ewnd9 opened 8 years ago

ewnd9 commented 8 years ago
$ youtube-dl "https://www.youtube.com/watch?v=UdO7_GrRttM" --verbose

[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'https://www.youtube.com/watch?v=UdO7_GrRttM', u'--verbose']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.03.18
[debug] Python version 2.7.6 - Linux-3.16.0-60-generic-x86_64-with-Ubuntu-14.04-trusty
[debug] exe versions: avconv 9.18-6, avprobe 9.18-6, rtmpdump 2.4
[debug] Proxy map: {}
[youtube] UdO7_GrRttM: Downloading webpage
ERROR: Unable to download webpage: HTTP Error 404: Not Found (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 365, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 1929, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)

Anything else I should attach?

rg3 commented 8 years ago

It works for me here. Can you check it was not a transient error on YouTube or your ISP?

ewnd9 commented 8 years ago

Could you elaborate, please (e.g. a command to run in the terminal to check this)? If you mean whether there are any errors with the internet connection itself, everything works fine except youtube-dl.

ewnd9 commented 8 years ago

Headers returned when accessing the URL with curl:

$ curl -s -D - "https://www.youtube.com/watch?v=UdO7_GrRttM" -o /dev/null
HTTP/1.1 200 OK
X-Frame-Options: SAMEORIGIN
Expires: Tue, 27 Apr 1971 19:44:06 EST
X-Content-Type-Options: nosniff
P3P: CP="This is not a P3P policy! See http://support.google.com/accounts/answer/151657?hl=ru for more info."
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
X-XSS-Protection: 1; mode=block; report=https://www.google.com/appserve/security-bugs/log/youtube
Date: Mon, 21 Mar 2016 13:42:11 GMT
Server: Ytfe_Worker
Set-Cookie: YSC=KLHg8PeOytk; path=/; domain=.youtube.com; httponly
Set-Cookie: PREF=f1=50000000; path=/; domain=.youtube.com; expires=Sun, 20-Nov-2016 01:35:11 GMT
Set-Cookie: VISITOR_INFO1_LIVE=Ofpd47IQUwM; path=/; domain=.youtube.com; expires=Sun, 20-Nov-2016 01:35:11 GMT; httponly
Alternate-Protocol: 443:quic,p=1
Alt-Svc: quic=":443"; ma=2592000; v="31,30,29,28,27,26,25"
Accept-Ranges: none
Vary: Accept-Encoding
Transfer-Encoding: chunked

I remembered that about a month ago the youtu.be domain was blocked by the Russian government, but it is available now. Are there any non-standard domains involved that I could check (other than youtube.com and youtu.be)?

rg3 commented 8 years ago

What I meant is that the video downloads fine here. You should probably wait for another team member to help you but, in the meantime, can you post the program output when you add --dump-pages to the command you run? Thanks.

ewnd9 commented 8 years ago
$ youtube-dl "http://www.youtube.com/watch?v=UdO7_GrRttM" --verbose --dump-pages 
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'http://www.youtube.com/watch?v=UdO7_GrRttM', u'--verbose', u'--dump-pages']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.03.18
[debug] Python version 2.7.6 - Linux-3.16.0-60-generic-x86_64-with-Ubuntu-14.04-trusty
[debug] exe versions: avconv 9.18-6, avprobe 9.18-6, rtmpdump 2.4
[debug] Proxy map: {}
[youtube] UdO7_GrRttM: Downloading webpage
ERROR: Unable to download webpage: HTTP Error 404: Not Found (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 365, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 1929, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)

I changed the URL from https to http to rule out SSL problems; the commands from my previous messages give the same results with the http URL.

dstftw commented 8 years ago

Post the output of youtube-dl --print-traffic http://www.youtube.com/watch?v=UdO7_GrRttM.

ewnd9 commented 8 years ago
$ youtube-dl --print-traffic http://www.youtube.com/watch?v=UdO7_GrRttM  
[youtube] UdO7_GrRttM: Downloading webpage
send: u'GET /watch?v=UdO7_GrRttM&gl=US&hl=en&has_verified=1&bpctr=9999999999 HTTP/1.1\r\nAccept-Language: en-us,en;q=0.5\r\nAccept-Encoding: gzip, deflate\r\nConnection: close\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/44.0 (Chrome)\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nHost: www.youtube.com\r\nCookie: PREF=f1=50000000&hl=en\r\n\r\n'
reply: 'HTTP/1.1 404 Not Found\r\n'
header: Date: Mon, 21 Mar 2016 16:37:36 GMT
header: Server: Apache/2.2.22
header: Last-Modified: Mon, 01 Feb 2016 08:25:06 GMT
header: ETag: "4080d-72a-52ab11f519717;52ab116f5e177"
header: Accept-Ranges: bytes
header: Vary: Accept-Encoding
header: Content-Encoding: gzip
header: Content-Length: 1100
header: Connection: close
header: Content-Type: text/html
ERROR: Unable to download webpage: HTTP Error 404: Not Found (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
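
In the dump above, the 404 reply carries `Server: Apache/2.2.22`, while the earlier curl response showed `Server: Ytfe_Worker`, which suggests the two answers may not come from the same server. A minimal sketch for pulling the status and headers out of such a dump so they can be compared; `parse_print_traffic` is a hypothetical helper, and the line format is assumed to match the `reply:` / `header:` lines shown here.

```python
import re


def parse_print_traffic(lines):
    """Extract the HTTP status code and response headers from debug lines."""
    status = None
    headers = {}
    for line in lines:
        line = line.strip()
        if line.startswith("reply:"):
            # e.g. reply: 'HTTP/1.1 404 Not Found\r\n'
            match = re.search(r"HTTP/\d\.\d\s+(\d{3})", line)
            if match:
                status = int(match.group(1))
        elif line.startswith("header:"):
            # e.g. header: Server: Apache/2.2.22
            name, _, value = line[len("header:"):].partition(":")
            headers[name.strip().lower()] = value.strip()
    return status, headers


# Sample lines copied from the dump above.
sample = [
    "reply: 'HTTP/1.1 404 Not Found\\r\\n'",
    "header: Date: Mon, 21 Mar 2016 16:37:36 GMT",
    "header: Server: Apache/2.2.22",
    "header: Content-Type: text/html",
]
status, headers = parse_print_traffic(sample)
print(status, headers["server"])
```

Header names are lower-cased so that a lookup does not depend on how a given server capitalises them.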

dstftw commented 8 years ago

What's the output of curl 'http://www.youtube.com/watch?v=UdO7_GrRttM&gl=US&hl=en&has_verified=1&bpctr=9999999999' and curl -A 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/44.0 (Chrome)' 'http://www.youtube.com/watch?v=UdO7_GrRttM&gl=US&hl=en&has_verified=1&bpctr=9999999999'?

dstftw commented 8 years ago

Can you watch it in browser at all?

ewnd9 commented 8 years ago

curl 'http://www.youtube.com/watch?v=UdO7_GrRttM&gl=US&hl=en&has_verified=1&bpctr=9999999999'

http://pastebin.com/ui077RJx

curl -A 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/44.0 (Chrome)' 'http://www.youtube.com/watch?v=UdO7_GrRttM&gl=US&hl=en&has_verified=1&bpctr=9999999999'

http://pastebin.com/tgYFjKs1

> Can you watch it in browser at all?

Yes

ewnd9 commented 8 years ago

I ran some code experiments:

import urllib2
response = urllib2.urlopen('https://www.youtube.com/watch?v=UdO7_GrRttM')
html = response.read()
print(html)
Traceback (most recent call last):
  File "test.py", line 2, in <module>
    response = urllib2.urlopen('https://www.youtube.com/watch?v=UdO7_GrRttM')
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
import urllib2
response = urllib2.urlopen('https://www.youtube.com/')
html = response.read()
print(html)
<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Ссылка заблокирована</title>
<style type="text/css">
<!--
H5 {font-size:15px; font-weight:normal; letter-spacing:1px; text-transform:uppercase; white-space:nowrap; margin-bottom:14px; font-weight:bold;}
BODY {color:#4d4d9f; font-size:13px; background: #FFFFFF;  width: 99%;}
DIV.container {width:50%;margin-top:20px; margin:100px auto; background:#FFFFFF; border:3px solid #E42319; padding:10px; -moz-border-radius:20px; -khtml-border-radius: 20px; -webkit-border-radius: 20px; border-radius:20px;}
-->
</style>
</head>
<body>
<div class="container">
<center><img onerror="this.style.display='none';" src="http://img.ytapi.com/ttk.png">
<h5>Ссылка заблокирована!!!<br>в соответствии с законодательством РФ<br>Причина блокировки: <a href="http://blocklist.rkn.gov.ru/">blocklist.rkn.gov.ru</a></h5><br><br><a href="http://ttk.ru/">ЗАО "Компания ТрансТелеКом"</a>
</center></div>
<script type="text/javascript"> (function (d, w, c) { (w[c] = w[c] || []).push(function() { try { w.yaCounter33269963 = new Ya.Metrika({ id:33269963, clickmap:true, trackLinks:true, accurateTrackBounce:true }); } catch(e) { } }); var n = d.getElementsByTagName("script")[0], s = d.createElement("script"), f = function () { n.parentNode.insertBefore(s, n); }; s.type = "text/javascript"; s.async = true; s.src = "https://mc.yandex.ru/metrika/watch.js"; if (w.opera == "[object Opera]") { d.addEventListener("DOMContentLoaded", f, false); } else { f(); } })(document, window, "yandex_metrika_callbacks");</script><noscript><div><img src="https://mc.yandex.ru/watch/33269963" style="position:absolute; left:-9999px;" alt="" /></div></noscript>
</body></html>

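The page says "Link blocked!!! in accordance with the legislation of the Russian Federation" and is served by the ISP (ZAO "Company TransTeleCom"), so the URL is blocked by the government and the 404 now makes sense. But somehow google-chrome, curl and node.js display YouTube itself (http://youtube.com/).

Any ideas how Python's urllib2 could be using a cached DNS entry?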
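
The first experiment above stops at the traceback, so the body that came with the 404 is never printed. A minimal Python 3 sketch, assuming only the standard library: `HTTPError` is itself a readable response, so the status, `Server` header and error page can be inspected instead of lost. `fetch_status_and_body` and `fake_opener` are hypothetical names; the fake opener only imitates the 404 from the traffic dump so the example runs offline, and its body is a placeholder, not the real block page.

```python
import io
import urllib.error
import urllib.request
from email.message import Message


def fetch_status_and_body(url, opener=urllib.request.urlopen):
    """Return (status, server_header, body) even when the server answers with an error."""
    try:
        response = opener(url)
        return response.status, response.headers.get("Server"), response.read()
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object: the error page is still readable.
        return err.code, err.headers.get("Server"), err.read()


def fake_opener(url):
    """Offline stand-in that imitates the 404 reply seen in the traffic dump."""
    headers = Message()
    headers["Server"] = "Apache/2.2.22"
    body = io.BytesIO(b"<title>block page placeholder</title>")
    raise urllib.error.HTTPError(url, 404, "Not Found", headers, body)


status, server, body = fetch_status_and_body(
    "https://www.youtube.com/watch?v=UdO7_GrRttM", opener=fake_opener
)
print(status, server, body)
```

Dropping the `opener=fake_opener` argument makes the same function perform a real request on the affected machine.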

ewnd9 commented 8 years ago

Python 3 works fine:

import urllib.request

url = 'https://www.youtube.com/'
request = urllib.request.urlopen(url)

print(request.read())

And when I made python3 the system's default python ($ sudo mv /usr/bin/python3 /usr/bin/python), youtube-dl started working too:

$ youtube-dl "https://www.youtube.com/watch?v=UdO7_GrRttM" --verbose
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['https://www.youtube.com/watch?v=UdO7_GrRttM', '--verbose']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.03.18
[debug] Python version 3.4.3 - Linux-3.16.0-60-generic-x86_64-with-Ubuntu-14.04-trusty
[debug] exe versions: avconv 9.18-6, avprobe 9.18-6, rtmpdump 2.4
[debug] Proxy map: {}
[youtube] UdO7_GrRttM: Downloading webpage
[youtube] UdO7_GrRttM: Downloading video info webpage
[youtube] UdO7_GrRttM: Extracting video information
[youtube] UdO7_GrRttM: Downloading MPD manifest
WARNING: Your copy of avconv is outdated and unable to properly mux separate video and audio files, youtube-dl will download single file media. Update avconv to version 10-0 or newer to fix this.
[debug] Invoking downloader on 'https://r7---sn-ug5onuxaxjvh-n8vz.googlevideo.com/videoplayback?mime=video%2Fmp4&key=yt6&itag=22&ipbits=0&sver=3&lmt=1458046689112654&signature=13AA167332E9E50715862938C14AFFCC16C267CB.60E23084B322E25A920E4748ABE4B678C5DAEE2C&mm=31&source=youtube&mn=sn-ug5onuxaxjvh-n8vz&ratebypass=yes&dur=69.752&mt=1458588690&mv=m&ms=au&fexp=9405984%2C9407191%2C9413142%2C9416126%2C9417828%2C9419817%2C9420452%2C9422596%2C9423661%2C9423662%2C9427902%2C9428422%2C9429160%2C9429808%2C9431012%2C9431270%2C9431400%2C9431619%2C9432057%2C9432437&ip=62.33.207.199&id=o-AIYuUW13dt4a3nKRAj_XOpEiexMAKv-zDwgl7trvxT2Y&sparams=dur%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cpl%2Cratebypass%2Crequiressl%2Csource%2Cupn%2Cexpire&initcwndbps=3792500&expire=1458610418&upn=dmr9Dx8UCm0&pl=21&requiressl=yes'
[download] Destination: Unity Launcher And Dash Can Now Be Moved To The Bottom Of The Screen [Ubuntu 16.04 Xenial Xerus]-UdO7_GrRttM.mp4
[download] 100% of 4.85MiB in 00:00

abhigenie92 commented 7 years ago

Similar issue.

youtube-dl --verbose "https://www.youtube.com/watch\?v\=onBYsen2_eA"
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'--verbose', u'https://www.youtube.com/watch\\?v\\=onBYsen2_eA']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.06.25
[debug] Python version 2.7.12+ - Linux-4.8.0-22-generic-x86_64-with-Ubuntu-16.10-yakkety
[debug] exe versions: ffmpeg 3.0.2-1ubuntu3, ffprobe 3.0.2-1ubuntu3, rtmpdump 2.4
[debug] Proxy map: {}
[generic] watch\?v\=onBYsen2_eA: Requesting header
WARNING: Could not send HEAD request to https://www.youtube.com/watch\?v\=onBYsen2_eA: HTTP Error 404: Not Found
[generic] watch\?v\=onBYsen2_eA: Downloading webpage
ERROR: Unable to download webpage: HTTP Error 404: Not Found (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/usr/lib/python2.7/dist-packages/youtube_dl/extractor/common.py", line 390, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/usr/lib/python2.7/dist-packages/youtube_dl/YoutubeDL.py", line 1950, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)

$ apt list | grep youtube

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libwebservice-youtube-perl/yakkety,yakkety 1.0.3-3 all
libwww-youtube-download-perl/yakkety,yakkety 0.59-1 all
mopidy-youtube/yakkety,yakkety 2.0.0-2 all
mps-youtube/yakkety,yakkety 0.2.7.1-1 all
nuvolaplayer3-youtube/now 1.2-1~xenial amd64 [installed,local]
python-sphinxcontrib.youtube/yakkety,yakkety 1.0-1 all
python3-sphinxcontrib.youtube/yakkety,yakkety 1.0-1 all
unity-webapps-youtube/yakkety,yakkety 2.4.16+16.04.20151119-0ubuntu1 all
youtube-dl/yakkety,yakkety,now 2016.06.25-2 all [installed]

yan12125 commented 7 years ago

@abhigenie92 There's no need to escape special characters in quotation marks. The following command should work fine:

youtube-dl --verbose "https://www.youtube.com/watch?v=onBYsen2_eA"

rkfg commented 7 years ago

@ewnd9 thank you for the python3 trick. It worked for me for some reason. This is a Russian censorship-related issue, but I don't understand the mechanism by which urllib2 fails. I can browse YouTube perfectly fine in a browser, but at some point youtube-dl started failing. On Ubuntu/Debian it's sufficient to uninstall youtube-dl, install pip3 with apt-get, and then run pip3 install youtube-dl.

ewnd9 commented 7 years ago

@rkfg As I see it now, urllib2 is either caching by itself or using a system DNS cache mechanism. The cache was populated while YouTube was blocked, so it still loads the block page from the ISP.
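
The DNS explanation is only a guess in this thread. One way to probe it is to print the addresses the system resolver hands to Python and compare them with what other tools connect to. A minimal sketch, assuming the standard `socket` module; `resolve_addresses` is a hypothetical helper, and the demo call uses an IP literal so that it runs without any DNS lookup.

```python
import socket


def resolve_addresses(host, port=443):
    """Return the sorted, de-duplicated IP addresses the system resolver gives for host."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# An IP literal needs no DNS lookup, so this call works offline.
print(resolve_addresses("127.0.0.1"))
```

On an affected machine one would pass "www.youtube.com" instead and run the script under both Python 2 and Python 3 to see whether the resolved addresses differ.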

I believe you can simply run youtube-dl binary with python3 like $ python3 /usr/bin/youtube-dl instead of reinstalling

rkfg commented 7 years ago

> I believe you can simply run youtube-dl binary with python3 like $ python3 /usr/bin/youtube-dl instead of reinstalling

Nope, tried it right after I found this issue here. The libraries are installed to /usr/local/lib/python2.7/dist-packages/ so they're not found by python3. Reinstall did the trick just fine.
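
The point about /usr/local/lib/python2.7/dist-packages/ can be checked directly: a package installed for one interpreter is not necessarily importable from another. A minimal sketch, assuming the package name `youtube_dl` seen in the tracebacks above; `module_available` is a hypothetical helper. Run under python3, it reports whether that interpreter can locate the package at all.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the running interpreter can locate the named top-level module."""
    return importlib.util.find_spec(name) is not None


# Shows which interpreter is running and whether it can see youtube_dl.
print(sys.executable, module_available("youtube_dl"))
```

If this prints False under python3, the package has to be installed for that interpreter, which matches the reinstall with pip3 described above.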

ewnd9 commented 7 years ago

Thanks for the clarification :+1: