arabesc opened this issue 2 years ago (status: Open)
I think it's a totally valid URL, because the following command works:
# youtube-dl -v -F https://www.youtube.com/watch\?v\=b5Kbzgx1w9A
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', '-F', 'https://www.youtube.com/watch?v=b5Kbzgx1w9A']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.9.9 (CPython) - FreeBSD-13.0-RELEASE-p4-amd64-64bit-ELF
[debug] exe versions: ffmpeg 4.4.1, ffprobe 4.4.1, rtmpdump 2.4
[debug] Proxy map: {}
[youtube] b5Kbzgx1w9A: Downloading webpage
[info] Available formats for b5Kbzgx1w9A:
format code  extension  resolution note
249          webm       audio only tiny 48k , webm_dash container, opus @ 48k (48000Hz), 22.40MiB
250          webm       audio only tiny 56k , webm_dash container, opus @ 56k (48000Hz), 26.31MiB
251          webm       audio only tiny 101k , webm_dash container, opus @101k (48000Hz), 46.90MiB
140          m4a        audio only tiny 129k , m4a_dash container, mp4a.40.2@129k (44100Hz), 60.05MiB
160          mp4        256x144    144p 15k , mp4_dash container, avc1.4d400c@ 15k, 24fps, video only, 7.12MiB
278          webm       256x144    144p 44k , webm_dash container, vp9@ 44k, 24fps, video only, 20.48MiB
133          mp4        426x240    240p 28k , mp4_dash container, avc1.4d4015@ 28k, 24fps, video only, 13.05MiB
242          webm       426x240    240p 46k , webm_dash container, vp9@ 46k, 24fps, video only, 21.72MiB
134          mp4        640x360    360p 47k , mp4_dash container, avc1.4d401e@ 47k, 24fps, video only, 21.91MiB
243          webm       640x360    360p 93k , webm_dash container, vp9@ 93k, 24fps, video only, 43.35MiB
135          mp4        854x480    480p 69k , mp4_dash container, avc1.4d401e@ 69k, 24fps, video only, 32.13MiB
244          webm       854x480    480p 147k , webm_dash container, vp9@ 147k, 24fps, video only, 68.64MiB
136          mp4        1280x720   720p 96k , mp4_dash container, avc1.4d401f@ 96k, 24fps, video only, 44.80MiB
247          webm       1280x720   720p 258k , webm_dash container, vp9@ 258k, 24fps, video only, 120.03MiB
137          mp4        1920x1080  1080p 348k , mp4_dash container, avc1.640028@ 348k, 24fps, video only, 161.63MiB
248          webm       1920x1080  1080p 491k , webm_dash container, vp9@ 491k, 24fps, video only, 227.86MiB
18           mp4        640x360    360p 276k , avc1.42001E, 24fps, mp4a.40.2 (44100Hz), 128.02MiB
22           mp4        1280x720   720p 225k , avc1.64001F, 24fps, mp4a.40.2 (44100Hz) (best)
By the time it's been passed into yt-dl, it is a valid URL, though: \= -> =.
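The unescaping described above can be checked without touching the network. The following is a minimal sketch that uses Python's shlex module as a stand-in for the shell's POSIX-style unquoting; it is an approximation of what the shell does, not the shell itself.

```python
import shlex

# The command line as typed at the shell prompt, backslashes included.
typed = r"youtube-dl -v -F https://www.youtube.com/watch\?v\=b5Kbzgx1w9A"

# shlex.split applies POSIX-style unquoting: an unquoted backslash is
# dropped and the character after it is kept literally.
argv = shlex.split(typed)

# The last argument is what the program would receive as the URL.
print(argv[-1])
```

The printed URL has plain ? and = characters, which matches the "Command-line args" line in the debug output.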
The error is so weird that it's never been seen by Google.
I would guess that something is screwing with your network.
No, it doesn't. But there is no problem playing the video from the source URL in a browser.
As I noted above, there are no issues with youtube in a browser.
... something is screwing with your network.
Looks like something that is both common to curl and Python and not used by your browser.
When I play the video from a browser it uses another server address: https://rr6---sn-gvnuxaxjvh-n8vr.googlevideo.com/
Here is a simple test:
# curl -v https://rr6---sn-gvnuxaxjvh-c35z.googlevideo.com/
* Trying 77.37.252.145:443...
* Connected to rr6---sn-gvnuxaxjvh-c35z.googlevideo.com (77.37.252.145) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* CAfile: /usr/local/share/certs/ca-root-nss.crt
* CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (OUT), TLS alert, illegal parameter (559):
* error:141713E7:SSL routines:tls_process_server_hello:invalid session id
* Closing connection 0
curl: (35) error:141713E7:SSL routines:tls_process_server_hello:invalid session id
# curl -v https://rr6---sn-gvnuxaxjvh-n8vr.googlevideo.com/
* Trying 213.59.210.81:443...
* Connected to rr6---sn-gvnuxaxjvh-n8vr.googlevideo.com (213.59.210.81) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* CAfile: /usr/local/share/certs/ca-root-nss.crt
* CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: CN=*.googlevideo.com
* start date: Feb 8 03:36:08 2022 GMT
* expire date: Apr 19 03:36:07 2022 GMT
* subjectAltName: host "rr6---sn-gvnuxaxjvh-n8vr.googlevideo.com" matched cert's "*.googlevideo.com"
* issuer: C=US; O=Google Trust Services LLC; CN=GTS CA 1C3
* SSL certificate verify ok.
> GET / HTTP/1.1
> Host: rr6---sn-gvnuxaxjvh-n8vr.googlevideo.com
> User-Agent: curl/7.81.0
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< Date: Thu, 24 Feb 2022 23:19:38 GMT
< Content-Type: text/html; charset=UTF-8
< Server: gvs 1.0
< Content-Length: 1561
< X-XSS-Protection: 0
< X-Frame-Options: SAMEORIGIN
<
<!DOCTYPE html>
<html lang=en>
<meta charset=utf-8>
<meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
<title>Error 404 (Not Found)!!1</title>
<style>
*{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
</style>
<a href=//www.google.com/><span id=logo aria-label=Google></span></a>
<p><b>404.</b> <ins>That’s an error.</ins>
<p>The requested URL <code>/</code> was not found on this server. <ins>That’s all we know.</ins>
* Connection #0 to host rr6---sn-gvnuxaxjvh-n8vr.googlevideo.com left intact
The question is: why did the browser choose the rr6---sn-gvnuxaxjvh-n8vr.googlevideo.com server, while youtube-dl chose rr6---sn-gvnuxaxjvh-c35z.googlevideo.com?
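The two curl runs above can be approximated with Python's standard library, which helps to tell whether the handshake failure is specific to curl. This is only a sketch: probe_tls is a hypothetical helper name, and it checks nothing beyond whether the TLS handshake completes.

```python
import socket
import ssl


def probe_tls(host, port=443, timeout=10.0):
    """Try a TLS handshake with host:port and report whether it completed.

    Returns (True, negotiated_protocol) on success, or (False, error_text)
    on any connection, timeout, or handshake failure.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version()
    except OSError as exc:
        # ssl.SSLError, timeouts and DNS errors are all OSError subclasses.
        return False, "{}: {}".format(type(exc).__name__, exc)
```

Calling probe_tls("rr6---sn-gvnuxaxjvh-c35z.googlevideo.com") and then probe_tls("rr6---sn-gvnuxaxjvh-n8vr.googlevideo.com") would show whether Python's OpenSSL binding fails on the same host as curl does.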
@89z
If you're able to install Go, you can try this program
I'm not familiar with Go; the test sample doesn't compile and produces the following output:
no required module provides package github.com/89z/mech/youtube: go.mod file not found in current directory or any parent directory; see 'go help modules'
I think it's random. Replaying the exact request, I get different origins.
Yep, I see the same in the browser, but youtube-dl sticks to rr6---sn-gvnuxaxjvh-c35z.googlevideo.com for some reason.
@89z
If you're able to install Go, you can try this program
# go run test.go
POST https://www.youtube.com/youtubei/v1/player
panic: Head "https://rr6---sn-gvnuxaxjvh-c35z.googlevideo.com/videoplayback?expire=1645768228&ei=xBkYYsfPLIm0vwSV8LeoBw&id=o-AD2vVazlQA5t-M0MjHygiEaVs1OoYXVIJgHTB8w_EEJv&itag=137&source=youtube&requiressl=yes&mh=6e&mm=31%2C29&mn=sn-gvnuxaxjvh-c35z%2Csn-gvnuxaxjvh-n8vr&ms=au%2Crdu&mv=m&mvi=6&pcm2cms=yes&pl=24&initcwndbps=723750&vprv=1&mime=video%2Fmp4&gir=yes&clen=169482542&dur=3890.386&lmt=1640040595008522&mt=1645746307&fvip=6&keepalive=yes&fexp=24001373%2C24007246&c=ANDROID&txp=5432434&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cgir%2Cclen%2Cdur%2Clmt&sig=AOq0QJ8wRQIhAJM21SZi5pRwbSGJ6zkhtFLL5FzqZPFd4N7m3VQjqXzvAiBuuGB3PggTSM_AiQRUUt867vX952c4qiNphi7u4QeBnA%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpcm2cms%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRAIgQChcoVrKaq1xpSy5l5HPprkxQuhdStKDgNMBmznR3-ACIDFD-Ag5HwOlPccTKsnrXJ_QBHsXdqYHvfuPrVahhVaY": net/http: TLS handshake timeout
goroutine 1 [running]:
main.main()
test.go:15 +0x171
exit status 2
The test sample uses the same rr6---sn-gvnuxaxjvh-c35z.googlevideo.com server, which is unavailable for me for some reason.
And I suppose the Go runtime uses, or clones, OpenSSL?
yt-dl selects the download link deterministically, I think, whereas the YT player JS may randomise its selection from the links that match the selected quality.
In the network exchange log I see that the browser first tries to connect to the rr6---sn-gvnuxaxjvh-c35z.googlevideo.com server, gets a zero-length response, and then switches to another server that works.
At l.1898 of extractor/youtube.py, we could add self._check_formats(formats) before self._sort_formats(formats) to do something similar (i.e. verifying https: access). Some other extractors whose target sites are flakier than YT do this.
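The behaviour observed in the browser (try one server, move on when it does not answer) can be sketched independently of youtube-dl. This is not the code of _check_formats; first_working and the stand-in probe are hypothetical names used only for illustration.

```python
def first_working(candidates, probe):
    """Return the first candidate for which probe(candidate) is truthy.

    probe is expected to handle its own network errors and return False
    on failure. Returns None when no candidate passes.
    """
    for candidate in candidates:
        if probe(candidate):
            return candidate
    return None


# Stand-in probe for illustration only: pretend just the n8vr host answers.
reachable = {"rr6---sn-gvnuxaxjvh-n8vr.googlevideo.com"}
hosts = [
    "rr6---sn-gvnuxaxjvh-c35z.googlevideo.com",
    "rr6---sn-gvnuxaxjvh-n8vr.googlevideo.com",
]
chosen = first_working(hosts, lambda host: host in reachable)
print(chosen)
```

In a real check the probe would make a network request, for example a HEAD request against the format URL, instead of consulting a fixed set.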
The fallback server name is contained in the download URL:
https://rr6---sn-gvnuxaxjvh-c35z.googlevideo.com/videoplayback?expire=1645740445&ei=Pa0XYrr8MPKS1d8P66qs8AY&id=o-AJb_l8PiCAQrb2XMIHE6i6_9du6G7c5uQMAWFyPNon3l&itag=248&aitags=133%2C134%2C135%2C136%2C137%2C160%2C242%2C243%2C244%2C247%2C248%2C278&source=youtube&requiressl=yes&mh=6e&mm=31%2C29&mn=sn-gvnuxaxjvh-c35z%2Csn-n8v7kne6&ms=au%2Crdu&mv=m&mvi=6&pcm2cms=yes&pl=23&initcwndbps=707500&vprv=1&mime=video%2Fwebm&ns=e8mJx0XBp7jRzBMnM6sgYiYG&gir=yes&clen=238929645&dur=3890.386&lmt=1640041115174139&mt=1645718473&fvip=6&keepalive=yes&fexp=24001373%2C24007246&c=WEB&txp=5432434&n=WKsPPQKun4uNUbYH9&sparams=expire%2Cei%2Cip%2Cid%2Caitags%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cns%2Cgir%2Cclen%2Cdur%2Clmt&sig=AOq0QJ8wRAIgOgShB-b6rJ6_mi2st6qwHScTl17tLL46KtJBUO3bLysCIFFPO7iraa7C52XH3aJUTxar0zI4qlTgWy6SuXnMszbx&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpcm2cms%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRgIhAMsBM-YtDgxFBmECITIrkvzMwQGBBnMqgQF208ZzHfw7AiEAsnEYaNBdmGkhqXw-SgfLSsrV0YHjkoPuHo7n1CyP_AM%3D
It's the second part of the mn parameter: mn=sn-gvnuxaxjvh-c35z%2Csn-n8v7kne6
Here is the result:
# go run test.go
POST https://www.youtube.com/youtubei/v1/player
rr6---sn-gvnuxaxjvh-c35z.googlevideo.com Head "https://rr6---sn-gvnuxaxjvh-c35z.googlevideo.com/videoplayback?expire=1645772237&ei=bSkYYqbgD8CL6dsP9vqzsAY&id=o-AN7bjdYUY9etnMS-zZmmtLG3WePC_qo6KNk9Cr8WJUZs&itag=137&source=youtube&requiressl=yes&mh=6e&mm=31%2C29&mn=sn-gvnuxaxjvh-c35z%2Csn-gvnuxaxjvh-n8vr&ms=au%2Crdu&mv=m&mvi=6&pl=24&initcwndbps=776250&vprv=1&mime=video%2Fmp4&gir=yes&clen=169482542&dur=3890.386&lmt=1640040595008522&mt=1645750147&fvip=6&keepalive=yes&fexp=24001373%2C24007246&c=ANDROID&txp=5432434&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cgir%2Cclen%2Cdur%2Clmt&sig=AOq0QJ8wRQIgGCHhUYZ-7PBfd4qGYSOmnTZyCVB1F_ng3qFvaZPR-ncCIQCGvqNJ6jGkhl1jDT2EC-1i_uNXPZvPPqQ-hEIR-zUhOg%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRQIhAI09_nQIBgQla_uZgMIavOWvZUWGDCzZoZLGRj6qfHtsAiBXyJP4CpNd3T3IZxOPFWD8-aG7Bp_WVB0V6H4iGo_VXg%3D%3D": net/http: TLS handshake timeout
rr6---sn-gvnuxaxjvh-n8vr.googlevideo.com 200 OK
There is a delay of about 10 seconds between the first POST output and the rest.
Here's a patch against the git master that might be worth exercising:
--- old/youtube_dl/extractor/youtube.py
+++ new/youtube_dl/extractor/youtube.py
@@ -1539,6 +1539,22 @@
                 fmt['url'] = compat_urlparse.urlunparse(
                     parsed_fmt_url._replace(query=compat_urllib_parse_urlencode(qs, True)))

+    def _get_fallback_formats(self, formats):
+        fb_formats = []
+        for fmt in formats:
+            parsed_fmt_url = compat_urlparse.urlparse(fmt['url'])
+            qs = compat_urlparse.parse_qs(parsed_fmt_url.query)
+            sub_host = qs.get('mn', [''])[-1].split(',')
+            if len(sub_host) != 2:
+                continue
+            alt_loc = parsed_fmt_url.netloc.replace(*sub_host)
+            if alt_loc != parsed_fmt_url.netloc:
+                fmt = fmt.copy()
+                fmt['url'] = compat_urlparse.urlunparse(
+                    parsed_fmt_url._replace(netloc=alt_loc))
+                fb_formats.append(fmt)
+        return fb_formats
+
     def _mark_watched(self, video_id, player_response):
         playback_url = url_or_none(try_get(
             player_response,
@@ -1895,6 +1911,9 @@
             if reason:
                 raise ExtractorError(reason, expected=True)

+        formats.extend(self._get_fallback_formats(formats))
+
+        self._check_formats(formats, video_id)
         self._sort_formats(formats)

         keywords = video_details.get('keywords') or []
In my case this error was caused by an anti-DPI tool running in background.
Adding YouTube to the whitelist helped.
So, an example of "something is screwing with your network".
Description
There is a weird error when trying to download a video. It may be an OpenSSL issue or an issue with how OpenSSL is used.
I've tried to use cURL with the direct download link from the youtube-dl output and got the same error:
It seems the server returns an empty response for some reason.