ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Does youtube-dl not have the issue of format 22 being broken sometimes, like yt-dlp has? #30990

Closed xxxLCxxx closed 1 year ago

xxxLCxxx commented 2 years ago

As you can see here: Youtube: Did not get any data blocks for format 22 #3372, yt-dlp suffers from YouTube's format 22 (the most popular x264 MP4 in 720p) sometimes being broken (or rather, not ready). Since I haven't seen the issue discussed here, I wanted to ask whether I missed the discussion or whether youtube-dl handles this differently.

dirkf commented 2 years ago

Whatever YT sends is what you get. If you don't like format 22 (because it might not be reliable), filter it out in the -f ... option of your command.

Having said that, it might be reasonable to bias the format preferences such that 22 is automatically skipped in favour of any other format that matches the specified format selection criteria, similar to what I believe has been done in yt-dlp.

xxxLCxxx commented 2 years ago

It's not that format 22 is always broken. I'd say it's broken about 5% of the time for me. For others, it may not be broken at all (it seems to be some kind of censorship, deprioritizing the encoding of certain “shit-listed” videos). As you can see in my last post here: https://github.com/yt-dlp/yt-dlp/issues/3372, it is rather easy to test and detect when it is broken. Could the downloader run the test for format 22 and throw a specific exception if it is broken? This could be caught and the next-best format used instead. Format 22 is preferable for many, as it is only about half the size of the second-best format in HD.

Fixing this from the outside, via shell scripting, slows down all other YouTube downloads significantly.

dirkf commented 2 years ago

The yt-dlp commit that I mentioned: https://github.com/yt-dlp/yt-dlp/commit/91e5e839d3017577dabba7e9b142910ec32a495a

xxxLCxxx commented 2 years ago

That's not a solution. It now always downloads the alternative, which is twice the size (and split). 22 is working most of the time. It is the preferred format for many; perhaps even the most popular.

The attempt to download a chunk in the middle for format 22 would only take a few milliseconds with an open connection. The connection could remain open (at least when it's not broken). There's no overhead.

dirkf commented 2 years ago

So:

  1. find out the size x that the format 22 file should be -- should be possible
  2. make a request for a piece of that file starting at x/2
  3. check the HTTP code.

Is there any suggestion as to whether x/2, x/3, x might be the best place to check?

To implement that (a curl sketch of these steps follows the list):

  1. make a HEAD request for the format 22 URL
  2. if the request fails, give up and don't de-prioritise the format
  3. if the returned headers don't include Accept-Ranges with a value other than none, give up and don't de-prioritise the format
  4. if the returned headers don't include 'Content-Length', give up and don't de-prioritise the format
  5. set x to the value of the Content-Length header
  6. make a GET request for the format 22 URL with header Range: bytes=a-b, where a=x/2 and b=a+512 (say)
  7. if the HTTP code is 400-499 (other than 416), de-prioritise the format and return.
  8. otherwise, return and don't de-prioritise the format.
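
The steps could be exercised by hand with curl along these lines (just a sketch, not code from this thread; FMT22_URL is assumed to hold a fresh format 22 URL and SIZE the Content-Length reported by the HEAD request):

# steps 1-5: HEAD request; note the Accept-Ranges and Content-Length headers
curl -sI "$FMT22_URL" | grep -i -e '^accept-ranges' -e '^content-length'
# steps 6-8: ask for ~512 bytes starting at half the reported size;
# a 4xx status (other than 416) would mark the format as bad
curl -s -o /dev/null -w '%{http_code}\n' \
    -H "Range: bytes=$((SIZE / 2))-$((SIZE / 2 + 512))" "$FMT22_URL"
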
xxxLCxxx commented 2 years ago

I'm not familiar with how the processing of things is done, but couldn't this be inserted right into the YouTube download? Before the download starts, check if it's format 22. If it is, make a copy of the request, modify the starting offset to half the size and download only one chunk. If this succeeds, continue with the original copy (original download). Otherwise, raise a special exception (youtube_format_22_broken or something like that). The exception is then caught before the format selection, format 22 gets removed and the request is restarted. Since I have no clue how things are being processed, this might be an asinine, oversimplified view of things, however. ;-|

dirkf commented 2 years ago

Getting the formats, selecting a format (or formats), and downloading the format(s) are different stages of processing.

The download process might be done by external programs and can't be tweaked.

I made a quick implementation of the concept. You get format 18 ahead of 22 if you ask for --format mp4:

$ python -m youtube_dl --get-format -f mp4 --ignore-config 'https://www.youtube.com/watch?v=5Ud81mGj_fM'
WARNING: De-prioritising bad format 22
18 - 640x360 (360p)
$

However that video doesn't trigger the check now (I had to fake the result). We need a new video that demonstrably has the problem.

xxxLCxxx commented 2 years ago

I'm looking for one, but this proves that it actually works most of the time. lol

xxxLCxxx commented 2 years ago

I can't find anything currently. I'm guessing that YouTube's encoding servers are not nearly used at capacity on a Tuesday evening (here). Now that I think of it, there were some videos not working mostly around weekends. The best way to find them is to search on YouTube with something like this: Upload date: last hour, Duration: over 20 minutes, Features: HD, Sort by: upload date. It helps to look for “contested terms”, such as “Covid, Corona, Ukraine, protest, demonstration, live” and such. The longer the video is, the better the chances. One can also wait for a long live-stream to finish; at that point the conversion of format 22 should not be ready yet. This should work particularly well for videos that are 5 or 6 hours long. I can't think of one on a Tuesday, though. I got a lot of “ERROR: … Requested format is not available”, even though I picked HD as a filter.

xxxLCxxx commented 2 years ago

Maybe wait for this to finish? https://www.youtube.com/watch?v=F6T5EAj1g_4 VERDICT WATCH: Johnny Depp v Amber Heard Defamation Trial It says "Started streaming 7 hours ago" now. But I'm guessing that this will receive a high priority.

xxxLCxxx commented 2 years ago

Can you post the changes, please? I cloned both “youtube-dl” and “yt-dlp”; the latter I have already run from source. The next time I stumble into a broken format 22, I can try your changes.

dirkf commented 2 years ago

This method is added to class YoutubeIE (with HEADRequest imported from ..utils):

    def _check_bad_format_url(self, video_id, fmt_url):
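        # probe the format URL: a HEAD request to learn the size, then a small ranged GET at the midpoint;
        # returns True when the probe indicates the format's data is (currently) unavailable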
        size = None
        try:
            urlh = self._request_webpage(HEADRequest(fmt_url), video_id, 'Checking %s HEAD' % (fmt_url, ))
            if urlh.headers.get('Accept-Ranges', 'none') == 'none':
                return False
            size = int_or_none(urlh.headers.get('Content-Length'))
        except ExtractorError:
            pass

        if size is None:
            return False
        size = size / 2
        headers = {'Range': 'bytes=%d-%d' % (size, size + 512)}
        try:
            urlh = self._request_webpage(fmt_url, video_id, 'Checking %s GET' % (fmt_url, ), headers=headers)
        except ExtractorError as e:
            if isinstance(e.cause, compat_HTTPError):
                http_code = e.cause.code
                # any 400 code except 416 => block unavailable, bad
                return http_code >= 400 and http_code < 500 and http_code != 416
        return False

This fragment added in the _real_extract() method (about 160 lines in):

             dct = {
                 'asr': int_or_none(fmt.get('audioSampleRate')),
                 'filesize': int_or_none(fmt.get('contentLength')),
                 'format_id': itag,
                 'format_note': fmt.get('qualityLabel') or quality,
                 'fps': int_or_none(fmt.get('fps')),
                 'height': int_or_none(fmt.get('height')),
                 'quality': q(quality),
                 'tbr': tbr,
                 'url': fmt_url,
                 'width': fmt.get('width'),
             }
+            if itag == '22':
+                bad = self._check_bad_format_url(video_id, fmt_url)
+                if bad:
+                    self._downloader.report_warning('De-prioritising bad format %s' % (itag, ))
+                    dct['preference'] = -20

You could also add a check in the manifest processing later, but format 22 shouldn't be found there:

         if hls_manifest_url:
             for f in self._extract_m3u8_formats(
                     hls_manifest_url, video_id, 'mp4', fatal=False):
                 itag = self._search_regex(
                     r'/itag/(\d+)', f['url'], 'itag', default=None)
                 if itag:
                     f['format_id'] = itag
+                    if itag == '22' and self._check_bad_format_url(video_id, f['url']):
+                        self._downloader.report_warning('De-prioritising bad format %s' % (itag, ))
+                        f['preference'] = -20
                 formats.append(f)
dirkf commented 2 years ago

Hit one, using yt-dl with --flat-playlist ytsearchdate10:conspiracy:

{"duration": 2606.0, "_type": "url", "ie_key": "Youtube", "description": null, "uploader": "The Bulwark", "title": "The NRA Created Our Paranoid, Conspiracy Culture | Ryan Busse on The Bulwark Podcast", "url": "Asx1SsWHdmY", "view_count": 1184, "id": "Asx1SsWHdmY"}

Test:

$ python -m youtube_dl -v --get-format   -f mp4 --ignore-config 'https://www.youtube.com/watch?v=Asx1SsWHdmY'
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'--get-format', u'-f', u'mp4', u'--ignore-config', u'https://www.youtube.com/watch?v=Asx1SsWHdmY']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: d7bb9f462
[debug] Python version 2.7.17 (CPython) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
WARNING: De-prioritising bad format 22
18 - 640x360 (360p)
$

Because my cabling is falling apart, my internet connection is currently probably slower than YT's media transcoders, but you may be able to find the point where this video fails.

xxxLCxxx commented 2 years ago

yt-dlpold -f 22 'https://www.youtube.com/watch?v=Asx1SsWHdmY'
[youtube] Asx1SsWHdmY: Downloading webpage
[youtube] Asx1SsWHdmY: Downloading android player API JSON
[info] Asx1SsWHdmY: Downloading 1 format(s): 22
[download] Destination: The NRA Created Our Paranoid, Conspiracy Culture Ryan Busse on The Bulwark Podcast [Asx1SsWHdmY].mp4
[download] 2.1% of 52.15MiB at 652.15KiB/s ETA 01:20
[download] Got server HTTP error: Downloaded 1164920 bytes, expected 54687680 bytes. Retrying (attempt 1 of 10) ...

I doubt that I'll be able to work in the changes in time, though. I'm a bloody novice at that. ;-)

dirkf commented 2 years ago

Strangely I got to 15% with no errors.

Then trying 3 times using the last 512 bytes of the file (instead of a block in the middle, on the basis that if the problem is incomplete transcoding, the end is most likely to be bad), good, bad, bad.

Then with the test in the middle, bad, bad, good.

Hmmmm.

xxxLCxxx commented 2 years ago

That would suggest parallel encoding. But which idiot would do that in-place? Just what the hell are they doing?

xxxLCxxx commented 2 years ago

Could this be a distributed fs, with you getting from different servers? Again - why do that in-place?

dirkf commented 2 years ago

Almost certainly each request is potentially handled by a different CDN or data centre back-end server.

This guy seems to know about YT internals:

xxxLCxxx commented 2 years ago

The problem is that when the connection is closed between the test and the download, you could now be talking to a different server with a different status for the same file. Google is a real pita.

xxxLCxxx commented 2 years ago

They wouldn't abuse the same IP, though - or would they? If the IPs are different, then we could talk to the same server via IP.

dirkf commented 2 years ago

In another issue we determined that YT actually sends a backup URL for each download by means of query parameters in the URL:

https://rr2---sn-cu-aigsl.googlevideo.com/videoplayback?
    expire=1654115076&
    ei=pHaXYqyINY2I0wWjqLGgBA&
    ip=51.6.64.174&
    id=o-ABFKTnZv6lsb1y3AXB_T6Z2d07zgNtxV0yEmVud_Y6Cv&
    itag=22&
    source=youtube&
    requiressl=yes&
    mh=EV&
    mm=31%2C29&
    mn=sn-cu-aigsl%2Csn-cu-c9il&
    ms=au%2Crdu&
    mv=m&
    mvi=2&
    pl=26&
    ...

Parameter mn tells us to replace the part before %2C with the part after, so the backup URL is https://rr2---sn-cu-c9il.googlevideo.com/... This may be worth investigating.
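
Purely as an illustration (not code from this thread, and with the URL shortened to a hypothetical example), the substitution could be done like this:

# sketch: derive the backup URL from the mn parameter of a googlevideo URL
url='https://rr2---sn-cu-aigsl.googlevideo.com/videoplayback?itag=22&mn=sn-cu-aigsl%2Csn-cu-c9il'
mn=$(printf '%s\n' "$url" | sed 's/.*[?&]mn=\([^&]*\).*/\1/')   # sn-cu-aigsl%2Csn-cu-c9il
primary=$(printf '%s\n' "$mn" | sed 's/%2C.*//')                # part before the encoded comma
backup=$(printf '%s\n' "$mn" | sed 's/.*%2C//')                 # part after it
# the host appears first in the URL, so replacing the first occurrence swaps only the host
printf '%s\n' "$url" | sed "s/$primary/$backup/"
# -> https://rr2---sn-cu-c9il.googlevideo.com/videoplayback?itag=22&mn=sn-cu-aigsl%2Csn-cu-c9il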

dirkf commented 2 years ago

If the IPs are different, then we could talk to the same server via IP.

The IP may not be assigned to any server when it's used again. Remember the IPs probably belong to virtual servers that may be spun up when needed. Which actual server answers to (eg) rr2---sn-cu-aigsl.googlevideo.com may change as long as HTTP requests sent to that host give the "same" results.

xxxLCxxx commented 2 years ago

:-) That looks very familiar. I got such URLs from online downloading sites such as y2mate before I discovered youtube-dl.

dirkf commented 2 years ago

This may be worth investigating.

The backup gives 404 when the main host does. Possibly they are both looking at the same storage and relaying a file read error back as 404. If the backup doesn't have the same parameters, it gets 403, eg if the mn parameter is removed.

xxxLCxxx commented 2 years ago

If I recall this correctly, then those URLs returned from the online downloaders only lasted while you kept accessing them. If left alone for a while, they became invalid. That's what the expire value is there for, I think.

dirkf commented 2 years ago

Sure. For instance the one I quoted above expired yesterday:

$ date --date=@1654115076
Wed  1 Jun 21:24:36 BST 2022
$

Of course, when you extract again you get a different media URL.

Also, as expected, the test page that had the failing media link is now repeatably not triggering my test (using last 512 bytes).

xxxLCxxx commented 2 years ago

Ahh, the weekend! Found some:

yt-dlp_old -f 22 'https://www.youtube.com/watch?v=RmLfZbt_fKg'
[youtube] RmLfZbt_fKg: Downloading webpage
[youtube] RmLfZbt_fKg: Downloading android player API JSON
[info] RmLfZbt_fKg: Downloading 1 format(s): 22
[download] Destination: International Wedding Foods Taste Test [RmLfZbt_fKg].mp4
[download] 0.6% of 72.47MiB at 217.96KiB/s ETA 05:38
[download] Got server HTTP error: Downloaded 493187 bytes, expected 75987795 bytes. Retrying (attempt 1 of 10) ...
ERROR: Did not get any data blocks

yt-dlp_old -f 22 'https://www.youtube.com/watch?v=SYxPzjV2g4Q'
[youtube] SYxPzjV2g4Q: Downloading webpage
[youtube] SYxPzjV2g4Q: Downloading android player API JSON
[info] SYxPzjV2g4Q: Downloading 1 format(s): 22
[download] Destination: We Try Crazy TikTok Tricks [SYxPzjV2g4Q].mp4
[download] 0.7% of 60.49MiB at 211.42KiB/s ETA 04:50
[download] Got server HTTP error: Downloaded 453979 bytes, expected 63427004 bytes. Retrying (attempt 1 of 10) ...
ERROR: Did not get any data blocks

xxxLCxxx commented 2 years ago

I also want to point out their dates. Today is June 4th. One video is from May 30. The other is from June 2. Both are rather short even.

unixfox commented 2 years ago

In another issue we determined that YT actually sends a backup URL for each download by means of query parameters in the URL:

https://rr2---sn-cu-aigsl.googlevideo.com/videoplayback?
    expire=1654115076&
    ei=pHaXYqyINY2I0wWjqLGgBA&
    ip=51.6.64.174&
    id=o-ABFKTnZv6lsb1y3AXB_T6Z2d07zgNtxV0yEmVud_Y6Cv&
    itag=22&
    source=youtube&
    requiressl=yes&
    mh=EV&
    mm=31%2C29&
    mn=sn-cu-aigsl%2Csn-cu-c9il&
    ms=au%2Crdu&
    mv=m&
    mvi=2&
    pl=26&
    ...

Parameter mn tells us to replace the part before %2C with the part after, so the backup URL is https://rr2---sn-cu-c9il.googlevideo.com/... This may be worth investigating.

I did test the backup URL but that serves the same broken file.

xxxLCxxx commented 2 years ago

Yes, but the idea here is reliability: you test whether it's broken, and when it is not, you download from the same source, not from another (possibly broken) server with a different replication status.

xxxLCxxx commented 2 years ago

I want to add that format 22 “is best” by a large margin. It's one of the reasons for downloading from YouTube instead of Rumble, Twitch, GETTR and others. I often download and watch scientific debates and presentations. These are typically several hours long (sometimes more than 7). They show diagrams and such, so HD is the bare minimum. Such a video can easily be over 7 GB on the alternatives. The same video (encoded in better quality, even) in format 22 is usually less than 3 GB. The difference is night and day for me. In other words: the potential gain from format 22 (half the size of the alternative formats) outweighs possible downsides by a large margin. Even if youtube-dl/yt-dlp were to run into a broken format 22 and only afterwards pick an alternative, this would still be much faster in most scenarios. The vast majority of my broken format 22 downloads failed before reaching 1%. Format 22 saves both time and bandwidth, by a large margin. Whatever “handstand” has to be performed to avoid broken format 22 links, or to pick up after failed download attempts, is easily worth the investment.

dirkf commented 2 years ago

My summary conclusions:

Some possible workarounds exist:

  * possibly ask for both 22 and a fall-back format, with --ignore-errors and a --exec ... command that renames (or links or copies) the f22 file to the fallback filename, such that a successful f22 download would skip the fallback download.

unixfox commented 2 years ago

Has anyone ever tried to contact Youtube about that?

For example, on iOS with Safari, on the web interface www.youtube.com, the broken videos won't load in 720p. Safari doesn't support DASH, so it uses the non-DASH videos. I don't think YouTube would want to serve a broken experience to its users.

dirkf commented 2 years ago

It's the Web. People expect it to be broken, either through naïve experience or deep understanding (PDF, 2014, no less horrifyingly accurate).

xxxLCxxx commented 2 years ago
possibly ask for both 22 and a fall-back format, with --ignore-errors and a --exec ... command that renames (or links or copies) the f22 file to the fallback filename, such that a successful f22 download would skip the fallback download.

Could you give us mere mortals an example of how this could be done? :-)

dirkf commented 2 years ago

Maybe these hints (POSIX shell) ...

possibly ask for both 22 and a fall-back format

-f '(22,136+140)/mp4[format_id!=22]/best/bestvideo+bestaudio'

with --ignore-errors

--ignore-errors

and a --exec ... command that renames (or links or copies) the f22 file to the fallback filename, such that a successful f22 download would skip the fallback download.

To allow different formats:

-o '%(title)s-%(id)s-%(format_id)s.%(ext)s'

To link the downloaded file:

--exec 'path/to/link22 "{}"'

where path/to/link22 is an executable script like this

#!/bin/sh
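# hard-link each downloaded format-22 file to the name the fallback format (136+140) would use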
for f; do
    ln "$f" "$(echo "$f"| sed 's/-22\./-136+140\./')"
done
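
Put together, the whole invocation might look like this (a sketch only; the video URL is a placeholder and path/to/link22 is the script above):

youtube-dl -f '(22,136+140)/mp4[format_id!=22]/best/bestvideo+bestaudio' --ignore-errors \
    -o '%(title)s-%(id)s-%(format_id)s.%(ext)s' --exec 'path/to/link22 "{}"' \
    'https://www.youtube.com/watch?v=VIDEO_ID'
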
xxxLCxxx commented 2 years ago

Given the filenames I encounter on a regular basis on YouTube, this is bound to give you some surprises. ;-) You can fix things by escaping (and run into another set of surprises): printf -v f "%q" "${f}"

Just to recapitulate: In order to get a consistent link to a YouTube video in format 22, we would have to store a mapping to disk (for restarts, etc.). That is, we would need to create an “outname.tmp”, which contains the mapping: https://www.youtube.com/watch?v=regularlink > https://rr2---sn-cu-aigsl.googlevideo.com/videoplayback?temporarymapping

This is quite ugly, of course. Using this, we could - in theory - test if the download will fail or not (even if we were to take 7 or more probes, while keeping the connection alive this would still be much faster than downloading a format twice the size in most cases).

The alternative would be to notice a failure of format 22 and fall back from there to a different format. Even this would be much faster in most cases. The problem here would be lingering files (such as “outname_part.mp4”) when using external downloaders.

The first solution would be cleaner, but ugly. The second could be messy (at least in some rare cases).

Currently, this seems to only affect format 22 on YouTube. However, it would not surprise me to encounter similar issues with other sites sooner or later.

Did I get that right? What are the chances that taking 7, 11 or however many samples (sprinkled all over the run length of the video) gives a false positive result?

dirkf commented 2 years ago

In the suggested script filenames are quoted to avoid escaping issues. %q isn't a POSIX format. --restrict-filenames could give an extra level of safety.

There's no point remembering a download link. If you really, really want format 22, just set off a job that runs yt-dl -f 22 ... until it returns OK, with a suitable delay, maybe starting at 100s and doubling after each failure.
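
For instance, such a job could be as simple as this (a sketch, not something tested in this thread; it assumes youtube-dl is on the PATH and that the video URL is passed as the first argument):

#!/bin/sh
# re-run the format 22 download until it succeeds, starting with a 100 s delay and doubling it
url="$1"
delay=100
until youtube-dl -f 22 "$url"; do
    echo "format 22 not ready yet, retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
done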

xxxLCxxx commented 2 years ago

--restrict-filenames is nice indeed. :-) A broken format 22 download will typically leave a file behind.

xxxLCxxx commented 2 years ago

It looks to me like format 22 is being replaced by format 95:

> yt-dlp -F 'https://www.youtube.com/watch?v=2RYlCtK6l6E'
[youtube] 2RYlCtK6l6E: Downloading webpage
[youtube] 2RYlCtK6l6E: Downloading android player API JSON
[youtube] 2RYlCtK6l6E: Downloading m3u8 information
[youtube] 2RYlCtK6l6E: Downloading MPD manifest
[youtube] 2RYlCtK6l6E: Downloading m3u8 information
[youtube] 2RYlCtK6l6E: Downloading MPD manifest
[info] Available formats for 2RYlCtK6l6E:
ID  EXT  RESOLUTION FPS │   FILESIZE   TBR PROTO  │ VCODEC        VBR ACODEC      ABR     ASR MORE INFO
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
139 m4a  audio only     │ ~ 88.43MiB   64k dash   │ audio only        mp4a.40.5   64k 22050Hz DASH audio, m4a_dash
140 m4a  audio only     │ ~198.97MiB  144k dash   │ audio only        mp4a.40.2  144k 44100Hz DASH audio, m4a_dash
160 mp4  256x144     15 │ ~293.57MiB  212k dash   │ avc1.42c00b  212k video only              DASH video, mp4_dash
91  mp4  256x144     15 │ ~401.09MiB  290k m3u8_n │ avc1.42c00b  290k mp4a.40.5    0k
278 webm 256x144     30 │ ~153.37MiB  111k dash   │ vp9          111k video only              DASH video, webm_dash
133 mp4  426x240     30 │ ~630.38MiB  456k dash   │ avc1.4d4015  456k video only              DASH video, mp4_dash
92  mp4  426x240     30 │ ~754.75MiB  546k m3u8_n │ avc1.4d4015  546k mp4a.40.5    0k
242 webm 426x240     30 │ ~229.36MiB  166k dash   │ vp9          166k video only              DASH video, webm_dash
134 mp4  640x360     30 │ ~  1.36GiB 1008k dash   │ avc1.4d401e 1008k video only              DASH video, mp4_dash
93  mp4  640x360     30 │ ~  1.63GiB 1209k m3u8_n │ avc1.4d401e 1209k mp4a.40.2    0k
243 webm 640x360     30 │ ~403.46MiB  292k dash   │ vp9          292k video only              DASH video, webm_dash
135 mp4  854x480     30 │ ~  1.82GiB 1350k dash   │ avc1.4d401f 1350k video only              DASH video, mp4_dash
94  mp4  854x480     30 │ ~  2.12GiB 1568k m3u8_n │ avc1.4d401f 1568k mp4a.40.2    0k
244 webm 854x480     30 │ ~729.54MiB  528k dash   │ vp9          528k video only              DASH video, webm_dash
136 mp4  1280x720    30 │ ~  3.62GiB 2684k dash   │ avc1.4d401f 2684k video only              DASH video, mp4_dash
95  mp4  1280x720    30 │ ~  4.01GiB 2969k m3u8_n │ avc1.4d401f 2969k mp4a.40.2    0k
247 webm 1280x720    30 │ ~  1.40GiB 1040k dash   │ vp9         1040k video only              DASH video, webm_dash

Format 95 appears to be even more tightly encoded than 22:

> ll
total 361M
361M Jun 16 11:11 Vagabond_FDA_Approves_COVID-Treating_Alopecia_Drug_w_Heart_Risks.mp4

The size and TBR reporting is completely off the charts. This will scare people away; it would be better blanked than reported this wrong. The same applies to format 136. The longer the video, the more it is off, by factors.

raszpl commented 2 years ago

It looks to me like format 22 is being replaced by format 95:

it's a livestream

xxxLCxxx commented 2 years ago

It was long finished by the time I downloaded it (different time zone here). The encoding seems tight (360 MB for 1:46 in 720p). I had this repeatedly with different videos. Not often, but increasingly more often. Does this mean that they lag even more in pushing these onto 22/136?

There seems to be a system behind the wrong size reporting (logarithmically with size increase). It is so wrong that reporting it doesn't feel right. If your passport said that you were 60", would you pass on that "information"?

xxxLCxxx commented 2 years ago

Now (streamed live 17 hours ago) format 22 is there, 95 is gone and the sizes (136) are corrected. YouTube has truly turned into a shit show. Besides, format 22/136 is 140% the size of the temporary 95. I'm sure that many people would prefer the temp format.

raszpl commented 2 years ago

It always worked this way. The size difference is surprising; their backend tooling must be doing something 'one size fits all' dumb.

xxxLCxxx commented 2 years ago

The information provided for the webm formats can serve as a rough guide. I expected something around 1 GB and was pleasantly surprised. However, someone who knows nothing about this nonsensical glitch would be discouraged from downloading, or would resort to picking one of the "postage stamp formats".

xxxLCxxx commented 2 years ago

I did a bit of poking:

> yt-dlp -v -f 22 'https://www.youtube.com/watch?v=9IJNJ61vtlg'
[debug] Command-line config: ['-v', '-f', '22', 'https://www.youtube.com/watch?v=9IJNJ61vtlg']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2022.06.22.1 [a86e01e] (zip)
[debug] Python version 3.10.0 (CPython 64bit) - Linux-3.13.0-144-generic-x86_64-with-glibc2.19
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
[debug] exe versions: ffmpeg 5.0-static (setts), ffprobe 4.3.1-static, phantomjs 2.1.1
[debug] Optional libraries: sqlite3-2.6.0
[debug] Proxy map: {}
[debug] [youtube] Extracting URL: https://www.youtube.com/watch?v=9IJNJ61vtlg
[youtube] 9IJNJ61vtlg: Downloading webpage
[youtube] 9IJNJ61vtlg: Downloading android player API JSON
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, codec:vp9.2, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), acodec, lang, proto, filesize, fs_approx, tbr, vbr, abr, asr, vext, aext, hasaud, id
[info] 9IJNJ61vtlg: Downloading 1 format(s): 22
[debug] Invoking http downloader on "https://rr6---sn-5fo-c33s.googlevideo.com/videoplayback?expire=1656431460&ei=BM-6YpGeAof97QSx76OACA&ip=184.22.91.135&id=o-AHItVi3UehWznLZgGXcgCujo_-uyQ_kNdWq2gOU2o7Md&itag=22&source=youtube&requiressl=yes&mh=Ij&mm=31%2C26&mn=sn-5fo-c33s%2Csn-npoeened&ms=au%2Conr&mv=m&mvi=6&pl=22&initcwndbps=1947500&vprv=1&mime=video%2Fmp4&cnr=14&ratebypass=yes&dur=1149.294&lmt=1656404293006106&mt=1656407807&fvip=5&fexp=24001373%2C24007246&beids=23886205&c=ANDROID&txp=5432434&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRQIgb3JxBF00iAaGyDmS6hJckY9I_z1kJ-Vcx3A7jroXO98CIQCrYzKfwT3C2o_w-bmby6TvfZeKKOVQ9noEqvoTQybw-w%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRAIgJCIjAWVCDrduT8J403OYDi50KSdz09QcpkMNJBG9RUsCIFtha2O_GfKwbDjTmkw5RvujJ4l0km0vDYNBv4Hc1t-r"
[download] Destination: Ecuador's uprising escalates despite violent gov't repression [9IJNJ61vtlg].mp4
[download]   0.5% of 100.84MiB at  502.43KiB/s ETA 03:24
[download] Got server HTTP error: Downloaded 561511 bytes, expected 105735987 bytes. Retrying (attempt 1 of 10) ...

ERROR: Did not get any data blocks
  File "/tmp/.mount_pythonN6DGyc/opt/python3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/tmp/.mount_pythonN6DGyc/opt/python3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/bin/yt-dlp/__main__.py", line 16, in <module>
    yt_dlp.main()
  File "/usr/bin/yt-dlp/yt_dlp/__init__.py", line 919, in main
    _exit(*variadic(_real_main(argv)))
  File "/usr/bin/yt-dlp/yt_dlp/__init__.py", line 911, in _real_main
    return ydl.download(all_urls)
  File "/usr/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 3247, in download
    self.__download_wrapper(self.extract_info)(
  File "/usr/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 3223, in wrapper
    res = func(*args, **kwargs)
  File "/usr/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 1418, in extract_info
    return self.__extract_info(url, self.get_info_extractor(ie_key), download, extra_info, process)
  File "/usr/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 1427, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 1511, in __extract_info
    return self.process_ie_result(ie_result, download, extra_info)
  File "/usr/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 1568, in process_ie_result
    ie_result = self.process_video_result(ie_result, download=download)
  File "/usr/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 2628, in process_video_result
    self.process_info(new_info)
  File "/usr/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 3127, in process_info
    success, real_download = self.dl(temp_filename, info_dict)
  File "/usr/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 2827, in dl
    return fd.download(name, new_info, subtitle)
  File "/usr/bin/yt-dlp/yt_dlp/downloader/common.py", line 444, in download
    ret = self.real_download(filename, info_dict)
  File "/usr/bin/yt-dlp/yt_dlp/downloader/http.py", line 372, in real_download
    return download()
  File "/usr/bin/yt-dlp/yt_dlp/downloader/http.py", line 341, in download
    self.report_error('Did not get any data blocks')
  File "/usr/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 969, in report_error
    self.trouble(f'{self._format_err("ERROR:", self.Styles.ERROR)} {message}', *args, **kwargs)
  File "/usr/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 901, in trouble
    tb_data = traceback.format_list(traceback.extract_stack())
> curl -v -X GET -H "range: bytes=1-2,500-501,1000-1001" 'https://rr6---sn-5fo-c33s.googlevideo.com/videoplayback?expire=1656431460&ei=BM-6YpGeAof97QSx76OACA&ip=184.22.91.135&id=o-AHItVi3UehWznLZgGXcgCujo_-uyQ_kNdWq2gOU2o7Md&itag=22&source=youtube&requiressl=yes&mh=Ij&mm=31%2C26&mn=sn-5fo-c33s%2Csn-npoeened&ms=au%2Conr&mv=m&mvi=6&pl=22&initcwndbps=1947500&vprv=1&mime=video%2Fmp4&cnr=14&ratebypass=yes&dur=1149.294&lmt=1656404293006106&mt=1656407807&fvip=5&fexp=24001373%2C24007246&beids=23886205&c=ANDROID&txp=5432434&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRQIgb3JxBF00iAaGyDmS6hJckY9I_z1kJ-Vcx3A7jroXO98CIQCrYzKfwT3C2o_w-bmby6TvfZeKKOVQ9noEqvoTQybw-w%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRAIgJCIjAWVCDrduT8J403OYDi50KSdz09QcpkMNJBG9RUsCIFtha2O_GfKwbDjTmkw5RvujJ4l0km0vDYNBv4Hc1t-r'  
* Hostname was NOT found in DNS cache
*   Trying 49.231.60.209...
* Connected to rr6---sn-5fo-c33s.googlevideo.com (49.231.60.209) port 443 (#0)
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server key exchange (12):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSL connection using ECDHE-RSA-AES128-GCM-SHA256
* Server certificate:
*        subject: CN=*.googlevideo.com
*        start date: 2022-06-21 11:05:53 GMT
*        expire date: 2022-08-30 11:05:52 GMT
*        subjectAltName: rr6---sn-5fo-c33s.googlevideo.com matched
*        issuer: C=US; O=Google Trust Services LLC; CN=GTS CA 1C3
*        SSL certificate verify ok.
> GET /videoplayback?expire=1656431460&ei=BM-6YpGeAof97QSx76OACA&ip=184.22.91.135&id=o-AHItVi3UehWznLZgGXcgCujo_-uyQ_kNdWq2gOU2o7Md&itag=22&source=youtube&requiressl=yes&mh=Ij&mm=31%2C26&mn=sn-5fo-c33s%2Csn-npoeened&ms=au%2Conr&mv=m&mvi=6&pl=22&initcwndbps=1947500&vprv=1&mime=video%2Fmp4&cnr=14&ratebypass=yes&dur=1149.294&lmt=1656404293006106&mt=1656407807&fvip=5&fexp=24001373%2C24007246&beids=23886205&c=ANDROID&txp=5432434&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRQIgb3JxBF00iAaGyDmS6hJckY9I_z1kJ-Vcx3A7jroXO98CIQCrYzKfwT3C2o_w-bmby6TvfZeKKOVQ9noEqvoTQybw-w%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRAIgJCIjAWVCDrduT8J403OYDi50KSdz09QcpkMNJBG9RUsCIFtha2O_GfKwbDjTmkw5RvujJ4l0km0vDYNBv4Hc1t-r HTTP/1.1
> User-Agent: curl/7.35.0
> Host: rr6---sn-5fo-c33s.googlevideo.com
> Accept: */*
> range: bytes=1-2,500-501,1000-1001
> 
< HTTP/1.1 206 Partial Content
< Last-Modified: Tue, 28 Jun 2022 08:18:13 GMT
< Content-Type: multipart/byteranges; boundary=GoogMRZ6F3DEQ1N4
< Date: Tue, 28 Jun 2022 09:53:15 GMT
< Expires: Tue, 28 Jun 2022 09:53:15 GMT
< Cache-Control: private, max-age=21165
< Accept-Ranges: bytes
< Content-Length: 295
< Connection: close
< Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
< Vary: Origin
< Cross-Origin-Resource-Policy: cross-origin
< X-Content-Type-Options: nosniff
* Server gvs 1.0 is not blacklisted
< Server: gvs 1.0
< 

--GoogMRZ6F3DEQ1N4
Content-Type: video/mp4
Content-Range: bytes 1-2/105735987

--GoogMRZ6F3DEQ1N4
Content-Type: video/mp4
Content-Range: bytes 500-501/105735987

--GoogMRZ6F3DEQ1N4
Content-Type: video/mp4
Content-Range: bytes 1000-1001/105735987

--GoogMRZ6F3DEQ1N4--
* Closing connection 0
* SSLv3, TLS alert, Client hello (1):

When extending the byte ranges to cover more of the whole area, I got errors. The reply to such byte ranges is quite fast. Hence, one could take the size and divide it by some number (17, 33?), effectively taking small samples over the whole file.
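
A rough sketch of that idea (my own illustration, not from this thread; it assumes the first argument is a fresh format 22 googlevideo URL and N is the number of sample points):

#!/bin/sh
URL="$1"
N=17
# read the total size from the HEAD response
size=$(curl -sI "$URL" | tr -d '\r' | awk 'tolower($1)=="content-length:" {print $2}')
# build N evenly spaced two-byte ranges ("start-end,start-end,...")
ranges=$(awk -v s="$size" -v n="$N" 'BEGIN {
    for (i = 1; i <= n; i++) { o = int(s * i / (n + 1)); printf "%s%d-%d", (i > 1 ? "," : ""), o, o + 1 }
}')
# 206 suggests all sampled blocks are readable; a 4xx (other than 416) suggests the file is broken
curl -s -o /dev/null -w '%{http_code}\n' -H "Range: bytes=$ranges" "$URL"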

dirkf commented 1 year ago

Continued in #31247.