ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

[Telegraaf] Unable to download JSON metadata: HTTP Error 403 #31710

Open jgrosmann opened 1 year ago

jgrosmann commented 1 year ago

youtube-dl https://www.telegraaf.nl/video/644858720/nanninga-ja21-over-coalitieruzie-zit-met-popcorn-te-kijken-ga-zo-door
[Telegraaf] 644858720: Downloading JSON metadata
ERROR: Unable to download JSON metadata: HTTP Error 403: Forbidden (caused by <HTTPError 403: 'Forbidden'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

dirkf commented 1 year ago

Fixed in git master:

commit d35557a75d943865e40410d51bfcc18276e98532
Author: coletdjnz <coletdjnz@protonmail.com>
Date:   Fri Sep 23 12:10:35 2022 +1200

    [Telegraaf] Use mobile GraphQL API endpoint

    Workaround for Cloudflare 403
    Fixes https://github.com/yt-dlp/yt-dlp/issues/5000
    Authored by: coletdjnz

Also: #30839

nicolaasjan commented 1 year ago

Be sure to call youtube-dl with the --verbose flag and include its complete output.

Worked here, although there were a lot of lines like this:

[mp4 @ 0x55a127d008c0] Invalid DTS: 8305200 PTS: 8301600 in output stream 0:0, replacing by guess

Output: https://pastebin.com/eXQtdMZm

(youtube-dl latest version from source code)

nicolaasjan commented 1 year ago

Video is out of sync. Same with yt-dlp:

yt-dlp -v --ignore-config https://www.telegraaf.nl/video/644858720/nanninga-ja21-over-coalitieruzie-zit-met-popcorn-te-kijken-ga-zo-door
[debug] Command-line config: ['-v', '--ignore-config', 'https://www.telegraaf.nl/video/644858720/nanninga-ja21-over-coalitieruzie-zit-met-popcorn-te-kijken-ga-zo-door']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2023.02.26 [8e9fe43cd] (zip)
[debug] Python 3.8.10 (CPython x86_64 64bit) - Linux-5.4.0-139-generic-x86_64-with-glibc2.29 (OpenSSL 1.1.1f  31 Mar 2020, glibc 2.31)
[debug] exe versions: ffmpeg N-109874-gaeceefa622-Nico-20230218 (fdk,setts), ffprobe N-109874-gaeceefa622-Nico-20230218, phantomjs 2.1.1, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.17, brotli-1.0.9, certifi-2022.12.07, mutagen-1.46.0, secretstorage-3.3.3, sqlite3-2.6.0, websockets-10.4, xattr-0.9.6
[debug] Proxy map: {}
[debug] Loaded 1782 extractors
[Telegraaf] Extracting URL: https://www.telegraaf.nl/video/644858720/nanninga-ja21-over-coalitieruzie-zit-met-popcorn-te-kijken-ga-zo-door
[Telegraaf] 644858720: Downloading JSON metadata
[Telegraaf] RylE3djQ5q02: Downloading JSON metadata
[Telegraaf] RylE3djQ5q02: Downloading MPD manifest
[Telegraaf] RylE3djQ5q02: Downloading m3u8 information
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[debug] Default format spec: bestvideo*+bestaudio/best
[info] RylE3djQ5q02: Downloading 1 format(s): hls-3943+dash-audio=127999
[debug] Invoking hlsnative downloader on "https://media.tmgvideo.nl/hls/account=Kx1PKc/item=RylE3djQ5q02/version=202302261308_5/v2.0-RylE3djQ5q02-hls-202302261308_5-video=3597000.m3u8?v=20230226130802_4"
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 19
[download] Destination: Nanninga (JA21) over coalitie ruzie: ‘Zit met popcorn te kijken, ga zo door!’ [RylE3djQ5q02].fhls-3943.mp4
[download]   7.3% of ~  71.84MiB at    4.36MiB/s ETA 00:08 (frag 1/19)[download] Got error: Downloaded 1048576 bytes, expected 3504696 bytes. Retrying (1/10)...
[download]   8.6% of ~  71.84MiB at    4.61MiB/s ETA 00:09 (frag 1/19)[download] Got error: Downloaded 2064384 bytes, expected 3504696 bytes. Retrying (2/10)...
[download]  38.8% of ~  68.39MiB at    4.86MiB/s ETA 00:04 (frag 7/19)[download] Got error: Downloaded 1146880 bytes, expected 3507704 bytes. Retrying (1/10)...
[download]  57.2% of ~  67.52MiB at    6.42MiB/s ETA 00:03 (frag 10/19)[download] Got error: Downloaded 3145728 bytes, expected 3652088 bytes. Retrying (1/10)...
[download]  63.5% of ~  68.26MiB at    5.63MiB/s ETA 00:03 (frag 12/19)[download] Got error: Downloaded 983040 bytes, expected 4470264 bytes. Retrying (1/10)...
[download]  65.1% of ~  68.26MiB at    5.65MiB/s ETA 00:02 (frag 12/19)[download] Got error: Downloaded 2064384 bytes, expected 4470264 bytes. Retrying (2/10)...
[download]  75.4% of ~  67.73MiB at    5.09MiB/s ETA 00:02 (frag 14/19)[download] Got error: Downloaded 1015808 bytes, expected 3531768 bytes. Retrying (1/10)...
[download]  76.9% of ~  67.73MiB at    3.18MiB/s ETA 00:02 (frag 14/19)[download] Got error: Downloaded 2064384 bytes, expected 3531768 bytes. Retrying (2/10)...
[download]  91.0% of ~  67.06MiB at    5.49MiB/s ETA 00:00 (frag 17/19)[download] Got error: Downloaded 1081344 bytes, expected 3718264 bytes. Retrying (1/10)...
[download] 100% of   63.85MiB in 00:00:09 at 6.44MiB/s
[debug] Invoking dashsegments downloader on "https://media.tmgvideo.nl/dash/account=Kx1PKc/item=RylE3djQ5q02/version=202302261308_5/RylE3djQ5q02.mpd?v=20230226130802_4"
[dashsegments] Total fragments: 74
[download] Destination: Nanninga (JA21) over coalitie ruzie: ‘Zit met popcorn te kijken, ga zo door!’ [RylE3djQ5q02].fdash-audio=127999.m4a
[download] 100% of    2.25MiB in 00:00:04 at 490.34KiB/s
[Merger] Merging formats into "Nanninga (JA21) over coalitie ruzie: ‘Zit met popcorn te kijken, ga zo door!’ [RylE3djQ5q02].mp4"
[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:Nanninga (JA21) over coalitie ruzie: ‘Zit met popcorn te kijken, ga zo door!’ [RylE3djQ5q02].fhls-3943.mp4' -i 'file:Nanninga (JA21) over coalitie ruzie: ‘Zit met popcorn te kijken, ga zo door!’ [RylE3djQ5q02].fdash-audio=127999.m4a' -c copy -map 0:v:0 -map 1:a:0 -movflags +faststart 'file:Nanninga (JA21) over coalitie ruzie: ‘Zit met popcorn te kijken, ga zo door!’ [RylE3djQ5q02].temp.mp4'
Deleting original file Nanninga (JA21) over coalitie ruzie: ‘Zit met popcorn te kijken, ga zo door!’ [RylE3djQ5q02].fhls-3943.mp4 (pass -k to keep)
Deleting original file Nanninga (JA21) over coalitie ruzie: ‘Zit met popcorn te kijken, ga zo door!’ [RylE3djQ5q02].fdash-audio=127999.m4a (pass -k to keep)

@dirkf, is there a remedy for such an issue?

Vangelis66 commented 1 year ago

When I run -F on the link in the OP, I get:

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--ffmpeg-location', '.\\FFmpeg', '--external-downloader-args', '-v 8 -stats', '-vF', 'https://www.telegraaf.nl/video/644858720/nanninga-ja21-over-coalitieruzie-zit-met-popcorn-te-kijken-ga-zo-door']
[debug] Encodings: locale cp1253, fs mbcs, out cp737, pref cp1253
[debug] youtube-dl version 2023.02.27.114514
[debug] Python version 3.4.4 (CPython) - Windows-Vista-6.0.6003-SP2
[debug] exe versions: ffmpeg n5.2-dev-2245-N-109649-gab8cde6, ffprobe n5.2-dev-2245-N-109649-gab8cde6, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[Telegraaf] 644858720: Downloading JSON metadata
[Telegraaf] RylE3djQ5q02: Downloading JSON metadata
[Telegraaf] RylE3djQ5q02: Downloading MPD manifest
[Telegraaf] RylE3djQ5q02: Downloading m3u8 information
[info] Available formats for RylE3djQ5q02:
format code               extension  resolution note
hls-audio-aacl-127-audio  mp4        audio only
hls-audio-aacl-64-audio   mp4        audio only
dash-audio=64045          m4a        audio only DASH audio   64k , m4a_dash container, mp4a.40.2 (48000Hz)
dash-audio=127999         m4a        audio only DASH audio  127k , m4a_dash container, mp4a.40.2 (48000Hz)
dash-video=318000         mp4        480x270    DASH video  318k , mp4_dash container, avc1.640015, video only
hls-402                   mp4        480x270     402k , avc1.640015, video only
dash-video=1182000        mp4        854x480    DASH video 1182k , mp4_dash container, avc1.64001E, video only
hls-1383                  mp4        854x480    1383k , avc1.64001E, video only
dash-video=2181000        mp4        1280x720   DASH video 2181k , mp4_dash container, avc1.64001F, video only
hls-2442                  mp4        1280x720   2442k , avc1.64001F, video only
dash-video=3597000        mp4        1920x1080  DASH video 3597k , mp4_dash container, avc1.640028, video only
hls-3943                  mp4        1920x1080  3943k , avc1.640028, video only
http-270p                 mp4        480x270
http-480p                 mp4        854x480
http-720p                 mp4        1280x720
http-1080p                mp4        1920x1080  (best)

So, I would've expected that on an actual download attempt, format best = http-1080p would be fetched ... But,

[debug] Default format spec: bestvideo+bestaudio/best

is set as the default... Was it always like that ("fragmented" formats preferred over "standalone container" ones) ?

(although there were a lot of lines like e.g. this:

[mp4 @ 0x55a127d008c0] Invalid DTS: 8305200 PTS: 8301600 in output stream 0:0, replacing by guess

You can suppress such FFmpeg output about inconsistencies between DTS/PTS via --external-downloader-args "-v 8 -stats" or use --hls-prefer-native flag...

FWIW, DTS = decoding timestamp and PTS = presentation timestamp; more info here 😄 ...
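As a rough illustration of what that FFmpeg warning is about: a packet has to be decoded no later than it is presented, so a DTS greater than its PTS is inconsistent. The sketch below uses a hypothetical helper, `find_invalid_timestamps`, over plain (dts, pts) pairs; it is not FFmpeg's code, and only the second sample pair comes from the log line quoted above.

```python
from typing import Iterable, List, Tuple


def find_invalid_timestamps(
    packets: Iterable[Tuple[int, int]]
) -> List[Tuple[int, int, int]]:
    """Return (index, dts, pts) for every packet whose DTS is later than its PTS."""
    bad = []
    for index, (dts, pts) in enumerate(packets):
        # Decoding must not happen after presentation.
        if dts > pts:
            bad.append((index, dts, pts))
    return bad


if __name__ == "__main__":
    # Second pair: values from the quoted warning. The other pairs are made up.
    sample = [(8294400, 8298000), (8305200, 8301600), (8308800, 8308800)]
    for index, dts, pts in find_invalid_timestamps(sample):
        print("packet %d: DTS %d > PTS %d" % (index, dts, pts))
```

Silencing the message with -v 8 only hides the symptom; the timestamps in the merged file are still whatever FFmpeg guessed.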

dirkf commented 1 year ago

Try -f '[format_id^=http]' ?
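For anyone unfamiliar with the filter syntax: `^=` is a "starts with" test, and with no selector in front of the brackets the best matching format is taken. A minimal sketch of that idea follows, assuming the -F listing above is ordered worst to best. `best_with_prefix` is a hypothetical helper, not youtube-dl's selector code, and it ignores the audio/video distinction that the real `best` takes into account.

```python
from typing import List, Optional

# Format codes in the order printed by -F above (assumed worst to best).
FORMAT_IDS = [
    "hls-audio-aacl-127-audio",
    "hls-audio-aacl-64-audio",
    "dash-audio=64045",
    "dash-audio=127999",
    "dash-video=318000",
    "hls-402",
    "dash-video=1182000",
    "hls-1383",
    "dash-video=2181000",
    "hls-2442",
    "dash-video=3597000",
    "hls-3943",
    "http-270p",
    "http-480p",
    "http-720p",
    "http-1080p",
]


def best_with_prefix(format_ids: List[str], prefix: str) -> Optional[str]:
    """Keep ids starting with `prefix` and return the last (best) one, if any."""
    matching = [fid for fid in format_ids if fid.startswith(prefix)]
    return matching[-1] if matching else None


if __name__ == "__main__":
    print(best_with_prefix(FORMAT_IDS, "http"))
```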

dirkf commented 1 year ago

... Was it always like that ...

FM says:

Since the end of April 2015 and version 2015.04.26, youtube-dl uses -f bestvideo+bestaudio/best as the default format selection (see #5447 (https://github.com/ytdl-org/youtube-dl/issues/5447), #5456 (https://github.com/ytdl-org/youtube-dl/issues/5456)).
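The `/` in that default means "try the left side first, otherwise fall back to the right side", and `+` means "merge these two". A small sketch of that fallback logic, under the assumption that each selector name has already been resolved to a format code (or to None when nothing matches). `resolve_format_spec` and the `picks` mapping are illustrative names, not part of youtube-dl.

```python
from typing import Dict, List, Optional


def resolve_format_spec(
    spec: str, picks: Dict[str, Optional[str]]
) -> Optional[List[str]]:
    """Return the format codes for the first '/' alternative that fully resolves."""
    for alternative in spec.split("/"):
        chosen = [picks.get(name) for name in alternative.split("+")]
        # Every part of a merged alternative must be available.
        if all(chosen):
            return chosen
    return None


if __name__ == "__main__":
    # Codes taken from the logs in this thread.
    picks = {
        "bestvideo": "hls-3943",
        "bestaudio": "dash-audio=127999",
        "best": "http-1080p",
    }
    print(resolve_format_spec("bestvideo+bestaudio/best", picks))
```

So `best` (here http-1080p) is only reached when no separate video-only and audio-only pair is available.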

Vangelis66 commented 1 year ago

@nicolaasjan

[info] RylE3djQ5q02: Downloading 1 format(s): hls-3943+dash-audio=127999

The above is from yt-dlp, but it's fetching (and later merging) video over HLS and audio over DASH...

In youtube-dl, my command below:

yt-dl --console-title --hls-prefer-native --hls-use-mpegts -c --no-part -f hls-2442+hls-audio-aacl-127-audio "https://www.telegraaf.nl/video/644858720/nanninga-ja21-over-coalitieruzie-zit-met-popcorn-te-kijken-ga-zo-door" -o test.mp4 => 

[Telegraaf] 644858720: Downloading JSON metadata
[Telegraaf] RylE3djQ5q02: Downloading JSON metadata
[Telegraaf] RylE3djQ5q02: Downloading MPD manifest
[Telegraaf] RylE3djQ5q02: Downloading m3u8 information
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 19
[download] Destination: test.fhls-2442.mp4
[download] 100% of 38.89MiB in 00:59
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 19
[download] Destination: test.fhls-audio-aacl-127-audio.mp4
[download] 100% of 2.45MiB in 00:13
[ffmpeg] Merging formats into "test.mp4"
Deleting original file test.fhls-2442.mp4 (pass -k to keep)
Deleting original file test.fhls-audio-aacl-127-audio.mp4 (pass -k to keep)

produced a media file with perfect A/V sync... In my own fetches, I never mix different transfer protocols for raw video+raw audio (i.e. I specifically request hls-V+hls-A or dash-V+dash-A); as advised, the http(s) formats, whenever available, are a good (and speedier) way to avoid downloading the raw streams one after the other and then merging them...

@dirkf : Thanks for bringing me "up-to-date" 😉 ; memory lapse/brain fog on my part? My ancient ancestors described it better:

ού γάρ έρχεται μόνον...

nicolaasjan commented 1 year ago

-f dash-video=3597000+dash-audio=127999 also gives a good result.

If this is an issue with all videos from this site, should the extractor for "De Telegraaf" be rewritten?

dirkf commented 1 year ago

So, for Telegraaf, -f '(bestvideo+bestaudio)[format_id^=hls]/(bestvideo+bestaudio)[format_id^=dash]/best' ?

Or -f best ?

Or the format selection should automatically attempt to find a matching audio format?
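To make the "matching audio format" idea concrete, here is a minimal sketch that picks the highest-bitrate video and then only considers audio with the same transport prefix (hls or dash). It is an illustration, not a proposal for the real selector: `transport` and `pick_matching_pair` are hypothetical helpers, and the bitrates for the HLS audio formats are assumed from their IDs because the listing reports none.

```python
from typing import Dict, List, Optional

VIDEOS = [
    {"id": "dash-video=2181000", "tbr": 2181},
    {"id": "hls-2442", "tbr": 2442},
    {"id": "dash-video=3597000", "tbr": 3597},
    {"id": "hls-3943", "tbr": 3943},
]
AUDIOS = [
    {"id": "hls-audio-aacl-127-audio", "tbr": 127},  # bitrate assumed from the ID
    {"id": "hls-audio-aacl-64-audio", "tbr": 64},    # bitrate assumed from the ID
    {"id": "dash-audio=64045", "tbr": 64},
    {"id": "dash-audio=127999", "tbr": 128},
]


def transport(format_id: str) -> str:
    """Return the part of the format code before the first '-', e.g. 'hls'."""
    return format_id.split("-", 1)[0]


def pick_matching_pair(videos: List[Dict], audios: List[Dict]) -> Optional[str]:
    """Pair the best video with the best audio that uses the same transport."""
    best_video = max(videos, key=lambda f: f["tbr"])
    same = [a for a in audios if transport(a["id"]) == transport(best_video["id"])]
    if not same:
        return None  # caller would fall back, e.g. to a combined format
    best_audio = max(same, key=lambda f: f["tbr"])
    return best_video["id"] + "+" + best_audio["id"]


if __name__ == "__main__":
    print(pick_matching_pair(VIDEOS, AUDIOS))
```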

dirkf commented 1 year ago

ού γάρ έρχεται μόνον...

Words that I recognised individually, but couldn't make into a sentence. Then I realised I had the same problem ...

Vangelis66 commented 1 year ago

Words that I recognised individually, but couldn't make into a sentence.

... The whole phrase is said to have been:

δεινόν τό γῆρας, οὐ γάρ ἔρχεται μόνον

loosely translated into English:

Old age/(the process of) aging is dire, for it doesn't come alone/on its own...

... meaning it's not just your age (in years) that grows bigger with time, growing older entails all sorts of (mostly unwanted/debilitating) side-effects...

Over the millennia, only the second part of the phrase has survived till today, while the first is simply inferred...

dirkf commented 1 year ago

When I asked the Web, it had Χαλεπόν instead of δεινόν, a less familiar word if only because we don't have "chaleposaurs".

Vangelis66 commented 1 year ago

... As I wrote above, only the second part of the phrase has survived unaltered; several variants of the first part are documented, including

χαλεπόν τὸ γῆρας (χαλεπός)

or

φοβοῦ τὸ γῆρας (beware of old age)

Interestingly enough, my own search for a clear English translation of the phrase has revealed this UK forum thread 😉 , so I'm now confident the phrase is known (translated, of course) to the English speaking world, too... 😄

(I promise no more OT shall be posted here, apologies for bringing it up in the first place 😜 ...)

dirkf commented 1 year ago

On 28/02/2023 14:48, Vangelis66 wrote:

... As I wrote above, only the second part of the phrase has survived unaltered; several variants of the first part are documented, including

χαλεπόν τό γῆρας (χαλεπός https://lsj.gr/wiki/%CF%87%CE%B1%CE%BB%CE%B5%CF%80%CF%8C%CF%82)

or

φοβού τό γῆρας (beware of old age)

LSJ even fingers (Il.8.103) the Homeric description of γῆρας: χαλεπὸν δέ σε γῆρας ὀπάζει ("old age weighs harshly on you" maybe?).

Interestingly enough, my own search for a clear English translation of the phrase has revealed this UK forum thread https://bushcraftuk.com/community/threads/old-age-does-not-come-alone.133947/page-2 😉 , so I'm now confident the phrase is known (translated, of course) to the English speaking world, too... 😄 ...

It isn't familiar to me, but it is apparently well known, although the origin https://www.translatum.gr/forum/index.php?topic=33861.0 doesn't seem to be mentioned by anyone who quotes it, even as a "Greek proverb", and even in Welsh "henaint ni ddaw [wrth] ei hunan" or Finnish "vanhuus ei tule yksin".

A similar idea in Hamlet (Shakespeare wasn't unfamiliar with the classics, probably even Menander):

When sorrows come, they come not single spies, But in battalions.

regards

dirkf commented 1 year ago

@pukkandan, is this a known problem?

yt-dl currently needs manual intervention to select the same transport when bestvideo+bestaudio is picked. In this case it would seem better to select the combined format that appears to have the same resolution as the separate formats.

With yt-dlp's bv*+ba, does any audio in the bv* stream get selected ahead of a separate ba, or if it's not "worse"? Apparently not in the logged case.

pukkandan commented 1 year ago

With yt-dlp's bv*+ba, does any audio in the bv* stream get selected ahead of a separate ba, or if it's not "worse"?

Yes. However,

So, I would've expected that on an actual download attempt, format best = http-1080p would be fetched ... But,

Since the hls/dash formats have more metadata available, yt-dlp is treating them as being better.

❯ yt-dlp -F https://www.telegraaf.nl/video/644858720/nanninga-ja21-over-coalitieruzie-zit-met-popcorn-te-kijken-ga-zo-door
[Telegraaf] Extracting URL: https://www.telegraaf.nl/video/644858720/nanninga-ja21-over-coalitieruzie-zit-met-popcorn-te-kijken-ga-zo-door
[Telegraaf] 644858720: Downloading JSON metadata
[Telegraaf] RylE3djQ5q02: Downloading JSON metadata
[Telegraaf] RylE3djQ5q02: Downloading MPD manifest
[Telegraaf] RylE3djQ5q02: Downloading m3u8 information
[info] Available formats for RylE3djQ5q02:
ID                       EXT RESOLUTION │  FILESIZE   TBR PROTO │ VCODEC        VBR ACODEC      ABR ASR MORE INFO
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
hls-audio-aacl-127-audio mp4 audio only │                 m3u8  │ audio only        unknown             audio
hls-audio-aacl-64-audio  mp4 audio only │                 m3u8  │ audio only        unknown             audio
dash-audio=64045         m4a audio only │ ~ 1.13MiB   64k dash  │ audio only        mp4a.40.2   64k 48k DASH audio, m4a_dash
dash-audio=127999        m4a audio only │ ~ 2.27MiB  128k dash  │ audio only        mp4a.40.2  128k 48k DASH audio, m4a_dash
http-270p                mp4 480x270    │                 https │ unknown           unknown
dash-video=318000        mp4 480x270    │ ~ 5.63MiB  318k dash  │ avc1.640015  318k video only          DASH video, mp4_dash
hls-402                  mp4 480x270    │ ~ 7.12MiB  402k m3u8  │ avc1.640015  402k video only
http-480p                mp4 854x480    │                 https │ unknown           unknown
dash-video=1182000       mp4 854x480    │ ~20.92MiB 1182k dash  │ avc1.64001E 1182k video only          DASH video, mp4_dash
hls-1383                 mp4 854x480    │ ~24.48MiB 1383k m3u8  │ avc1.64001E 1383k video only
http-720p                mp4 1280x720   │                 https │ unknown           unknown
dash-video=2181000       mp4 1280x720   │ ~38.60MiB 2181k dash  │ avc1.64001F 2181k video only          DASH video, mp4_dash
hls-2442                 mp4 1280x720   │ ~43.22MiB 2442k m3u8  │ avc1.64001F 2442k video only
http-1080p               mp4 1920x1080  │                 https │ unknown           unknown
dash-video=3597000       mp4 1920x1080  │ ~63.67MiB 3597k dash  │ avc1.640028 3597k video only          DASH video, mp4_dash
hls-3943                 mp4 1920x1080  │ ~69.79MiB 3943k m3u8  │ avc1.640028 3943k video only

This is a trivial fix in the extractor, but before I commit anything, I need confirmation that https formats are always expected to be of the same or better quality than the others at the same resolution (for this site). We can also make it treat hls > dash for audio, which would also prevent desync. Unlike prioritizing https, this is something youtube-dl can easily do as well.
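One way to picture the two preferences mentioned here is as sort keys: for video, resolution first and then "is it https"; for audio, "is it HLS" first. This is only a sketch under those assumptions, not the actual extractor change and not yt-dlp's sorting code. `video_rank` and `audio_rank` are hypothetical names, and the format data is copied from the table above.

```python
from typing import Dict, Tuple

VIDEO = [
    {"id": "http-720p", "height": 720, "proto": "https", "tbr": None},
    {"id": "http-1080p", "height": 1080, "proto": "https", "tbr": None},
    {"id": "dash-video=3597000", "height": 1080, "proto": "dash", "tbr": 3597},
    {"id": "hls-3943", "height": 1080, "proto": "m3u8", "tbr": 3943},
]
AUDIO = [
    {"id": "hls-audio-aacl-127-audio", "proto": "m3u8", "abr": None},
    {"id": "dash-audio=127999", "proto": "dash", "abr": 128},
]


def video_rank(fmt: Dict) -> Tuple[int, bool, int]:
    """Higher resolution wins; at equal resolution https wins; then bitrate."""
    return (fmt["height"], fmt["proto"] == "https", fmt["tbr"] or 0)


def audio_rank(fmt: Dict) -> Tuple[bool, int]:
    """HLS audio is preferred over DASH audio; bitrate only breaks ties."""
    return (fmt["proto"] == "m3u8", fmt["abr"] or 0)


if __name__ == "__main__":
    print(max(VIDEO, key=video_rank)["id"])
    print(max(AUDIO, key=audio_rank)["id"])
```

With the https preference in place, the missing bitrate metadata on the http formats no longer pushes them below the HLS/DASH entries of the same resolution.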

Or the format selection should automatically attempt to find a matching audio format?

This would be quite difficult. I can't even think of how we could approach an implementation within the current format selection framework.