yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader

[WDR] extractor parses the same manifest url twice... #6550

Open spookyahell opened 1 year ago

spookyahell commented 1 year ago

Provide a description that is worded well enough to be understood

This is worrying me...


But let's start from the beginning...

I created a pull request for the WDR extractor, thinking I could skip opening an issue about it, but I realized this is technically a slightly bigger thing*, so I decided to open this issue after all.

*This could be up for debate, depending on the main developers' viewpoint, of course.

yt-dlp parses manifests (i.e. m3u8 "master" URLs) from within the extractor, but the WDR case has made me aware that we should probably have better safeguards in place, so that an extractor does not generate the "media formats" for the same URL twice. A generic filter could be applied for all services before they actually handle the URLs.

I should say, I am not too familiar with the way yt-dlp actually works and handles everything, down to the bone.

But generally speaking, I do believe it should be possible to apply a "duplicate manifest check" for every service in a simple way. Implementing this may require a simple rewrite of the affected extractors (or of all extractors, if you're trying to be thorough), maybe similar to what I did with the WDR extractor (see PR above).
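To illustrate what I mean by a "duplicate manifest check", here is a minimal sketch. It is not yt-dlp code: `extract_formats_once`, `fake_parse` and the example URLs are hypothetical names made up for this illustration, and the parser is passed in as a plain callable so the sketch does not depend on any real yt-dlp API. The idea is simply to remember which manifest URLs have already been parsed and skip repeats.

```python
from __future__ import annotations

from typing import Callable, Iterable


def extract_formats_once(
    manifest_urls: Iterable[str],
    parse_manifest: Callable[[str], list[dict]],
) -> list[dict]:
    """Parse each distinct manifest URL once, keeping first-seen order.

    Hypothetical helper for illustration only; not part of yt-dlp.
    """
    seen: set[str] = set()
    formats: list[dict] = []
    for url in manifest_urls:
        if url in seen:
            continue  # this manifest was already parsed; skip the duplicate
        seen.add(url)
        formats.extend(parse_manifest(url))
    return formats


# Stand-in parser (an assumption) that records every URL it is asked to parse.
calls: list[str] = []


def fake_parse(url: str) -> list[dict]:
    calls.append(url)
    return [{'format_id': 'hls', 'url': url}]


# Made-up URLs: the first manifest is listed twice, as in the situation described.
urls = [
    'https://example.invalid/master.m3u8',
    'https://example.invalid/master.m3u8',
    'https://example.invalid/other.m3u8',
]
formats = extract_formats_once(urls, fake_parse)
```

With the made-up input above, the stand-in parser is only called for the two distinct URLs. A real check would probably also need to decide whether two URLs that differ only in query parameters count as the same manifest, which this sketch does not attempt.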

FYI, the verbose log below was produced with my fix from the PR above already applied, so you won't find the duplicate parsing in it. Sorry for that.

Provide verbose output that clearly demonstrates the problem

Complete Verbose Output

>yt-dlp -vU -F https://kinder.wdr.de/tv/die-sendung-mit-der-maus/av/video-krone-des-hieron-100.html
[debug] Command-line config: ['-vU', '-F', 'https://kinder.wdr.de/tv/die-sendung-mit-der-maus/av/video-krone-des-hieron-100.html']
[debug] Home config "yt-dlp.conf": ['--format', 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best', '--hls-use-mpegts', '--write-sub', '--all-subs', '--sub-format', 'srt/ssa/ass/vtt/dfxp/ttml/best', '--convert-subs', 'srt', '--ap-mso', 'Verizon', '--ap-username', 'PRIVATE', '--ap-password', 'PRIVATE', '--output', '%(title)s.%(format_id)s.%(ext)s']
[debug] Encodings: locale cp1252, fs utf-8, pref cp1252, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2023.01.06 [6becd25] (pip)
[debug] Python 3.10.2 (CPython AMD64 64bit) - Windows-10-10.0.19045-SP0 (OpenSSL 1.1.1m  14 Dec 2021)
[debug] exe versions: ffmpeg 2022-02-17-git-2812508086-essentials_build-www.gyan.dev (setts), ffprobe 2022-02-17-git-2812508086-essentials_build-www.gyan.dev
[debug] Optional libraries: Cryptodome-3.14.1, brotli-None, certifi-2021.10.08, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.1
[debug] Proxy map: {}
[debug] Loaded 1760 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: 2023.03.04, Current version: 2023.01.06
[debug] Downloading _update_spec from https://github.com/yt-dlp/yt-dlp/releases/download/2023.03.04/_update_spec
ERROR: You installed yt-dlp with pip or using the wheel from PyPi; Use that to update
[debug] Using fake IP 53.200.88.136 (DE) as X-Forwarded-For
[WDRPage] Extracting URL: https://kinder.wdr.de/tv/die-sendung-mit-der-maus/av/video-krone-des-hieron-100.html
[WDRPage] video-krone-des-hieron-100: Downloading webpage
[download] Downloading playlist: video-krone-des-hieron-100
[WDRPage] Playlist video-krone-des-hieron-100: Downloading 1 items of 1
[download] Downloading item 1 of 1
[debug] Using fake IP 53.75.42.185 (DE) as X-Forwarded-For
[WDR] Extracting URL: http://deviceids-medp.wdr.de/ondemand/277/2773214.js
[WDR] 2773214: Downloading JSON metadata
[WDR] 2773214: Downloading m3u8 information
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[info] Available formats for mdb-2773214:
ID       EXT RESOLUTION FPS │   TBR PROTO │ VCODEC        VBR ACODEC    ABR
───────────────────────────────────────────────────────────────────────────
hls-615  mp4 480x270     50 │  616k m3u8  │ avc1.4d401e  616k mp4a.40.2  0k
hls-1021 mp4 640x360     50 │ 1022k m3u8  │ avc1.4d401f 1022k mp4a.40.2  0k
hls-1273 mp4 960x540     50 │ 1273k m3u8  │ avc1.4d401f 1273k mp4a.40.2  0k
hls-2682 mp4 1280x720    50 │ 2682k m3u8  │ avc1.640020 2682k mp4a.40.2  0k
hls-4173 mp4 1920x1080   50 │ 4174k m3u8  │ avc1.64002a 4174k mp4a.40.2  0k
[download] Finished downloading playlist: video-krone-des-hieron-100

spookyahell commented 1 year ago

Sorry about that. It's not a question, it's the bug title.

TL;DR: Duplicate manifest URLs are being parsed, and WDR may not be the only culprit (I have a suspicion about ARD). I believe it needs investigation and, ideally, mitigation. (Somebody could probably do a deep dive, but it probably won't be me.)

Edit: F-- I can't decide... I have no evidence for further extractors being affected... (That's at best a vague suspicion.) Edit2: I decided now. In favor of #NoClickBait. (=no issue titles that are confusing/misleading/wrong) | PS: I will be heading to bed now (lol) 'Twas a long day.

spookyahell commented 1 year ago

Hm, 5 issue renames within the first 20 minutes, can I get a GWR for "ADD related renaming results on Github" for that? (I do in fact really suffer from ADHD myself, the key is being aware of it and at times making others aware of that too, so they can better understand the behavior.)

spookyahell commented 1 year ago

(A more established yt-dlp contributor is welcome to rename the issue to anything else.)

dirkf commented 1 month ago

Please highlight the duplicate manifests in the posted log. Or is this issue a red herring?