ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Svtplay directs to audio-described format by default #31441

Open ijbh opened 1 year ago

ijbh commented 1 year ago

Verbose log

youtube-dl https://www.svtplay.se/video/jMdpzdL/pa-sparet/fre-18-nov-20-00?info=visa
[SVTPlay] jMdpzdL: Downloading webpage
[SVTPlay] KQz6x1A: Downloading JSON metadata
[SVTPlay] KQz6x1A: Downloading m3u8 information
[SVTPlay] KQz6x1A: Downloading m3u8 information
[SVTPlay] KQz6x1A: Downloading m3u8 information
[SVTPlay] KQz6x1A: Downloading m3u8 information
[SVTPlay] KQz6x1A: Downloading m3u8 information
[SVTPlay] KQz6x1A: Downloading m3u8 information
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 922
[download] Destination: 18 nov 20_00-KQz6x1A.fhls-ts-full-3375.mp4
[download]   9.3% of ~1.32GiB at 19.19MiB/s ETA 03:06

Description

For the past few weeks, downloading the normal video link has automatically redirected to, and downloaded, the alternative version for the sight-impaired, which is similar but has an irritating commentator voice laid over it. The link for the real video contains the text "jMdpzdL", but the downloaded file name says "KQz6x1A". Apologies if this is incorrectly described; I don't know how else to say it.

dirkf commented 1 year ago

This is the same problem reported at https://github.com/yt-dlp/yt-dlp/issues/5164, which may be useful to read.

As a work-around, try -f 'bestvideo+bestaudio[format_id!*=Uppl]/best'. You can't say Uppläst because the format selection parser doesn't (yet) recognise non-ASCII alphabetic characters.

To avoid the low bitrate audio too, as in #30480 (see that issue for more tips), use both format selectors: -f 'bestvideo+bestaudio[format_id!*=Uppl][format_id!*=lb]/best'.
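The effect of the two filters can be modelled outside youtube-dl. The sketch below is only an illustration, not the real format selection parser: it treats each `[format_id!*=X]` as "keep the format only if X does not occur in its ID" and applies that to some audio format IDs copied from the -F listing further down this thread. The function names are made up for the example.

```python
# Audio format IDs copied from the -F listing later in this thread.
AUDIO_FORMAT_IDS = [
    "hls-cmaf-lb-full-AAC-60-2ch-Uppläst_undertext",
    "hls-ts-full-AAC-190-2ch-Uppläst_undertext",
    "hls-cmaf-lb-full-AAC-60-2ch-Svenska",
    "hls-ts-lb-full-AAC-60-2ch-Svenska",
    "hls-cmaf-full-AAC-190-2ch-Svenska",
    "hls-ts-avc-AAC-190-2ch-Svenska",
]


def passes(format_id, excluded_substrings):
    """Model chained [format_id!*=X] filters (hypothetical helper).

    A format passes only if none of the excluded substrings occurs in its ID.
    """
    return not any(s in format_id for s in excluded_substrings)


def select(format_ids, excluded_substrings):
    """Return the IDs that survive all of the 'does not contain' filters."""
    return [f for f in format_ids if passes(f, excluded_substrings)]


# [format_id!*=Uppl]: drops the audio-described ("Uppläst") tracks only.
no_audio_description = select(AUDIO_FORMAT_IDS, ["Uppl"])

# [format_id!*=Uppl][format_id!*=lb]: also drops the low-bitrate tracks.
no_audio_description_no_lb = select(AUDIO_FORMAT_IDS, ["Uppl", "lb"])

print(no_audio_description)
print(no_audio_description_no_lb)
```

The ASCII prefix "Uppl" is enough here because a substring match does not need the whole word.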

Obviously some improvements should be made to the extractor so that the formats most likely to be wanted have priority. The yt-dlp issue suggests non-geo-restricted videos from the page https://www.svtplay.se/15-minuter-fran-sapmi for developers who don't have a Swedish presence.

dirkf commented 1 year ago

This patch de-prioritises lb formats and de-prioritises audio-described formats more strongly. It also gets all the available formats and removes duplicates.

--- old/youtube_dl/extractor/svt.py
+++ new/youtube_dl/extractor/svt.py
@@ -18,6 +18,7 @@

 class SVTBaseIE(InfoExtractor):
     _GEO_COUNTRIES = ['SE']
+    _GEO_BYPASS = False

     def _extract_video(self, video_info, video_id):
         is_live = dict_get(video_info, ('live', 'simulcast'), default=False)
@@ -37,9 +38,8 @@
                     vurl + '?hdcore=3.3.0', video_id,
                     f4m_id=player_type, fatal=False))
             elif ext == 'mpd':
-                if player_type == 'dashhbbtv':
-                    formats.extend(self._extract_mpd_formats(
-                        vurl, video_id, mpd_id=player_type, fatal=False))
+                formats.extend(self._extract_mpd_formats(
+                    vurl, video_id, mpd_id=player_type, fatal=False))
             else:
                 formats.append({
                     'format_id': player_type,
@@ -50,6 +50,14 @@
             self.raise_geo_restricted(
                 'This video is only available in Sweden',
                 countries=self._GEO_COUNTRIES)
+        for f in formats:
+            if f.get('vcodec') == 'none' and '-lb-' in f['format_id']:
+                f['preference'] = int_or_none(f.get('preference'), default=-1) - 1
+            # the separator will become `_` in core processing
+            if '-Uppläst undertext' in f['format_id']:
+                f['preference'] = int_or_none(f.get('preference'), default=-1) - 10
+
+        self._remove_duplicate_formats(formats)
         self._sort_formats(formats)

         subtitles = {}
@@ -59,11 +67,13 @@
                 subtitle_url = sr.get('url')
                 subtitle_lang = sr.get('language', 'sv')
                 if subtitle_url:
+                    sub = {
+                        'url': subtitle_url,
+                    }
                     if determine_ext(subtitle_url) == 'm3u8':
-                        # TODO(yan12125): handle WebVTT in m3u8 manifests
-                        continue
-
-                    subtitles.setdefault(subtitle_lang, []).append({'url': subtitle_url})
+                        # XXX: no way of testing, is it ever hit?
+                        sub['ext'] = 'vtt'
+                    subtitles.setdefault(subtitle_lang, []).append(sub)

         title = video_info.get('title')

Then:

$ python -m youtube_dl -v  'https://www.svtplay.se/video/jm3EYvp/15-minuter-fran-sapmi/duodji-mastarbrev-och-framtidstankar?info=visa' -F
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'https://www.svtplay.se/video/jm3EYvp/15-minuter-fran-sapmi/duodji-mastarbrev-och-framtidstankar?info=visa', u'-F']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 774b346f9
[debug] Python version 2.7.18 (CPython) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[SVTPlay] jm3EYvp: Downloading webpage
[SVTPlay] ePwpAML: Downloading JSON metadata
[SVTPlay] ePwpAML: Downloading MPD manifest
[SVTPlay] ePwpAML: Downloading m3u8 information
[SVTPlay] ePwpAML: Downloading m3u8 information
[SVTPlay] ePwpAML: Downloading m3u8 information
[SVTPlay] ePwpAML: Downloading MPD manifest
[SVTPlay] ePwpAML: Downloading m3u8 information
[SVTPlay] ePwpAML: Downloading m3u8 information
[SVTPlay] ePwpAML: Downloading MPD manifest
[SVTPlay] ePwpAML: Downloading MPD manifest
[SVTPlay] ePwpAML: Downloading MPD manifest
[SVTPlay] ePwpAML: Downloading m3u8 information
[SVTPlay] ePwpAML: Downloading MPD manifest
[info] Available formats for ePwpAML:
format code                                    extension  resolution note
hls-cmaf-lb-full-AAC-60-2ch-Uppläst_undertext  mp4        audio only [sv-x-tal] 
hls-ts-lb-full-AAC-60-2ch-Uppläst_undertext    mp4        audio only [sv-x-tal] 
hls-cmaf-full-AAC-190-2ch-Uppläst_undertext    mp4        audio only [sv-x-tal] 
hls-ts-full-AAC-190-2ch-Uppläst_undertext      mp4        audio only [sv-x-tal] 
hls-cmaf-lb-full-AAC-60-2ch-Svenska            mp4        audio only [sv] 
hls-ts-lb-full-AAC-60-2ch-Svenska              mp4        audio only [sv] 
hls-cmaf-full-AAC-190-2ch-Svenska              mp4        audio only [sv] 
hls-ts-avc-AAC-190-2ch-Svenska                 mp4        audio only [sv] 
hls-cmaf-lb-full-310                           mp4        416x234     310k , hvc1.2.4.L123.90, 25.0fps, video only
hls-cmaf-lb-full-391                           mp4        416x234     391k , avc1.42c01f, 25.0fps, video only
hls-cmaf-lb-full-529                           mp4        640x360     529k , hvc1.2.4.L123.90, 25.0fps, video only
hls-ts-avc-551                                 mp4        416x234     551k , avc1.42c01f, 25.0fps, video only
hls-cmaf-lb-full-873                           mp4        640x360     873k , avc1.4d401f, 25.0fps, video only
hls-cmaf-lb-full-916                           mp4        960x540     916k , hvc1.2.4.L123.90, 25.0fps, video only
hls-ts-avc-1043                                mp4        640x360    1043k , avc1.4d401f, 25.0fps, video only
dash-lb-full-0                                 mp4        960x540    DASH video 1476k , mp4_dash container, hvc1.2.4.L123.90, 25fps, video only
hls-cmaf-full-1496                             mp4        1280x720   1496k , hvc1.2.4.L123.90, 25.0fps, video only
hls-cmaf-full-1502                             mp4        960x540    1502k , avc1.4d401f, 25.0fps, video only
hls-ts-avc-1555                                mp4        960x540    1555k , avc1.4d401f, 25.0fps, video only
hls-cmaf-full-2258                             mp4        1280x720   2258k , avc1.4d401f, 25.0fps, video only
hls-ts-avc-2327                                mp4        1280x720   2327k , avc1.4d401f, 25.0fps, video only
hls-cmaf-full-2592                             mp4        1920x1080  2592k , hvc1.2.4.L123.90, 25.0fps, video only
hls-cmaf-full-3289                             mp4        1920x1080  3289k , avc1.640029, 25.0fps, video only
hls-ts-avc-3381                                mp4        1920x1080  3381k , avc1.640029, 25.0fps, video only
dash-full-0                                    mp4        1920x1080  DASH video 4335k , mp4_dash container, hvc1.2.4.L123.90, 25fps, video only
dash-hbbtv-hevc-0                              mp4        1920x1080  DASH video 4335k , mp4_dash container, hvc1.2.4.L123.90, 25fps, video only
dash-avc-0                                     mp4        1920x1080  DASH video 5096k , mp4_dash container, avc1.640029, 25fps, video only
dash-hbbtv-avc-0                               mp4        1920x1080  DASH video 5096k , mp4_dash container, avc1.640029, 25fps, video only (best)
$
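The order of the audio formats in this listing (worst first) is consistent with the preference adjustments in the patch. Here is a standalone sketch of that loop on plain dicts, under stated assumptions: `pref_or_default` is a made-up stand-in for `int_or_none(..., default=-1)`, the sample format IDs use the pre-processing form with a space, and the final sort looks only at `preference`, treating a missing value as 0. The real `_sort_formats` compares many more fields.

```python
def pref_or_default(value, default=-1):
    """Hypothetical stand-in for int_or_none(value, default=default)."""
    return default if value is None else int(value)


def adjust_preferences(formats):
    """Apply the two penalties from the patch to a list of format dicts."""
    for f in formats:
        # Low-bitrate audio-only formats: small penalty.
        if f.get("vcodec") == "none" and "-lb-" in f["format_id"]:
            f["preference"] = pref_or_default(f.get("preference")) - 1
        # Audio-described formats: much larger penalty.
        if "-Uppläst undertext" in f["format_id"]:
            f["preference"] = pref_or_default(f.get("preference")) - 10
    return formats


# Sample data modelled on the listing above (not extractor output).
sample = adjust_preferences([
    {"format_id": "hls-ts-avc-AAC-190-2ch-Svenska", "vcodec": "none"},
    {"format_id": "hls-ts-lb-full-AAC-60-2ch-Svenska", "vcodec": "none"},
    {"format_id": "hls-ts-full-AAC-190-2ch-Uppläst undertext", "vcodec": "none"},
    {"format_id": "hls-ts-lb-full-AAC-60-2ch-Uppläst undertext", "vcodec": "none"},
])

# Assumption: a missing preference counts as 0 when sorting; worst comes first.
ordered = sorted(sample, key=lambda f: f.get("preference", 0))
for f in ordered:
    print(f.get("preference", 0), f["format_id"])
```

A format that is both low-bitrate and audio-described takes both penalties, so it sorts below everything else.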

Actually, the problem URL is also unrestricted (the page says "Kan ses i hela världen", i.e. "can be viewed worldwide"), and any such video can be downloaded with --no-geo-bypass. A currently geo-restricted video would be good for testing.

ijbh commented 1 year ago

Hi, thank you so much.

I'm afraid I don't understand all of this, but I will re-read it a few more times. Is it enough to download the patch, and then it should work as normal?

By the way, here is an example link that is geo-restricted and gives the audio-described format: https://www.svtplay.se/video/jqWAyAX/la-boheme-med-freni-och-pavarotti

It will only work for a few days more though.

dirkf commented 1 year ago

Thanks.

This test video shows that the geo-bypass mechanism used by yt-dl is not effective for SVT and should be disabled. For example, from the UK with the original patch (the X-Forwarded-For line is the attempted bypass):

$ python3.9 -m youtube_dl -v  'https://www.svtplay.se/video/jqWAyAX/la-boheme-med-freni-och-pavarotti' -F
[debug] System config: ['--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', 'https://www.svtplay.se/video/jqWAyAX/la-boheme-med-freni-och-pavarotti', '-F']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 774b346f9
[debug] Python version 3.9.15 (CPython) - Linux-4.4.0-210-generic-i686-with-glibc2.23
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[debug] Using fake IP 78.78.194.226 (SE) as X-Forwarded-For.
[SVTPlay] jqWAyAX: Downloading webpage
[SVTPlay] jNwaGAN: Downloading JSON metadata
[SVTPlay] jNwaGAN: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
[SVTPlay] jNwaGAN: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
[SVTPlay] jNwaGAN: Downloading MPD manifest
WARNING: Failed to download MPD manifest: HTTP Error 403: Forbidden
[SVTPlay] jNwaGAN: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
[SVTPlay] jNwaGAN: Downloading MPD manifest
WARNING: Failed to download MPD manifest: HTTP Error 403: Forbidden
[SVTPlay] jNwaGAN: Downloading MPD manifest
WARNING: Failed to download MPD manifest: HTTP Error 403: Forbidden
[SVTPlay] jNwaGAN: Downloading MPD manifest
WARNING: Failed to download MPD manifest: HTTP Error 403: Forbidden
[SVTPlay] jNwaGAN: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
[SVTPlay] jNwaGAN: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
[SVTPlay] jNwaGAN: Downloading MPD manifest
WARNING: Failed to download MPD manifest: HTTP Error 403: Forbidden
[SVTPlay] jNwaGAN: Downloading MPD manifest
WARNING: Failed to download MPD manifest: HTTP Error 403: Forbidden
[SVTPlay] jNwaGAN: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
ERROR: This video is only available in Sweden
This video is available in Sweden.
You might want to use a VPN or a proxy server (with --proxy) to workaround.
...
$
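For readers unfamiliar with the attempted bypass, the general technique can be sketched as follows: pick a random address from a network block associated with the target country and send it in an X-Forwarded-For header. This is a minimal illustration, not yt-dl's actual code. The block below is an assumption, chosen only so that the fake IP from the log falls inside it, and the function names are made up. The 403 responses above suggest that SVT simply ignores the header.

```python
import ipaddress
import random

# Assumption: an illustrative block for Sweden, not necessarily the one
# that yt-dl uses. The fake IP in the log (78.78.194.226) lies inside it.
ASSUMED_SE_BLOCK = "78.64.0.0/12"


def random_ip_in_block(cidr, rng=random):
    """Return a random IPv4 address (as a string) inside the CIDR block."""
    network = ipaddress.ip_network(cidr)
    offset = rng.randrange(network.num_addresses)
    return str(network[offset])


def spoofed_headers(cidr):
    """Build the extra request header used for the bypass attempt."""
    return {"X-Forwarded-For": random_ip_in_block(cidr)}


headers = spoofed_headers(ASSUMED_SE_BLOCK)
print(headers)
```

A server that checks the real connection address rather than this header is unaffected, which is why a VPN or proxy is suggested in the error message.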

I've updated the patch above. But it's probably easiest just to use the work-around format selection option (read more about format selection in the Manual).

trudK45 commented 1 year ago

The work-around does not seem to work:

C:\Users\hasse\Documents\Jaksta\svtplay-dl>youtube-dl.exe -f 'bestvideo+bestaudio[format_id!*=Uppl]/best' https://www.svtplay.se/video/K16qBoP/ett-fall-for-vera/3-broken-promise?info=visa
[SVTPlay] K16qBoP: Downloading webpage
[SVTPlay] K169ypA: Downloading JSON metadata
[SVTPlay] K169ypA: Downloading m3u8 information
[SVTPlay] K169ypA: Downloading m3u8 information
[SVTPlay] K169ypA: Downloading m3u8 information
[SVTPlay] K169ypA: Downloading m3u8 information
[SVTPlay] K169ypA: Downloading m3u8 information
[SVTPlay] K169ypA: Downloading m3u8 information
ERROR: requested format not available

C:\Users\hasse\Documents\Jaksta\svtplay-dl>youtube-dl.exe --version
2021.12.17

dirkf commented 1 year ago

In Windows cmd, single quotes are not quoting characters, so use "double quotes" instead. Enclosing the "URL" in double quotes is also good practice, in case it contains a character that is special to the command interpreter, such as &:

$ wine cmd
Wine CMD Version 5.1.2600 (1.6.2)

>youtube-dl -f "bestvideo+bestaudio[format_id!*=Uppl]/best" --test "https://www.svtplay.se/video/jm3EYvp/15-minuter-fran-sapmi/duodji-mastarbrev-och-framtidstankar?info=visa"
[SVTPlay] jm3EYvp: Downloading webpage
[SVTPlay] ePwpAML: Downloading JSON metadata
[SVTPlay] ePwpAML: Downloading m3u8 information
[SVTPlay] ePwpAML: Downloading m3u8 information
[SVTPlay] ePwpAML: Downloading m3u8 information
[SVTPlay] ePwpAML: Downloading m3u8 information
[SVTPlay] ePwpAML: Downloading m3u8 information
[SVTPlay] ePwpAML: Downloading m3u8 information
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 205
[download] Destination: Duodji - mästarbrev och framtidstankar-ePwpAML.fhls-ts-full-3381.mp4
[download] 100% of 2.69MiB in 00:00
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 205
[download] Destination: Duodji - mästarbrev och framtidstankar-ePwpAML.fhls-ts-lb-full-AAC-60-2ch-Svenska.mp4
[download] 100% of 10.00KiB in 00:00
[ffmpeg] Merging formats into "Duodji - mästarbrev och framtidstankar-ePwpAML.mp4"
Deleting original file Duodji - mästarbrev och framtidstankar-ePwpAML.fhls-ts-full-3381.mp4 (pass -k to keep)
Deleting original file Duodji - mästarbrev och framtidstankar-ePwpAML.fhls-ts-lb-full-AAC-60-2ch-Svenska.mp4 (pass -k to keep)

>

Here I used an unrestricted show that I could download in the UK.
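Why the single-quoted command failed in cmd can be shown with a rough model in Python. It assumes that a POSIX shell strips the single quotes before the program sees the argument, while cmd treats them as ordinary characters, so they end up inside the format selector. Real Windows argument parsing has more rules than the plain whitespace split used here, and the function names are made up.

```python
import shlex

# The command as typed, with single quotes; URL stands in for the address.
COMMAND = "youtube-dl -f 'bestvideo+bestaudio[format_id!*=Uppl]/best' URL"


def posix_args(command):
    """Split the way a POSIX shell would: single quotes group and vanish."""
    return shlex.split(command)


def literal_quote_args(command):
    """Rough model of cmd: single quotes are not special, so they stay."""
    return command.split()


print(posix_args(COMMAND)[2])          # the selector without quotes
print(literal_quote_args(COMMAND)[2])  # the selector with quotes still attached
```

In the second case the program receives a selector that begins and ends with a quote character, which matches no format.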