ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Inconsistent handling of non-alphanumeric characters in format selection values #29572

Open dirkf opened 3 years ago

dirkf commented 3 years ago

Verbose log

Show the available formats (snipped):

$ youtube-dl -F -v --ignore-config 'https://www.bbc.co.uk/iplayer/episode/m000j4wd/darkest-hour'
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-F', '-v', '--ignore-config', 'https://www.bbc.co.uk/iplayer/episode/m000j4wd/darkest-hour']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.06.06
[debug] Python version 3.5.2 (CPython) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[bbc.co.uk] m000j4wd: Downloading video page
[bbc.co.uk] m000j4wd: Downloading playlist JSON
[bbc.co.uk] m000j4wb: Downloading media selection JSON
[bbc.co.uk] m000j4wb: Downloading MPD manifest
[bbc.co.uk] m000j4wb: Downloading MPD manifest
[bbc.co.uk] m000j4wb: Downloading MPD manifest
[bbc.co.uk] m000j4wb: Downloading MPD manifest
[bbc.co.uk] m000j4wb: Downloading MPD manifest
[bbc.co.uk] m000j4wb: Downloading MPD manifest
[bbc.co.uk] m000j4wb: Downloading m3u8 information
[bbc.co.uk] m000j4wb: Downloading m3u8 information
[bbc.co.uk] m000j4wb: Downloading m3u8 information
[bbc.co.uk] m000j4wb: Downloading m3u8 information
[bbc.co.uk] m000j4wb: Downloading m3u8 information
[bbc.co.uk] m000j4wb: Downloading m3u8 information
[bbc.co.uk] m000j4wb: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
[bbc.co.uk] m000j4wb: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
[info] Available formats for m000j4wb:
format code                         extension  resolution note
mf_akamai-audio_eng_1=128000-0      m4a        audio only [en] DASH audio  128k , m4a_dash container, mp4a.40.2 (48000Hz)
mf_akamai-audio_eng_1=128000-1      m4a        audio only [en] DASH audio  128k , m4a_dash container, mp4a.40.2 (48000Hz)
mf_bidi-audio_eng_1=128000-0        m4a        audio only [en] DASH audio  128k , m4a_dash container, mp4a.40.2 (48000Hz)
mf_bidi-audio_eng_1=128000-1        m4a        audio only [en] DASH audio  128k , m4a_dash container, mp4a.40.2 (48000Hz)
mf_cloudfront-audio_eng_1=128000-0  m4a        audio only [en] DASH audio  128k , m4a_dash container, mp4a.40.2 (48000Hz)
mf_cloudfront-audio_eng_1=128000-1  m4a        audio only [en] DASH audio  128k , m4a_dash container, mp4a.40.2 (48000Hz)
mf_akamai-video=827000-0            mp4        704x396    DASH video  827k , mp4_dash container, avc3.4D401F, 25fps, video only
mf_akamai-video=827000-1            mp4        704x396    DASH video  827k , mp4_dash container, avc3.4D401F, 25fps, video only
...
mf_cloudfront-1800-1                mp4        704x396    1800k , avc1.64001F@1570k, 50.0fps, mp4a.40.2@128k (best)

Now fail to exclude one:

$ youtube-dl -j -f 'bestvideo[format_id!=mf_akamai-video=827000-0 ]+bestaudio' --ignore-config 'https://www.bbc.co.uk/iplayer/episode/m000j4wd/darkest-hour'
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
Traceback (most recent call last):
  File "/usr/bin/youtube-dl", line 9, in <module>
    load_entry_point('youtube-dl==2021.6.6', 'console_scripts', 'youtube-dl')()
  File "/usr/lib/python3/dist-packages/youtube_dl/__init__.py", line 475, in main
    _real_main(argv)
  File "/usr/lib/python3/dist-packages/youtube_dl/__init__.py", line 465, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 2069, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 808, in extract_info
    return self.__extract_info(url, ie, download, extra_info, process)
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 847, in __extract_info
    return self.process_ie_result(ie_result, download, extra_info)
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 881, in process_ie_result
    return self.process_video_result(ie_result, download=download)
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 1653, in process_video_result
    format_selector = self.build_format_selector(req_format)
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 1431, in build_format_selector
    return _build_selector_function(parsed_selector)
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 1278, in _build_selector_function
    fs = [_build_selector_function(s) for s in selector]
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 1278, in <listcomp>
    fs = [_build_selector_function(s) for s in selector]
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 1388, in _build_selector_function
    video_selector, audio_selector = map(_build_selector_function, selector.selector)
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 1395, in _build_selector_function
    filters = [self._build_format_filter(f) for f in selector.filters]
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 1395, in <listcomp>
    filters = [self._build_format_filter(f) for f in selector.filters]
  File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 1133, in _build_format_filter
    raise ValueError('Invalid filter specification %r' % filter_spec)
ValueError: Invalid filter specification 'format_id!=mf_akamai-video=827000-0'
$

Description

In the format selection code in YoutubeDL.py, the value on the right-hand side of a string comparison in a format selection expression (eg, hls in -f best[format_id!*=hls]) is matched with the regex (?P<value>[a-zA-Z0-9._-]+). However, the format_id value is sanitised with re.sub(r'[\s,/+\[\]()]', '_', format_id), which leaves many more characters intact than the value regex accepts.

If the site provides a format ID containing a character that is neither alphanumeric nor in [._-], and that the sanitiser does not replace, a selection expression that specifically excludes the ID ([format_id!=problem_id]) raises ValueError('Invalid filter specification...'). For instance, the character '=' causes this, as shown above; see also the -F results in this log for another site that produces such format IDs. Other potentially problematic printable characters include "#$%&'*:;<>?@^{|}~£ as well as ` and non-ASCII alphabetic characters.
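The mismatch can be reproduced in isolation. The following is only a minimal sketch: the two patterns are the ones quoted above, but the helper names (sanitise_format_id, value_is_accepted) are made up for illustration and are not part of youtube-dl, and the real filter parser wraps the value pattern in a larger expression.

```python
import re

# Both patterns are copied from the issue text; the helper names below
# are hypothetical and do not exist in youtube-dl.
VALUE_RE = re.compile(r'(?P<value>[a-zA-Z0-9._-]+)')
SANITISE_RE = re.compile(r'[\s,/+\[\]()]')


def sanitise_format_id(format_id):
    """Replace only the characters the sanitiser targets with '_'."""
    return SANITISE_RE.sub('_', format_id)


def value_is_accepted(value):
    """Return True if the whole value fits the filter's value pattern."""
    return VALUE_RE.fullmatch(value) is not None


format_id = 'mf_akamai-video=827000-0'
sanitised = sanitise_format_id(format_id)

print(sanitised == format_id)        # True: '=' survives sanitising
print(value_is_accepted(sanitised))  # False: '=' is outside [a-zA-Z0-9._-]
print(value_is_accepted('hls'))      # True
```

Any format ID for which the first check is True and the second is False can be listed by -F but cannot be named in a string comparison.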

Either the string value should be matched with something like (?P<value>[^\s,/+\[\]()]+) or the format_id value should be sanitised with re.sub(r'[^a-zA-Z0-9._-]', '_', format_id). At the very least, the = character should be added to the character class that matches the string value.
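A rough comparison of the two suggested fixes, using only the patterns quoted in this issue. The constant names are hypothetical, and this sketch tests the value pattern on its own rather than inside the full filter parser, so it says nothing about how a relaxed pattern interacts with the operator syntax.

```python
import re

# Patterns taken from the issue text; the constant names are illustrative.
ORIGINAL_VALUE_RE = re.compile(r'(?P<value>[a-zA-Z0-9._-]+)')
RELAXED_VALUE_RE = re.compile(r'(?P<value>[^\s,/+\[\]()]+)')
STRICT_SANITISE_RE = re.compile(r'[^a-zA-Z0-9._-]')

format_id = 'mf_akamai-video=827000-0'

# Option 1: relax the value pattern so that it accepts every character
# the existing sanitiser leaves in place.
option_1_ok = RELAXED_VALUE_RE.fullmatch(format_id) is not None

# Option 2: keep the value pattern and sanitise format IDs more strictly.
strict_id = STRICT_SANITISE_RE.sub('_', format_id)
option_2_ok = ORIGINAL_VALUE_RE.fullmatch(strict_id) is not None

print(option_1_ok)  # True
print(strict_id)    # mf_akamai-video_827000-0
print(option_2_ok)  # True

# Side effect of option 2: distinct IDs can collapse to the same string.
print(STRICT_SANITISE_RE.sub('_', 'a=b') == STRICT_SANITISE_RE.sub('_', 'a_b'))  # True
```

Option 1 leaves the format IDs shown by -F unchanged, whereas option 2 changes the IDs that users see and may already have in scripts.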

In principle, this problem might affect other string-valued format selection fields (ext, acodec, vcodec, container, protocol, language), but these are unlikely to contain problematic characters.

Still isn't labelled as a bug.

connesc commented 2 years ago

I face the exact same problem: I'm unable to exclude a format that has a = in its ID. My current workaround is to exclude the prefix with !^=, but this is far from robust.
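To illustrate why the prefix workaround is fragile, here is a small sketch using the format IDs visible in the log above (the log elides others). It assumes the filter simply drops every format whose ID starts with the given prefix, and that the longest usable prefix is whatever the value pattern accepts before the first '='; the variable names are illustrative only.

```python
import re

# Value pattern quoted in the issue; the rest of this harness is hypothetical.
VALUE_RE = re.compile(r'[a-zA-Z0-9._-]+')

# Subset of format IDs shown in the -F output above.
format_ids = [
    'mf_akamai-audio_eng_1=128000-0',
    'mf_akamai-audio_eng_1=128000-1',
    'mf_bidi-audio_eng_1=128000-0',
    'mf_bidi-audio_eng_1=128000-1',
    'mf_cloudfront-audio_eng_1=128000-0',
    'mf_cloudfront-audio_eng_1=128000-1',
    'mf_akamai-video=827000-0',
    'mf_akamai-video=827000-1',
    'mf_cloudfront-1800-1',
]

target = 'mf_akamai-video=827000-0'

# Longest leading run of the target that the value pattern accepts.
prefix = VALUE_RE.match(target).group(0)

# Emulate [format_id!^=<prefix>]: keep IDs that do not start with it.
kept = [f for f in format_ids if not f.startswith(prefix)]
dropped = [f for f in format_ids if f.startswith(prefix)]

print(prefix)   # mf_akamai-video
print(dropped)  # ['mf_akamai-video=827000-0', 'mf_akamai-video=827000-1']
```

The sibling format mf_akamai-video=827000-1 is excluded along with the intended one, because the two IDs differ only after the '='.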

dirkf commented 2 years ago

Feel free to apply the one-line-of-code (2 comment lines) patch from the linked PR and try it out, if your yt-dl installation allows (ie, not a single executable bundle).

serainox420 commented 2 years ago

Thanks for quick response, I'll try out soon <3

dirkf commented 1 year ago

See https://github.com/ytdl-org/youtube-dl/issues/31441#issuecomment-1365613198.