ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

BFM business video extraction fails #32608

Open theedge456 opened 11 months ago

theedge456 commented 11 months ago

Verbose log

[debug] Command-line config: ['https://www.bfmtv.com/economie/replay-emissions/les-experts/les-experts-transition-verte-qui-doit-payer-13-10_VN-202310130321.html', '-v']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version stable@2023.10.13 [b634ba742] (zip)
[debug] Python 3.11.2 (CPython x86_64 64bit) - Linux-6.1.57-x86_64-with-glibc2.36 (OpenSSL 3.0.11 19 Sep 2023, glibc 2.36)
[debug] exe versions: ffmpeg 5.1.3-1 (setts), ffprobe 5.1.3-1
[debug] Optional libraries: Cryptodome-3.11.0, brotli-1.0.9, certifi-2022.09.24, mutagen-1.46.0, pyxattr-0.8.1, requests-2.28.1, sqlite3-3.40.1, urllib3-1.26.12
[debug] Proxy map: {}
[debug] Request Handlers: urllib
[debug] Loaded 1890 extractors
[bfmtv] Extracting URL: https://www.bfmtv.com/economie/replay-emissions/les-experts/les-experts-transition-verte-qui-doit-payer-13-10_VN-202310130321.html
[bfmtv] 202310130321: Downloading webpage
ERROR: [bfmtv] 202310130321: Unable to extract video block; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "/usr/bin/youtube-dl/yt_dlp/extractor/common.py", line 715, in extract
    ie_result = self._real_extract(url)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/bin/youtube-dl/yt_dlp/extractor/bfmtv.py", line 43, in _real_extract
    video_block = extract_attributes(self._search_regex(
                                     ^^^^^^^^^^^^^^^^^^^
  File "/usr/bin/youtube-dl/yt_dlp/extractor/common.py", line 1263, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)

Description

Extraction has been failing since October 17th, 2023.

dirkf commented 11 months ago

The pattern the extractor uses to find the video block (which carries the Brightcove video ID) is too restrictive:

--- old/youtube-dl/youtube_dl/extractor/bfmtv.py
+++ new/youtube-dl/youtube_dl/extractor/bfmtv.py
@@ -10,7 +10,7 @@
 class BFMTVBaseIE(InfoExtractor):
     _VALID_URL_BASE = r'https?://(?:www\.)?bfmtv\.com/'
     _VALID_URL_TMPL = _VALID_URL_BASE + r'(?:[^/]+/)*[^/?&#]+_%s[A-Z]-(?P<id>\d{12})\.html'
-    _VIDEO_BLOCK_REGEX = r'(<div[^>]+class="video_block"[^>]*>)'
+    _VIDEO_BLOCK_REGEX = r'(<div\s[^>]*\bclass\s*=\s*["\'](?:[\S]\s+)*video_block\b[^>]+>)'
     BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'

     def _brightcove_url_result(self, video_id, video_block):

There are also some improvements from the yt-dlp version that could be added.
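
To see what the change does in practice, here is a minimal sketch that runs both patterns from the diff against a few made-up `<div>` tags. The sample tags and the helper name `find_block` are assumptions for illustration only; they are not taken from the live bfmtv.com page or from the extractor code.

```python
import re

# Patterns copied verbatim from the diff above.
OLD_VIDEO_BLOCK_REGEX = r'(<div[^>]+class="video_block"[^>]*>)'
NEW_VIDEO_BLOCK_REGEX = r'(<div\s[^>]*\bclass\s*=\s*["\'](?:[\S]\s+)*video_block\b[^>]+>)'

# Hypothetical opening tags (assumed, not scraped from the real site).
SAMPLES = {
    'exact': '<div class="video_block" id="x">',
    'trailing_class': '<div class="video_block extra" id="x">',
    'leading_class': '<div class="extra video_block" id="x">',
}


def find_block(pattern, html):
    """Return the matched opening tag, or None if the pattern does not match."""
    m = re.search(pattern, html)
    return m.group(1) if m else None


for name, tag in SAMPLES.items():
    old = find_block(OLD_VIDEO_BLOCK_REGEX, tag) is not None
    new = find_block(NEW_VIDEO_BLOCK_REGEX, tag) is not None
    print('%-15s old=%s new=%s' % (name, old, new))
```

With these samples, the old pattern only matches when `class` is exactly `"video_block"`, while the new one also accepts additional class names after `video_block`. As written, the `(?:[\S]\s+)*` group consumes only single-character tokens, so neither pattern matches the `leading_class` sample, where a longer class name comes before `video_block`.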

theedge456 commented 11 months ago

@dirkf It's fixed now. Thanks for the patch.

dirkf commented 11 months ago

Thanks. I'll keep it open until the patch is committed.

mycodedoesnotcompile2 commented 10 months ago

When will this patch be merged?