ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

maoritelevision.com (appears to use brightcove) #24552

Open rsfinlayson opened 4 years ago

rsfinlayson commented 4 years ago

Checklist

Example URLs

Description

A username and password are needed; the following will work: username=rf-mtv-test@live555.com, password=testtest

mitchelltornquist commented 4 months ago

Hi,

The site maoritelevision.com has moved to maoriplus.co.nz. However, the URL schemes appear to be the same.

Following the change in this commit: https://github.com/ytdl-org/youtube-dl/commit/4fb25ff5a3be5206bb72e5c4046715b1529fb2c7

I was able to use the same Brightcove URL with the Maoriplus ID at the end. Hopefully this is an easy fix.

Thanks!

dirkf commented 4 months ago

Generally, please open a new issue rather than necroposting in a closed issue. But as you suggest, this may be an easy fix.

Show URLs from the old domain redirect to https://www.maoriplus.co.nz/ (sample of 1), so we can bin the processing for those URLs.

This current show https://www.maoriplus.co.nz/show/code-the-reunion/play/6349856999112 plays in the browser from the UK with no account/cookies.

So the URL now carries the show's Brightcove ID, which previously had to be extracted from the page. The extractor change is very simple:

...
-    _VALID_URL = r'https?://(?:www\.)?maoritelevision\.com/shows/(?:[^/]+/)+(?P<id>[^/?&#]+)'
+    _VALID_URL = r'https?://(?:www\.)?maoriplus\.co\.nz/show/(?P<series>[\w-]+)/play/(?P<id>[\d]+)'
...
     def _real_extract(self, url):
-        display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
-        brightcove_id = self._search_regex(
-            r'data-main-video-id=["\'](\d+)', webpage, 'brightcove id')
+        brightcove_id = self._match_id(url)
         return self.url_result(
             self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
             'BrightcoveNew', brightcove_id)
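
As a quick sanity check of the patch, the sketch below applies the new pattern to the sample URL outside youtube-dl. The helper name `brightcove_url_for` and the `ACCOUNT_ID`/`PLAYER_ID` placeholders in the template are illustrative assumptions, not the extractor's real values.

```python
import re

# New pattern, copied from the patch.
VALID_URL = r'https?://(?:www\.)?maoriplus\.co\.nz/show/(?P<series>[\w-]+)/play/(?P<id>[\d]+)'

# Placeholder only (assumption): the real template lives in the extractor and
# embeds the site's Brightcove account and player IDs, not repeated here.
BRIGHTCOVE_URL_TEMPLATE = (
    'https://players.brightcove.net/ACCOUNT_ID/PLAYER_ID_default/index.html?videoId=%s')


def brightcove_url_for(url):
    """Return (series, brightcove_id, player_url), or None if url does not match."""
    # re.match anchors at the start of the string.
    m = re.match(VALID_URL, url)
    if m is None:
        return None
    brightcove_id = m.group('id')
    return m.group('series'), brightcove_id, BRIGHTCOVE_URL_TEMPLATE % brightcove_id


sample = 'https://www.maoriplus.co.nz/show/code-the-reunion/play/6349856999112'
print(brightcove_url_for(sample))

# A URL on the old domain (made-up path) does not match the new pattern.
print(brightcove_url_for('https://www.maoritelevision.com/shows/some-show/S01E001/some-episode'))
```

The named group `series` is captured but unused by the patched `_real_extract`; only `id` is needed to build the Brightcove URL.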

Then, with the patch applied:

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-vF', u'https://www.maoriplus.co.nz/show/code-the-reunion/play/6349856999112']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: f66e450bf
[debug] Python 2.7.15 (CPython i686 32bit) - Linux-6.1.0-18-686-pae-i686-with-debian-12.5 - OpenSSL 1.1.1a  20 Nov 2018 - glibc 2.1.3
[debug] exe versions: ffmpeg 5.1.4-0, ffprobe 5.1.4-0
[debug] Proxy map: {}
[brightcove:new] 6349856999112: Downloading JSON metadata
[brightcove:new] 6349856999112: Downloading m3u8 information
[brightcove:new] 6349856999112: Downloading m3u8 information
[brightcove:new] 6349856999112: Downloading MPD manifest
[brightcove:new] 6349856999112: Downloading MPD manifest
[brightcove:new] 6349856999112: Downloading MPD manifest
[brightcove:new] 6349856999112: Downloading MPD manifest
[info] Available formats for 6349856999112:
format code                                  extension  resolution note
hls-audio-0-en__Main_-0                      mp4        audio only [en] 
hls-audio-0-en__Main_-1                      mp4        audio only [en] 
dash-a41239e7-db53-4f4d-924c-54da159ba123-0  m4a        audio only [en] DASH audio   64k , m4a_dash container, mp4a.40.2 (48000Hz)
dash-a41239e7-db53-4f4d-924c-54da159ba123-1  m4a        audio only [en] DASH audio   64k , m4a_dash container, mp4a.40.2 (48000Hz)
dash-a41239e7-db53-4f4d-924c-54da159ba123-2  m4a        audio only [en] DASH audio   64k , m4a_dash container, mp4a.40.2 (48000Hz)
dash-a41239e7-db53-4f4d-924c-54da159ba123-3  m4a        audio only [en] DASH audio   64k , m4a_dash container, mp4a.40.2 (48000Hz)
dash-fe47bcc9-7840-47af-9ea6-727c9bcb8fa4-0  mp4        640x360    DASH video  699k , mp4_dash container, avc1.42001e, video only
dash-fe47bcc9-7840-47af-9ea6-727c9bcb8fa4-1  mp4        640x360    DASH video  699k , mp4_dash container, avc1.42001e, video only
dash-fe47bcc9-7840-47af-9ea6-727c9bcb8fa4-2  mp4        640x360    DASH video  699k , mp4_dash container, avc1.42001e, video only
dash-fe47bcc9-7840-47af-9ea6-727c9bcb8fa4-3  mp4        640x360    DASH video  699k , mp4_dash container, avc1.42001e, video only
hls-839-0                                    mp4        640x360     839k , avc1.42001e, video only
hls-839-1                                    mp4        640x360     839k , avc1.42001e, video only
dash-274d6d64-f208-4760-a7d2-b1cf121016ab-0  mp4        960x540    DASH video 1199k , mp4_dash container, avc1.4d001f, video only
dash-274d6d64-f208-4760-a7d2-b1cf121016ab-1  mp4        960x540    DASH video 1199k , mp4_dash container, avc1.4d001f, video only
dash-274d6d64-f208-4760-a7d2-b1cf121016ab-2  mp4        960x540    DASH video 1199k , mp4_dash container, avc1.4d001f, video only
dash-274d6d64-f208-4760-a7d2-b1cf121016ab-3  mp4        960x540    DASH video 1199k , mp4_dash container, avc1.4d001f, video only
hls-1389-0                                   mp4        960x540    1389k , avc1.4d001f, video only
hls-1389-1                                   mp4        960x540    1389k , avc1.4d001f, video only
dash-01a7d27c-dd8c-48d6-a16f-5be2420bfa34-0  mp4        1280x720   DASH video 1995k , mp4_dash container, avc1.4d001f, video only
dash-01a7d27c-dd8c-48d6-a16f-5be2420bfa34-1  mp4        1280x720   DASH video 1995k , mp4_dash container, avc1.4d001f, video only
dash-01a7d27c-dd8c-48d6-a16f-5be2420bfa34-2  mp4        1280x720   DASH video 1995k , mp4_dash container, avc1.4d001f, video only
dash-01a7d27c-dd8c-48d6-a16f-5be2420bfa34-3  mp4        1280x720   DASH video 1995k , mp4_dash container, avc1.4d001f, video only
hls-2264-0                                   mp4        1280x720   2264k , avc1.4d001f, video only
hls-2264-1                                   mp4        1280x720   2264k , avc1.4d001f, video only
dash-7e4ac127-f037-4a0b-a1cd-0cebb729a65b-0  mp4        1920x1080  DASH video 3496k , mp4_dash container, avc1.640028, video only
dash-7e4ac127-f037-4a0b-a1cd-0cebb729a65b-1  mp4        1920x1080  DASH video 3496k , mp4_dash container, avc1.640028, video only
dash-7e4ac127-f037-4a0b-a1cd-0cebb729a65b-2  mp4        1920x1080  DASH video 3496k , mp4_dash container, avc1.640028, video only
dash-7e4ac127-f037-4a0b-a1cd-0cebb729a65b-3  mp4        1920x1080  DASH video 3496k , mp4_dash container, avc1.640028, video only
hls-3916-0                                   mp4        1920x1080  3916k , avc1.640028, video only
hls-3916-1                                   mp4        1920x1080  3916k , avc1.640028, video only
http-3687k-1080p-0                           mp4        1920x1080  3687k , MP4 container, H264, 1.35GiB
http-3687k-1080p-1                           mp4        1920x1080  3687k , MP4 container, H264, 1.35GiB (best)
$

The show page as seen by yt-dl (with no JS) is essentially empty, so it's lucky that the ID is handed to us in the URL. With JS enabled in the browser, the page fetches JSON containing the Brightcove data (along with some data for other video hosts), but that data happens to match what is hard-coded in the old extractor's BRIGHTCOVE_URL_TEMPLATE. Other shows may still need this JSON to be loaded and parsed.

There is also a schema of seasons, episodes, playlists, etc., that someone who cared could implement.