ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

mewatch unable to download video; Unable to download JSON metadata #32043

Open TechvitalCompitar opened 1 year ago

TechvitalCompitar commented 1 year ago


Hi, I have recently been unable to download any video from mewatch. I receive this error when trying to download the videos: [image]

dirkf commented 1 year ago

I checked this earlier report: https://github.com/yt-dlp/yt-dlp/issues/6718

The site is obviously not working in the way it did before.

The failure is on access to this API URL: http://tvpapi.as.tvinci.com/v2_9/gateways/jsonpostgw.aspx?m=GetMediaInfo

Either this API no longer works, or there is a newer version (.../v3_0/... ?). This would have to be identified by tracing browser requests, or by reverse engineering the site JS, or from secret knowledge. This might have to be done in-region.

However there is an outstanding PR #25898 which does update the API version to v3_9. Please try that.

CC: @hueyy (PR author)

dirkf commented 1 year ago

Apparently the original video host tvinci.com was acquired by Kaltura; however the transitional API URL added in the PR is also 404 now.

Probably the site is using the Kaltura hosting directly. For other sites that use Kaltura we can form the pseudo-URL kaltura:{partner_id}:{entry_id}, where partner_id is linked to the site and entry_id identifies the media item.
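As a rough illustration of the pseudo-URL form, the sketch below builds and splits kaltura:{partner_id}:{entry_id} strings. The helper names and the entry-ID character set are assumptions made for this example, not youtube-dl code; the IDs are ones quoted elsewhere in this thread.

```python
import re

# Hypothetical helpers, not part of youtube-dl: they only illustrate the
# kaltura:{partner_id}:{entry_id} pseudo-URL form.
# Assumption: entry IDs consist of digits, lowercase letters and underscores.
_KALTURA_PSEUDO_URL = re.compile(
    r'^kaltura:(?P<partner_id>\d+):(?P<entry_id>[0-9a-z_]+)$')


def make_kaltura_url(partner_id, entry_id):
    """Build the pseudo-URL from a site-wide partner ID and a media entry ID."""
    return 'kaltura:%s:%s' % (partner_id, entry_id)


def split_kaltura_url(pseudo_url):
    """Return (partner_id, entry_id), or None if the string has another form."""
    m = _KALTURA_PSEUDO_URL.match(pseudo_url)
    return (m.group('partner_id'), m.group('entry_id')) if m else None


print(make_kaltura_url(2082311, '1_g9ihx6sz'))
print(split_kaltura_url('kaltura:2082311:0_ie93g2ql'))
```

The partner ID is the part most likely to go stale, so keeping it as a separate argument makes it easy to swap when the site changes it.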

As OP's example is a super-long URL in an image I won't be bothering to test it (see manual: BUGS). The page in the yt-dlp issue gives this

Using yt-dl on kaltura:2082301:1_g9ihx6sz: An extractor error has occurred. (caused by KeyError(u'dataUrl',))

With yt-dlp: Kaltura said: Invalid entry id ["1_g9ihx6sz"]

Perhaps this has expired?

october262 commented 1 year ago

This episode works for me: https://www.mewatch.sg/watch/Titoudao-Inspired-by-the-True-Story-of-a-Wayang-Star-E1-Matchstick-Girl-136853. However, you need to use a browser add-on called "the stream detector" to download the episode:

    yt-dlp --no-part --restrict-filenames -N 4 --user-agent "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36" --referer "https://www.mewatch.sg/" "https://cdnapisec.kaltura.com/p/2082311/sp/208231100/playManifest/protocol/https/entryId/0_ie93g2ql/format/mpegdash/tags/web_hd/f/a.mpd?clientTag=html5:v2.0.0&playSessionId=7ac32418-35f1-35a0-233a-e2a25f47ab7f:b9995535-1445-34c0-88bc-87760b867929"

I tried the awards ceremony, but it is region-locked.

dirkf commented 1 year ago

The partner ID seems to have changed since the page from the yt-dlp issue was created. Now 2082311, was 2082301.

The modified pseudo-URL kaltura:2082311:1_g9ihx6sz works in both yt-dl and yt-dlp. -f worst gives http://cdnapi.kaltura.com/p/2082311/sp/208231100/playManifest/entryId/1_g9ihx6sz/format/url/protocol/http/flavorId/1_d730hmji which is gettable from UK.

Similarly kaltura:2082311:0_ie93g2ql is playable from yt-dlp with -f worst -o - | mpv -.

dirkf commented 1 year ago

This MeWatchIE._real_extract() seems to work:

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(
            url, video_id, note='Downloading video page')
        page_data = self._search_regex(
            r'(?s)window\s*\.\s*__data\s*=\s*(\{.*?\})\s*</script>',
            webpage, 'hydration JSON')
        page_data = self._parse_json(page_data, video_id)
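        # The partner ID is the last path segment of the thumbnail base URL;
        # fall back to the known ID if the key is missing or the URL is unusable
        partner_id = traverse_obj(
            page_data, ('app', 'config', 'playback', 'kalturaThumbnailBaseUrl'),
            expected_type=lambda u: (url_or_none(u) or '').rstrip('/').rpartition('/')[2]) or 2082311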
        show_data = traverse_obj(page_data,
                                 ('cache', 'page', Ellipsis, 'entries',
                                  lambda _, v: v['item']['id'] == video_id),
                                 get_all=False)

        entry_id = traverse_obj(show_data, ('item', 'customFields', 'EntryId'))

        txt_or_none = lambda x: x.strip() or None

        return merge_dicts(
            {'_type': 'url_transparent'},
            self.url_result(
                'kaltura:%s:%s' % (partner_id, entry_id), ie='Kaltura', video_id=video_id),
            {
                'title': traverse_obj(show_data, 'title', ('item', ('title', ('customFields', 'sortTitle'), 'episodeName')),
                                      get_all=False, expected_type=txt_or_none) or self._generic_title(url),
                'description': traverse_obj(show_data, ('item', ('description', 'shortDescription')), get_all=False, expected_type=txt_or_none),
                'uploader': traverse_obj(show_data, ('item', 'distributor'), expected_type=txt_or_none),
                'categories': traverse_obj(show_data, ('item', 'genres', Ellipsis), expected_type=txt_or_none),
                'episode_number': traverse_obj(show_data, ('item', 'episodeNumber'), expected_type=int_or_none),
                'episode': traverse_obj(show_data, ('item', 'episodeName'), expected_type=lambda x: re.sub(r'^Ep\s+\d+\s+(.*?)\s*$', r'\1', x) or None),
                'season_id': traverse_obj(show_data, ('item', 'seasonId'), expected_type=txt_or_none),
                'series_id': traverse_obj(show_data, ('item', ('showId', ('season', 'show', 'id'))), get_all=False, expected_type=txt_or_none),
                'season_number': traverse_obj(show_data, ('item', 'season', 'seasonNumber'), expected_type=int_or_none),
                'season': traverse_obj(show_data, ('item', 'season', 'title'), expected_type=txt_or_none),
                'series': traverse_obj(show_data, ('item', 'season', 'show', 'title'), expected_type=txt_or_none),
            })

I didn't investigate how ToggleIE should be updated. The first test fails in the same way as the MeWatch pages.
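The hydration-JSON lookup in the _real_extract() above can be tried outside youtube-dl with only the standard library. This is a minimal sketch: the sample page is made up and far smaller than a real mewatch page, its layout is only assumed from the keys the extractor code uses, and find_entry_id is a hypothetical name.

```python
import json
import re

# Made-up stand-in for a mewatch page; the real hydration JSON is far larger
# and its exact layout is only assumed from the keys used by the extractor.
SAMPLE_PAGE = '''<html><script>
window.__data = {"cache": {"page": {"/watch/x-368951": {"entries": [
  {"item": {"id": "368951", "customFields": {"EntryId": "1_4n8bmm4x"}}}
]}}}}
</script></html>'''


def find_entry_id(webpage, video_id):
    """Return the Kaltura entry ID for video_id, or None if it is not found."""
    m = re.search(
        r'(?s)window\s*\.\s*__data\s*=\s*(\{.*?\})\s*</script>', webpage)
    if not m:
        return None
    data = json.loads(m.group(1))
    # Plain-Python equivalent of the traverse_obj() path
    # ('cache', 'page', Ellipsis, 'entries', <filter on item id>)
    for page in (data.get('cache', {}).get('page') or {}).values():
        for entry in page.get('entries') or []:
            item = entry.get('item') or {}
            if item.get('id') == video_id:
                return (item.get('customFields') or {}).get('EntryId')
    return None


print(find_entry_id(SAMPLE_PAGE, '368951'))
```

The lazy `.*?` in the pattern only stops at a closing brace that is directly followed by `</script>`, which is why it captures the whole object rather than the first nested one.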

dirkf commented 1 year ago

Apparently the ToggleIE tests are all expired. New examples welcome.

benjaminyam commented 1 year ago

Tests:
https://www.mewatch.sg/watch/CNA-Correspondent-2023-2024-E1-The-Global-Tech-Slowdown-367707
https://www.mewatch.sg/watch/CNA-Correspondent-2023-2024-E2-The-Hand-That-Rocks-The-Cradle-369430
https://www.mewatch.sg/watch/Money-Mind-2023-2024-E2-Investing-In-AI-369003
https://www.mewatch.sg/watch/8-Days-Interviews-E51-Jason-Bateman-and-Chris-Messina-on-sneakers-the-Matt-Damon-Ben-Affleck-bond-in-Air-367654
https://www.mewatch.sg/watch/8-Days-Interviews-E52-Renfield-s-Nicolas-Cage-is-still-traumatised-by-Vampire-s-Kiss-s-roach-eating-scene-369493
https://www.mewatch.sg/watch/The-Stewards-Of-Intangible-Cultural-Heritage-Award-2020-E1-Asparas-Arts-Ltd-369462
https://www.mewatch.sg/watch/CNA-Lifestyle-2023-E31-A-slice-of-haute-couture-through-Chanel-s-eyes-369946
https://www.mewatch.sg/watch/Apr-2023-CH-5-News-Tonight-367550
https://www.mewatch.sg/watch/Apr-2023-CH-5-News-Tonight-369872
https://www.mewatch.sg/watch/Apr-2023-CH-5-News-Tonight-369968
https://www.mewatch.sg/watch/Talking-Point-2023-2024-E1-Is-There-Such-A-Thing-As-Guilt-Free-Ice-Cream-367862
https://www.mewatch.sg/watch/Talking-Point-2023-2024-E2-Who-Knows-Where-You-Live-369824

benjaminyam commented 1 year ago

I just realized what is being asked for ToggleIE: that URL format no longer exists. All URLs are in the MeWatchIE format now, so we can just go with MeWatchIE and not bother with the ToggleIE format.
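To check which extractor a given URL would fall under, the two _VALID_URL patterns from toggle.py (quoted in full later in this thread) can be run directly. This is a small Python 3 sketch; video_id is a hypothetical helper, and the two sample URLs are taken from this thread.

```python
import re

# Patterns copied from the MeWatchIE and ToggleIE _VALID_URL values.
MEWATCH_URL = r'https?://(?:(?:www|live)\.)?mewatch\.sg/watch/[^/?#&]+-(?P<id>[0-9]+)'
TOGGLE_URL = r'(?:https?://(?:(?:www\.)?mewatch|video\.toggle)\.sg/(?:en|zh)/(?:[^/]+/){2,}|toggle:)(?P<id>[0-9]+)'


def video_id(pattern, url):
    """Return the numeric ID if url matches pattern, else None."""
    m = re.match(pattern, url)
    return m.group('id') if m else None


new_style = 'https://www.mewatch.sg/watch/Money-Mind-2023-2024-E2-Investing-In-AI-369003'
old_style = 'http://www.mewatch.sg/en/movies/seven-days/321936'

for url in (new_style, old_style):
    print(url)
    print('  MeWatchIE id:', video_id(MEWATCH_URL, url))
    print('  ToggleIE id: ', video_id(TOGGLE_URL, url))
```

The /watch/ URLs match only the MeWatchIE pattern, and the older /en/... URLs match only the ToggleIE pattern, so the two extractors do not compete for the same pages.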

zengjiawei98 commented 1 year ago

I am not sure how I can help but I hope my info can help in some way.

Say I wanted to download this video link, https://www.mewatch.sg/watch/The-Star-Athlete-E9-Floorball-368951

It produces an .mpd file, which shows that the files are hosted on cloudfront.net.

And with the .mpd file, I could play the stream via VLC.

These are the contents of the .mpd file: `<?xml version="1.0"?> <MPD xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:mpeg:dash:schema:mpd:2011" xsi:schemaLocation="urn:mpeg:dash:schema:mpd:2011 http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-DASH_schema_files/DASH-MPD.xsd" type="static" mediaPresentationDuration="PT2727.080S" minBufferTime="PT4S" profiles="urn:mpeg:dash:profile:isoff-main:2011">`

dirkf commented 1 year ago

For URLs like that we know what to do, but it doesn't obviously involve DASH:

$ python -m youtube_dl -v -F 'https://www.mewatch.sg/watch/The-Star-Athlete-E9-Floorball-368951'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-F', u'https://www.mewatch.sg/watch/The-Star-Athlete-E9-Floorball-368951']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: d7b502a72
[debug] Python version 2.7.18 (CPython) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[mewatch] Extracting URL: https://www.mewatch.sg/watch/The-Star-Athlete-E9-Floorball-368951
[mewatch] 368951: Downloading video page
[Kaltura] Extracting URL: kaltura:2082311:1_4n8bmm4x
[Kaltura] 1_4n8bmm4x: Downloading video info JSON
[Kaltura] 1_4n8bmm4x: Downloading m3u8 information
[info] Available formats for 1_4n8bmm4x:
format code        extension  resolution note
hls-audio-Chinese  mp4        audio only [zh] Chinese 
mp4-65             mp4        audio only   65k , isom container, 0fps, audio@ 65k, ~21.26MiB
mp4-195            mp4        320x180     195k , isom container, avc1@ 195k, 25fps, audio@  0k, ~63.60MiB
hls-222            mp4        320x180     222k video@ 222k, audio@  0k
mp4-472            mp4        480x270     472k , isom container, avc1@ 472k, 25fps, audio@  0k, ~153.73MiB
hls-512            mp4        480x270     512k video@ 512k, audio@  0k
mp4-789            mp4        640x360     789k , isom container, avc1@ 789k, 25fps, audio@  0k, ~256.64MiB
hls-844            mp4        640x360     844k video@ 844k, audio@  0k
mp4-1399           mp4        854x480    1399k , isom container, avc1@1399k, 25fps, audio@  0k, ~455.00MiB
hls-1482           mp4        854x480    1482k video@1482k, audio@  0k
mp4-1917           mp4        960x540    1917k , isom container, avc1@1917k, 25fps, audio@  0k, ~623.53MiB
hls-2024           mp4        960x540    2024k video@2024k, audio@  0k
mp4-2572           mp4        1280x720   2572k , isom container, avc1@2572k, 25fps, audio@  0k, ~836.39MiB
mp4-4084           mp4        1920x1080  4084k , isom container, avc1@4084k, 25fps, audio@  0k, ~1.30GiB (best)
$

benjaminyam commented 1 year ago

@dirkf: Can you check if the downloaded files are playable? I was able to "download" using your mewatch _real_extract code, but the output file was not playable in VLC.

dirkf commented 1 year ago

python -m youtube_dl -v -f worst -o - 'https://www.mewatch.sg/watch/The-Star-Athlete-E9-Floorball-368951' | mpv - gives a grey screen and lots of decode errors from mpv for me. I assume it's encrypted and we ought to find out how that can be detected from the metadata.

Same for 'https://www.mewatch.sg/watch/Titoudao-Inspired-by-the-True-Story-of-a-Wayang-Star-E1-Matchstick-Girl-136853'.

Maybe all shows are "protected"? This needs to be tested in-region using a browser with DRM disabled, but how?
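For DASH manifests at least, one metadata signal is defined by the MPEG-DASH schema itself: encrypted streams are expected to declare ContentProtection elements. The sketch below lists them with the standard library. The sample manifest is made up, and whether mewatch's real manifests (or the non-DASH formats that gave the grey screen) carry such markers is not confirmed in this thread.

```python
import xml.etree.ElementTree as ET

DASH_NS = '{urn:mpeg:dash:schema:mpd:2011}'

# Made-up minimal manifest for illustration only; not taken from mewatch.
SAMPLE_MPD = '''<?xml version="1.0"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc"/>
      <Representation id="v1" bandwidth="195000"/>
    </AdaptationSet>
  </Period>
</MPD>'''


def protection_schemes(mpd_text):
    """Return the sorted schemeIdUri values of all ContentProtection elements."""
    root = ET.fromstring(mpd_text)
    return sorted({
        el.get('schemeIdUri')
        for el in root.iter(DASH_NS + 'ContentProtection')
        if el.get('schemeIdUri')})


schemes = protection_schemes(SAMPLE_MPD)
print('protected' if schemes else 'no ContentProtection found', schemes)
```

An empty result does not prove a stream is clear, since a server could encrypt without declaring it, so this is a hint rather than a definitive test.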

benjaminyam commented 1 year ago

@dirkf: The yt-dlp command provided by @october262 works for me in the correct region. But if I pass https://www.mewatch.sg/watch/Titoudao-Inspired-by-the-True-Story-of-a-Wayang-Star-E1-Matchstick-Girl-136853 to yt-dlp using your _real_extract(), I get the grey screen you mention.

dirkf commented 1 year ago

The HLS formats seem to work, but the Kaltura extractor doesn't know about DASH. Using a technique similar to that used for HLS gets some DASH formats, but they give 400.

Passing the original URL through to Kaltura (kaltura_url = smuggle_url(kaltura_url, {'source_url': url})) seems like a good idea, but the plain formats (e.g. worst) are even less playable: "Failed to recognize file format." ffprobe produces errors and then identifies H.264+AAC(LC) video.

Maybe some browser tracing would show how the playSessionId query parameter is being generated. The other parts of the quoted DASH URL all seem to be accessible.
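To make the structure of the quoted DASH URL explicit, here is a sketch that rebuilds it from its parts. Two points are assumptions rather than findings: that the sp path segment is always the partner ID followed by "00" (inferred from 2082311 and 208231100), and that a freshly generated pair of UUIDs is an acceptable playSessionId. How the site's player actually produces that value is not established here, and dash_manifest_url is a hypothetical name.

```python
import uuid


def dash_manifest_url(partner_id, entry_id, play_session_id=None):
    """Rebuild the playManifest URL form seen in this thread (hypothetical helper)."""
    if play_session_id is None:
        # Assumption: any "<uuid>:<uuid>" pair is accepted by the server.
        play_session_id = '%s:%s' % (uuid.uuid4(), uuid.uuid4())
    return (
        'https://cdnapisec.kaltura.com/p/{p}/sp/{p}00/playManifest/protocol/https'
        '/entryId/{e}/format/mpegdash/tags/web_hd/f/a.mpd'
        '?clientTag=html5:v2.0.0&playSessionId={s}'
    ).format(p=partner_id, e=entry_id, s=play_session_id)


# Reproduces the URL from the earlier yt-dlp command when given the same IDs
print(dash_manifest_url(
    2082311, '0_ie93g2ql',
    '7ac32418-35f1-35a0-233a-e2a25f47ab7f:b9995535-1445-34c0-88bc-87760b867929'))
```

Leaving play_session_id as an optional argument keeps the unknown part isolated, so it can be replaced once the real generation scheme is traced.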

It might also be useful to know what @zengjiawei98's MPD URL was.

chlee00 commented 1 year ago

These 3 URLs were captured with the HLS stream detector:

https://d1gbibt4vs5yna.cloudfront.net/dash/p/2082311/sp/208231100/serveFlavor/entryId/1_4n8bmm4x/v/1/pv/1/ev/17/flavorId/1_,a5kbb1y2,f1tmjacw,dze9tizj,lcfh661u,9szc627d,c8c75745,/forceproxy/true/name/a.mp4.urlset/manifest.mpd

https://cdnapisec.kaltura.com/p/2082311/sp/208231100/playManifest/protocol/https/entryId/1_4n8bmm4x/format/mpegdash/tags/web_hd/f/a.mpd?clientTag=html5:v2.0.0&playSessionId=189b24a0-13c2-cb94-0f18-d61f8161d74b:c87f8895-e1f8-045c-be8d-c4885f66bef4

https://rest-as.ott.kaltura.com/api_v3/service/assetFile/action/playManifest/partnerId/147/assetId/1461060/assetType/media/assetFileId/18641451/contextType/PLAYBACK/isAltUrl/False/ks/djJ8MTQ3fK4vKULHMeo0NLLwFzN8mlZbK3sx9_NBc5rflsZ5VulcejRvAfmFnqR53pswqosPRVNF1rV6nq2H6deDViKKkwd9B-SrukEEEKxByUVIga__QcytKI5F9yhx_jFXX2pBDzyXr4011Rs-93khQN18wqFlStF9d-7ADZ7vL3odzHNnAa9xPSyMQX7pw39GivAYhKgj1LDDmt-8EgoQVcB5GxcFiq0Nt46plYInJEMWlitVXZQAwLZWo7wCXjuXIjBPHql6zIEIFleeHnFheB1dZOfz2FvbBOuc89s7f_1bsOQm-t_xIiWipOxXgvs14_2f587EcwdoU_CtpcOf4ccyI1MLQFKpgV5dEIKNOI9zzflZBq05-GTGriQoNJLTP9JyIq7DvaTZdfB3MXbSa6iAb52XyG3A4kOem6mKmshKhCLuVrI9FyTan_juCiVJAJG9LQ67MtfxKbyuYre7fRcq_p-C6Lo8Td_9YqlZdG2TSULV2PBjcZgNHjSYu86PqWTkUXM7RduoNtj1VicyY_2J2OTfQrs0d9Z0tTpDLWZHJxmu/a.mpd?playSessionId=189b24a0-13c2-cb94-0f18-d61f8161d74b:c87f8895-e1f8-045c-be8d-c4885f66bef4&referrer=aHR0cHM6Ly93d3cubWV3YXRjaC5zZy93YXRjaC9UaGUtU3Rhci1BdGhsZXRlLUU5LUZsb29yYmFsbC0zNjg5NTE=&clientTag=html5:v2.0.0

Using the 2nd link, the episode can be downloaded. The stream is playable and is 1080p.

yt-dlp -o "S01.E09.mp4" -uV --no-part --restrict-filenames -N 4 --user-agent "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36" --referer "https://www.mewatch.sg/"  --merge-output-format mp4 --ffmpeg-location ffmpeg\bin "https://cdnapisec.kaltura.com/p/2082311/sp/208231100/playManifest/protocol/https/entryId/1_4n8bmm4x/format/mpegdash/tags/web_hd/f/a.mpd?clientTag=html5:v2.0.0&playSessionId=58dd2045-ee1e-5ac8-0784-8d6009fb3144:f895f2d5-d010-31d7-e8af-3b23ba901857"
Type account password and press [Return]:
[generic] Extracting URL: https://cdnapisec.kaltura.com/p/2082311/sp/208231100/playManifest/protocol/https/entryId/1_4n8bmm...d7-e8af-3b23ba901857
[generic] a.mpd?clientTag=html5:v2.0: Downloading webpage
[redirect] Following redirect to https://d1gbibt4vs5yna.cloudfront.net/dash/p/2082311/sp/208231100/serveFlavor/entryId/1_4n8bmm4x/v/1/pv/1/ev/17/flavorId/1_,a5kbb1y2,f1tmjacw,dze9tizj,lcfh661u,9szc627d,c8c75745,/forceproxy/true/name/a.mp4.urlset/manifest.mpd
[generic] Extracting URL: https://d1gbibt4vs5yna.cloudfront.net/dash/p/2082311/sp/208231100/serveFlavor/entryId/1_4n8bmm4x/....urlset/manifest.mpd
[generic] manifest: Downloading webpage
WARNING: [generic] Falling back on generic information extractor
[generic] manifest: Extracting information
[info] manifest: Downloading 1 format(s): f5-v1-x3+f4-a1-x3
[dashsegments] Total fragments: 683
[download] Destination: S01.E09.ff5-v1-x3.mp4
WARNING: The download speed shown is only of one thread. This is a known issue
[download] 100% of    1.25GiB in 00:01:38 at 13.02MiB/s
[dashsegments] Total fragments: 683
[download] Destination: S01.E09.ff4-a1-x3.m4a
WARNING: The download speed shown is only of one thread. This is a known issue
[download] 100% of   41.83MiB in 00:01:20 at 533.70KiB/s
[Merger] Merging formats into "S01.E09.mp4"
Deleting original file S01.E09.ff5-v1-x3.mp4 (pass -k to keep)
Deleting original file S01.E09.ff4-a1-x3.m4a (pass -k to keep)

zengjiawei98 commented 1 year ago

Thanks for this info! It works, but it downloads fragments instead of the original files. It works for now, though it requires a bit more effort. Better than nothing, of course! Also, all 3 .mpd links point to the same library and can be downloaded.
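One detail of the third captured URL (the rest-as.ott.kaltura.com one) is worth spelling out: its referrer query parameter is simply the watch-page URL encoded as base64. The Python 3 sketch below decodes it. Only the query string is reproduced here, and the variable names are illustrative.

```python
import base64
from urllib.parse import parse_qs

# Query string of the third captured URL in this thread.
QUERY = (
    'playSessionId=189b24a0-13c2-cb94-0f18-d61f8161d74b:c87f8895-e1f8-045c-be8d-c4885f66bef4'
    '&referrer=aHR0cHM6Ly93d3cubWV3YXRjaC5zZy93YXRjaC9UaGUtU3Rhci1BdGhsZXRlLUU5LUZsb29yYmFsbC0zNjg5NTE='
    '&clientTag=html5:v2.0.0')

params = parse_qs(QUERY)

# The referrer value is base64 text; decoding it yields the page URL
referrer = base64.b64decode(params['referrer'][0]).decode('utf-8')
print(referrer)

# playSessionId is two UUID-shaped strings joined by a colon
session_parts = params['playSessionId'][0].split(':')
print(session_parts)
```

This suggests the manifest request is tied to the page it was launched from, which may matter when reproducing the request outside a browser.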

humanitiesclinic commented 1 year ago

Is anyone working on updating youtube-dl itself to solve this issue, so that a smooth download directly with the youtube-dl command is possible (rather than just a workaround)? I am in Singapore, so I can access the site URLs without location restrictions. Is there anything I can help with? (I am not fully familiar with the source code, though.)

benjaminyam commented 10 months ago

I added ToggleIE back in, as it seems to be used by www.channelnewsasia.com. This code works for me. Can someone help commit and merge it?

toggle.py

import json
import re

from .common import InfoExtractor
from ..utils import (
    determine_ext,
    float_or_none,
    int_or_none,
    parse_iso8601,
    strip_or_none,
    url_or_none,
    traverse_obj,
    merge_dicts,
)

class ToggleIE(InfoExtractor):
    IE_NAME = 'toggle'
    _VALID_URL = r'(?:https?://(?:(?:www\.)?mewatch|video\.toggle)\.sg/(?:en|zh)/(?:[^/]+/){2,}|toggle:)(?P<id>[0-9]+)'
    _TESTS = [{
        'url': 'http://www.mewatch.sg/en/series/lion-moms-tif/trailers/lion-moms-premier/343115',
        'info_dict': {
            'id': '343115',
            'ext': 'mp4',
            'title': 'Lion Moms Premiere',
            'description': 'md5:aea1149404bff4d7f7b6da11fafd8e6b',
            'upload_date': '20150910',
            'timestamp': 1441858274,
        },
        'params': {
            'skip_download': 'm3u8 download',
        }
    }, {
        'note': 'DRM-protected video',
        'url': 'http://www.mewatch.sg/en/movies/dug-s-special-mission/341413',
        'info_dict': {
            'id': '341413',
            'ext': 'wvm',
            'title': 'Dug\'s Special Mission',
            'description': 'md5:e86c6f4458214905c1772398fabc93e0',
            'upload_date': '20150827',
            'timestamp': 1440644006,
        },
        'params': {
            'skip_download': 'DRM-protected wvm download',
        }
    }, {
        # this also tests correct video id extraction
        'note': 'm3u8 links are geo-restricted, but Android/mp4 is okay',
        'url': 'http://www.mewatch.sg/en/series/28th-sea-games-5-show/28th-sea-games-5-show-ep11/332861',
        'info_dict': {
            'id': '332861',
            'ext': 'mp4',
            'title': '28th SEA Games (5 Show) -  Episode  11',
            'description': 'md5:3cd4f5f56c7c3b1340c50a863f896faa',
            'upload_date': '20150605',
            'timestamp': 1433480166,
        },
        'params': {
            'skip_download': 'DRM-protected wvm download',
        },
        'skip': 'm3u8 links are geo-restricted'
    }, {
        'url': 'http://video.toggle.sg/en/clips/seraph-sun-aloysius-will-suddenly-sing-some-old-songs-in-high-pitch-on-set/343331',
        'only_matching': True,
    }, {
        'url': 'http://www.mewatch.sg/en/clips/seraph-sun-aloysius-will-suddenly-sing-some-old-songs-in-high-pitch-on-set/343331',
        'only_matching': True,
    }, {
        'url': 'http://www.mewatch.sg/zh/series/zero-calling-s2-hd/ep13/336367',
        'only_matching': True,
    }, {
        'url': 'http://www.mewatch.sg/en/series/vetri-s2/webisodes/jeeva-is-an-orphan-vetri-s2-webisode-7/342302',
        'only_matching': True,
    }, {
        'url': 'http://www.mewatch.sg/en/movies/seven-days/321936',
        'only_matching': True,
    }, {
        'url': 'https://www.mewatch.sg/en/tv-show/news/may-2017-cna-singapore-tonight/fri-19-may-2017/512456',
        'only_matching': True,
    }, {
        'url': 'http://www.mewatch.sg/en/channels/eleven-plus/401585',
        'only_matching': True,
    }]

    _API_USER = 'tvpapi_147'
    _API_PASS = '11111'

    def _real_extract(self, url):
        video_id = self._match_id(url)

        params = {
            'initObj': {
                'Locale': {
                    'LocaleLanguage': '',
                    'LocaleCountry': '',
                    'LocaleDevice': '',
                    'LocaleUserState': 0
                },
                'Platform': 0,
                'SiteGuid': 0,
                'DomainID': '0',
                'UDID': '',
                'ApiUser': self._API_USER,
                'ApiPass': self._API_PASS
            },
            'MediaID': video_id,
            'mediaType': 0,
        }

        info = self._download_json(
            'http://tvpapi.as.tvinci.com/v2_9/gateways/jsonpostgw.aspx?m=GetMediaInfo',
            video_id, 'Downloading video info json', data=json.dumps(params).encode('utf-8'))

        title = info['MediaName']

        formats = []
        for video_file in info.get('Files', []):
            video_url, vid_format = video_file.get('URL'), video_file.get('Format')
            if not video_url or video_url == 'NA' or not vid_format:
                continue
            ext = determine_ext(video_url)
            vid_format = vid_format.replace(' ', '')
            # if geo-restricted, m3u8 is inaccessible, but mp4 is okay
            if ext == 'm3u8':
                m3u8_formats = self._extract_m3u8_formats(
                    video_url, video_id, ext='mp4', m3u8_id=vid_format,
                    note='Downloading %s m3u8 information' % vid_format,
                    errnote='Failed to download %s m3u8 information' % vid_format,
                    fatal=False)
                for f in m3u8_formats:
                    # Apple FairPlay Streaming
                    if '/fpshls/' in f['url']:
                        continue
                    formats.append(f)
            elif ext == 'mpd':
                formats.extend(self._extract_mpd_formats(
                    video_url, video_id, mpd_id=vid_format,
                    note='Downloading %s MPD manifest' % vid_format,
                    errnote='Failed to download %s MPD manifest' % vid_format,
                    fatal=False))
            elif ext == 'ism':
                formats.extend(self._extract_ism_formats(
                    video_url, video_id, ism_id=vid_format,
                    note='Downloading %s ISM manifest' % vid_format,
                    errnote='Failed to download %s ISM manifest' % vid_format,
                    fatal=False))
            elif ext == 'mp4':
                formats.append({
                    'ext': ext,
                    'url': video_url,
                    'format_id': vid_format,
                })
        if not formats:
            for meta in (info.get('Metas') or []):
                if (not self.get_param('allow_unplayable_formats')
                        and meta.get('Key') == 'Encryption' and meta.get('Value') == '1'):
                    self.report_drm(video_id)
            # Most likely because geo-blocked if no formats and no DRM

        thumbnails = []
        for picture in info.get('Pictures', []):
            if not isinstance(picture, dict):
                continue
            pic_url = picture.get('URL')
            if not pic_url:
                continue
            thumbnail = {
                'url': pic_url,
            }
            pic_size = picture.get('PicSize', '')
            m = re.search(r'(?P<width>\d+)[xX](?P<height>\d+)', pic_size)
            if m:
                thumbnail.update({
                    'width': int(m.group('width')),
                    'height': int(m.group('height')),
                })
            thumbnails.append(thumbnail)

        def counter(prefix):
            return int_or_none(
                info.get(prefix + 'Counter') or info.get(prefix.lower() + '_counter'))

        return {
            'id': video_id,
            'title': title,
            'description': strip_or_none(info.get('Description')),
            'duration': int_or_none(info.get('Duration')),
            'timestamp': parse_iso8601(info.get('CreationDate') or None),
            'average_rating': float_or_none(info.get('Rating')),
            'view_count': counter('View'),
            'like_count': counter('Like'),
            'thumbnails': thumbnails,
            'formats': formats,
        }

class MeWatchIE(InfoExtractor):
    IE_NAME = 'mewatch'
    _VALID_URL = r'https?://(?:(?:www|live)\.)?mewatch\.sg/watch/[^/?#&]+-(?P<id>[0-9]+)'
    _TESTS = [{
        'url': 'https://www.mewatch.sg/watch/Recipe-Of-Life-E1-179371',
        'info_dict': {
            'id': '1008625',
            'ext': 'mp4',
            'title': 'Recipe Of Life 味之道',
            'timestamp': 1603306526,
            'description': 'md5:6e88cde8af2068444fc8e1bc3ebf257c',
            'upload_date': '20201021',
        },
        'params': {
            'skip_download': 'm3u8 download',
        },
    }, {
        'url': 'https://www.mewatch.sg/watch/Little-Red-Dot-Detectives-S2-搜密。打卡。小红点-S2-E1-176232',
        'only_matching': True,
    }, {
        'url': 'https://www.mewatch.sg/watch/Little-Red-Dot-Detectives-S2-%E6%90%9C%E5%AF%86%E3%80%82%E6%89%93%E5%8D%A1%E3%80%82%E5%B0%8F%E7%BA%A2%E7%82%B9-S2-E1-176232',
        'only_matching': True,
    }, {
        'url': 'https://live.mewatch.sg/watch/Recipe-Of-Life-E41-189759',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(
            url, video_id, note='Downloading video page')
        page_data = self._search_regex(
            r'(?s)window\s*\.\s*__data\s*=\s*(\{.*?\})\s*</script>',
            webpage, 'hydration JSON')
        page_data = self._parse_json(page_data, video_id)
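        # The partner ID is the last path segment of the thumbnail base URL;
        # fall back to the known ID if the key is missing or the URL is unusable
        partner_id = traverse_obj(
            page_data, ('app', 'config', 'playback', 'kalturaThumbnailBaseUrl'),
            expected_type=lambda u: (url_or_none(u) or '').rstrip('/').rpartition('/')[2]) or 2082311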
        show_data = traverse_obj(page_data,
                                 ('cache', 'page', Ellipsis, 'entries',
                                  lambda _, v: v['item']['id'] == video_id),
                                 get_all=False)

        entry_id = traverse_obj(show_data, ('item', 'customFields', 'EntryId'))

        txt_or_none = lambda x: x.strip() or None

        return merge_dicts(
            {'_type': 'url_transparent'},
            self.url_result(
                'kaltura:%s:%s' % (partner_id, entry_id), ie='Kaltura', video_id=video_id),
            {
                'title': traverse_obj(show_data, 'title', ('item', ('title', ('customFields', 'sortTitle'), 'episodeName')),
                                      get_all=False, expected_type=txt_or_none) or self._generic_title(url),
                'description': traverse_obj(show_data, ('item', ('description', 'shortDescription')), get_all=False, expected_type=txt_or_none),
                'uploader': traverse_obj(show_data, ('item', 'distributor'), expected_type=txt_or_none),
                'categories': traverse_obj(show_data, ('item', 'genres', Ellipsis), expected_type=txt_or_none),
                'episode_number': traverse_obj(show_data, ('item', 'episodeNumber'), expected_type=int_or_none),
                'episode': traverse_obj(show_data, ('item', 'episodeName'), expected_type=lambda x: re.sub(r'^Ep\s+\d+\s+(.*?)\s*$', r'\1', x) or None),
                'season_id': traverse_obj(show_data, ('item', 'seasonId'), expected_type=txt_or_none),
                'series_id': traverse_obj(show_data, ('item', ('showId', ('season', 'show', 'id'))), get_all=False, expected_type=txt_or_none),
                'season_number': traverse_obj(show_data, ('item', 'season', 'seasonNumber'), expected_type=int_or_none),
                'season': traverse_obj(show_data, ('item', 'season', 'title'), expected_type=txt_or_none),
                'series': traverse_obj(show_data, ('item', 'season', 'show', 'title'), expected_type=txt_or_none),
            })

dirkf commented 10 months ago

See PR #32172.