martijngoorman opened 3 years ago
Does the generic extractor know how to extract subtitles?
The DPlay extractor should be able to handle this site, but it doesn't know about the .nl version. When you tell it (add `|nl` after `|no` on line 26), it handles the actual show links (e.g. https://www.discoveryplus.nl/videos/deadliest-catch/season-17-point-of-no-return) to the point of telling you that you need to register and pass your browser cookies from a login session.
Apparently playlist pages aren't yet handled, whether for nl or other countries: they punt to the generic extractor.
The dplus series page has an element like this for each episode:
```html
<script type="application/ld+json">{"@context":["http://schema.org"],"@type":"TVEpisode","@id":"https://www.discoveryplus.nl/videos/deadliest-catch/season-17-point-of-no-return","url":"https://www.discoveryplus.nl/videos/deadliest-catch/season-17-point-of-no-return","name":" Point Of No Return","episodeNumber":3,"partOfSeason":{"@type":"TVSeason","seasonNumber":17},"partOfSeries":{"@type":"TVSeries","@id":"https://www.discoveryplus.nl/programmas/deadliest-catch","name":"Deadliest Catch"},"image":"https://eu2-prod-images.disco-api.com/2021/05/18/324001de-81f1-3bcd-8d91-cde896c0d3e6.png"}</script>
```
It should be possible to gather the episode details from these, and the `_search_json_ld` extractor method is meant to do so, but as Discovery has chosen to send not actual web pages but chimeras that may, with luck, appear as web pages if their JS runs correctly, this approach won't work.
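For reference, if that markup were actually served in the HTML, extracting the episode details would be trivial (a sketch with Python's `json` module over the blob quoted above, mapping roughly onto the fields `_search_json_ld` produces; it fails in practice precisely because the served pages lack the markup until the JS runs):

```python
import json

# The JSON-LD blob quoted above, as the JS-rendered page exposes it.
ld = json.loads('''{"@context":["http://schema.org"],"@type":"TVEpisode",
"@id":"https://www.discoveryplus.nl/videos/deadliest-catch/season-17-point-of-no-return",
"url":"https://www.discoveryplus.nl/videos/deadliest-catch/season-17-point-of-no-return",
"name":" Point Of No Return","episodeNumber":3,
"partOfSeason":{"@type":"TVSeason","seasonNumber":17},
"partOfSeries":{"@type":"TVSeries",
"@id":"https://www.discoveryplus.nl/programmas/deadliest-catch",
"name":"Deadliest Catch"}}''')

# Roughly the fields _search_json_ld would feed into the info_dict:
episode = {
    'title': ld['name'].strip(),
    'episode_number': ld['episodeNumber'],
    'season_number': ld['partOfSeason']['seasonNumber'],
    'series': ld['partOfSeries']['name'],
}
print(episode['title'], episode['episode_number'])  # Point Of No Return 3
```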
Instead, we can replicate the calls to the server API made by the JS:

1. `/cms/routes/` + playlist_id returns a JSON object whose `included` member is a list in which the playlist is the element with `type` `'collection'` and whose `meta.itemsCurrentPage` is 1; stash that element's `id`;
2. `/cms/collections/` + id returns a JSON object whose `included` member is a list in which the playlist items are the elements with `type` `'video'`; stash each such element's `attributes.path` as a `display_id`;
3. the `_get_disco_api_info` method of the `DPlayIE` extractor can be used to extract the data for the `info_dict` for each item.

At the same time the `_VALID_URL` can be extended to support .co.uk, the dplay.xx URL formats can be removed (assuming all are obsolete), and dplay.co.uk can be removed from `DiscoveryNetworksDeIE`, which could also be moved from discoverynetworks.py into the dplay.py source file.
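The two API lookups described above could be replicated along these lines (a sketch only: the JSON literals are hand-made stand-ins shaped per the description, and `find_collection_id`/`video_paths` are hypothetical helpers, not youtube-dl methods):

```python
# Hand-made stand-ins for the JSON the /cms/routes/ and /cms/collections/
# endpoints are described as returning; field names follow the text above.
routes_response = {
    'included': [
        {'type': 'page', 'id': '111'},
        {'type': 'collection', 'id': '222',
         'meta': {'itemsCurrentPage': 1}},
    ],
}
collection_response = {
    'included': [
        {'type': 'video',
         'attributes': {'path': 'deadliest-catch/season-17-point-of-no-return'}},
    ],
}

def find_collection_id(routes):
    """Step 1: the playlist is the 'collection' element whose
    meta.itemsCurrentPage is 1; stash its id."""
    for item in routes['included']:
        if (item.get('type') == 'collection'
                and item.get('meta', {}).get('itemsCurrentPage') == 1):
            return item['id']

def video_paths(collection):
    """Step 2: the playlist items are the 'video' elements; stash each
    attributes.path as a display_id."""
    return [item['attributes']['path']
            for item in collection['included']
            if item.get('type') == 'video']

print(find_collection_id(routes_response))  # 222
print(video_paths(collection_response))
```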
But Discovery really wants you to have registered and logged in, which I haven't bothered to do.
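For illustration, the `_VALID_URL` extension mentioned above might look roughly like this (a hedged sketch: this simplified pattern is not the actual one on line 26 of dplay.py):

```python
import re

# Simplified stand-in for the dplay/discoveryplus _VALID_URL; the real
# pattern in dplay.py is more involved.
VALID_URL = re.compile(
    r'https?://(?:www\.)?(?:dplay\.(?:dk|fi|jp|se|no)|'
    r'discoveryplus\.(?:no|nl|co\.uk))'   # '|nl' and 'co\.uk' added
    r'/(?:videos?|programmas?)/(?P<id>[^/?#]+/[^/?#]+)')

m = VALID_URL.match(
    'https://www.discoveryplus.nl/videos/deadliest-catch/season-17-point-of-no-return')
print(m.group('id'))  # deadliest-catch/season-17-point-of-no-return
```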
Euhm, I'm sorry, but line 26 where? I couldn't extract any subs in any way, so I don't know whether the generic extractor knows how to extract subtitles.
Also, I do have an account on discoveryplus.nl, so that is not the issue (for me) :)
If there is no other option but to search for the M3U8 file in the browser, so be it, but subs would be nice!
As no-one enlightened us regarding the generic extractor, the answer appears to be that the extraction result has to have an `automatic_captions` item or a `subtitles` item for subtitles to be listed with `--list-subs`. The generic extractor doesn't implement the methods used to extract either item, though it may redirect to other extractors that do when it finds certain embedded media links. You might, though, see the subtitles listed with the `-F` option for the M3U8 URL (the quoted link doesn't work for me now).
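For context, `--list-subs` only lists anything when the extractor's result carries such entries; in youtube-dl an extractor's `info_dict` declares them in roughly this shape (the URL below is a placeholder, not a real discoveryplus endpoint):

```python
# Sketch of the info_dict fields that --list-subs reads.
info_dict = {
    'id': 'season-17-point-of-no-return',
    'title': 'Point Of No Return',
    'subtitles': {                     # manually authored subtitles
        'nl': [{'url': 'https://example.com/subs.nl.vtt', 'ext': 'vtt'}],
    },
    'automatic_captions': {},          # ASR captions, if the site has any
}

def listed_sub_languages(info):
    """Roughly what --list-subs enumerates: the language keys."""
    return (sorted(info.get('subtitles', {}))
            + sorted(info.get('automatic_captions', {})))

print(listed_sub_languages(info_dict))  # ['nl']
```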
The changes I mentioned would affect the extractor code `youtube_dl/extractor/dplay.py`, and may not be easy to apply in a Windows installation, besides being fairly extensive. I would be happy to offer a Pull Request if there are enough interested registered users to test the country and language variations.
The attached patch text shows the changes implemented as described earlier.
Description
Links directly from discoveryplus.nl don't work. When providing the m3u8 URL to YTDL, no subs are shown:
But when looking into the M3U8:
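For anyone poking at the manifest by hand: subtitle tracks in an HLS master playlist are declared with `#EXT-X-MEDIA:TYPE=SUBTITLES` tags, and can be scraped out along these lines (the sample playlist below is invented for illustration, not the actual discoveryplus one):

```python
import re

# Hypothetical HLS master playlist; the URIs and group names are invented.
master = """#EXTM3U
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="sub",NAME="Nederlands",LANGUAGE="nl",URI="subs_nl.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=2000000,SUBTITLES="sub"
video_1080.m3u8
"""

def subtitle_renditions(m3u8_text):
    """Collect (LANGUAGE, URI) pairs from EXT-X-MEDIA subtitle lines."""
    out = []
    for line in m3u8_text.splitlines():
        if line.startswith('#EXT-X-MEDIA:') and 'TYPE=SUBTITLES' in line:
            lang = re.search(r'LANGUAGE="([^"]+)"', line)
            uri = re.search(r'URI="([^"]+)"', line)
            out.append((lang.group(1) if lang else None,
                        uri.group(1) if uri else None))
    return out

print(subtitle_renditions(master))  # [('nl', 'subs_nl.m3u8')]
```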