ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Site Support Request: Discoveryplus.nl #29249

Open martijngoorman opened 3 years ago

martijngoorman commented 3 years ago

Description

Links taken directly from discoveryplus.nl don't work. When I pass the m3u8 URL to youtube-dl instead, no subtitles are shown:

PS D:\Downloads> .\youtube-dl.exe -i https://dplaysouth-vod.akamaized.net/dplaydni/215991/0/hls/10180679004/playlist.m3u8?hdnts=st=1623261207~exp=1623347607~acl=/dplaydni/215991/0/hls/10180679004/*~hmac=3b40fe65985b45e9c623ac5802c4f10b5d3caec5fc4a303004a7f71fa7fb7168 --list-subs
[generic] *~hmac=3b40fe65985b45e9c623ac5802c4f10b5d3caec5fc4a303004a7f71fa7fb7168: Requesting header
[generic] *~hmac=3b40fe65985b45e9c623ac5802c4f10b5d3caec5fc4a303004a7f71fa7fb7168: Downloading m3u8 information
*~hmac=3b40fe65985b45e9c623ac5802c4f10b5d3caec5fc4a303004a7f71fa7fb7168 has no subtitles

But the M3U8 itself does declare a subtitle track (see the TYPE=SUBTITLES entry):

#EXTM3U
#EXT-X-VERSION:4
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="160000mp4a.40.2",LANGUAGE="eng",NAME="eng",AUTOSELECT=YES,DEFAULT=NO,CHANNELS="2",URI="exp=1623350715~acl=%2f*~data=hdntl~hmac=0f51804a00427893d2294503c4dfecd8c8b15c2c08a8855dc9b9e1e733bf56f0/1477160146-prog_index.m3u8?version_hash=569d7c64"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="64000mp4a.40.2",LANGUAGE="eng",NAME="eng",AUTOSELECT=YES,DEFAULT=NO,CHANNELS="2",URI="exp=1623350715~acl=%2f*~data=hdntl~hmac=0f51804a00427893d2294503c4dfecd8c8b15c2c08a8855dc9b9e1e733bf56f0/1510100771-prog_index.m3u8?version_hash=569d7c64"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="100wvtt.vtt",LANGUAGE="nl",NAME="nl",AUTOSELECT=YES,DEFAULT=NO,FORCED=NO,URI="exp=1623350715~acl=%2f*~data=hdntl~hmac=0f51804a00427893d2294503c4dfecd8c8b15c2c08a8855dc9b9e1e733bf56f0/1898817342-prog_index.m3u8?version_hash=569d7c64"
#EXT-X-STREAM-INF:BANDWIDTH=1752428,AVERAGE-BANDWIDTH=1752428,RESOLUTION=960x540,FRAME-RATE=25.000,VIDEO-RANGE=SDR,CODECS="avc1.4D401F,mp4a.40.2",AUDIO="160000mp4a.40.2",SUBTITLES="100wvtt.vtt"
exp=1623350715~acl=%2f*~data=hdntl~hmac=0f51804a00427893d2294503c4dfecd8c8b15c2c08a8855dc9b9e1e733bf56f0/520500154-prog_index.m3u8?version_hash=569d7c64
#EXT-X-STREAM-INF:BANDWIDTH=234324,AVERAGE-BANDWIDTH=234324,RESOLUTION=320x180,FRAME-RATE=25.000,VIDEO-RANGE=SDR,CODECS="avc1.42C015,mp4a.40.2",AUDIO="64000mp4a.40.2",SUBTITLES="100wvtt.vtt"
exp=1623350715~acl=%2f*~data=hdntl~hmac=0f51804a00427893d2294503c4dfecd8c8b15c2c08a8855dc9b9e1e733bf56f0/693032198-prog_index.m3u8?version_hash=569d7c64
#EXT-X-STREAM-INF:BANDWIDTH=464308,AVERAGE-BANDWIDTH=464308,RESOLUTION=480x270,FRAME-RATE=25.000,VIDEO-RANGE=SDR,CODECS="avc1.42C01E,mp4a.40.2",AUDIO="64000mp4a.40.2",SUBTITLES="100wvtt.vtt"
exp=1623350715~acl=%2f*~data=hdntl~hmac=0f51804a00427893d2294503c4dfecd8c8b15c2c08a8855dc9b9e1e733bf56f0/629141097-prog_index.m3u8?version_hash=569d7c64
#EXT-X-STREAM-INF:BANDWIDTH=861404,AVERAGE-BANDWIDTH=861404,RESOLUTION=640x360,FRAME-RATE=25.000,VIDEO-RANGE=SDR,CODECS="avc1.4D401F,mp4a.40.2",AUDIO="64000mp4a.40.2",SUBTITLES="100wvtt.vtt"
exp=1623350715~acl=%2f*~data=hdntl~hmac=0f51804a00427893d2294503c4dfecd8c8b15c2c08a8855dc9b9e1e733bf56f0/2095253500-prog_index.m3u8?version_hash=569d7c64
#EXT-X-STREAM-INF:BANDWIDTH=2354572,AVERAGE-BANDWIDTH=2354572,RESOLUTION=1024x576,FRAME-RATE=25.000,VIDEO-RANGE=SDR,CODECS="avc1.4D401F,mp4a.40.2",AUDIO="160000mp4a.40.2",SUBTITLES="100wvtt.vtt"
exp=1623350715~acl=%2f*~data=hdntl~hmac=0f51804a00427893d2294503c4dfecd8c8b15c2c08a8855dc9b9e1e733bf56f0/77335973-prog_index.m3u8?version_hash=569d7c64
#EXT-X-STREAM-INF:BANDWIDTH=64100,AVERAGE-BANDWIDTH=64100,CODECS="mp4a.40.2",AUDIO="64000mp4a.40.2",SUBTITLES="100wvtt.vtt"
exp=1623350715~acl=%2f*~data=hdntl~hmac=0f51804a00427893d2294503c4dfecd8c8b15c2c08a8855dc9b9e1e733bf56f0/1510100771-prog_index.m3u8?version_hash=569d7c64

#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=21278,AVERAGE-BANDWIDTH=21278,RESOLUTION=320x180,VIDEO-RANGE=SDR,CODECS="avc1.42C015",URI="exp=1623350715~acl=%2f*~data=hdntl~hmac=0f51804a00427893d2294503c4dfecd8c8b15c2c08a8855dc9b9e1e733bf56f0/693032198-iframe.m3u8?version_hash=569d7c64"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=50026,AVERAGE-BANDWIDTH=50026,RESOLUTION=480x270,VIDEO-RANGE=SDR,CODECS="avc1.42C01E",URI="exp=1623350715~acl=%2f*~data=hdntl~hmac=0f51804a00427893d2294503c4dfecd8c8b15c2c08a8855dc9b9e1e733bf56f0/629141097-iframe.m3u8?version_hash=569d7c64"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=99663,AVERAGE-BANDWIDTH=99663,RESOLUTION=640x360,VIDEO-RANGE=SDR,CODECS="avc1.4D401F",URI="exp=1623350715~acl=%2f*~data=hdntl~hmac=0f51804a00427893d2294503c4dfecd8c8b15c2c08a8855dc9b9e1e733bf56f0/2095253500-iframe.m3u8?version_hash=569d7c64"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=199041,AVERAGE-BANDWIDTH=199041,RESOLUTION=960x540,VIDEO-RANGE=SDR,CODECS="avc1.4D401F",URI="exp=1623350715~acl=%2f*~data=hdntl~hmac=0f51804a00427893d2294503c4dfecd8c8b15c2c08a8855dc9b9e1e733bf56f0/520500154-iframe.m3u8?version_hash=569d7c64"
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=274309,AVERAGE-BANDWIDTH=274309,RESOLUTION=1024x576,VIDEO-RANGE=SDR,CODECS="avc1.4D401F",URI="exp=1623350715~acl=%2f*~data=hdntl~hmac=0f51804a00427893d2294503c4dfecd8c8b15c2c08a8855dc9b9e1e733bf56f0/77335973-iframe.m3u8?version_hash=569d7c64"

dirkf commented 3 years ago

Does the generic extractor know how to extract subtitles?

The DPlay extractor should be able to handle this site, but it doesn't know about the .nl version. When you tell it (add |nl after |no on line 26), it handles the actual show links (e.g. https://www.discoveryplus.nl/videos/deadliest-catch/season-17-point-of-no-return) to the point of telling you that you need to register and pass your browser cookies from a login session.
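To illustrate the kind of change meant here, the following is a minimal sketch of a country alternation in a URL pattern. The pattern is a hypothetical simplification, not the real _VALID_URL from the extractor, and it lists only the two country codes mentioned above.

```python
import re

# Hypothetical, simplified pattern for illustration only; the extractor's
# real _VALID_URL is more involved and covers more countries.
COUNTRIES = ('no', 'nl')  # 'nl' appended after 'no', as described above
VALID_URL = (
    r'https?://(?:www\.)?discoveryplus\.(?P<country>%s)'
    r'/videos/(?P<id>[^/]+/[^/?#]+)' % '|'.join(COUNTRIES)
)


def match_show_url(url):
    """Return the named groups for a supported show URL, else None."""
    m = re.match(VALID_URL, url)
    return m.groupdict() if m else None
```

With this pattern the example show link yields the country nl and the id deadliest-catch/season-17-point-of-no-return, while a country that is not in the alternation is rejected and would fall through to the generic extractor.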

Apparently playlist pages aren't yet handled, whether for nl or other countries: they punt to the generic extractor.

The dplus series page has an element like this for each episode:

<script type="application/ld+json">{"@context":["http://schema.org"],"@type":"TVEpisode","@id":"https://www.discoveryplus.nl/videos/deadliest-catch/season-17-point-of-no-return","url":"https://www.discoveryplus.nl/videos/deadliest-catch/season-17-point-of-no-return","name":" Point Of No Return","episodeNumber":3,"partOfSeason":{"@type":"TVSeason","seasonNumber":17},"partOfSeries":{"@type":"TVSeries","@id":"https://www.discoveryplus.nl/programmas/deadliest-catch","name":"Deadliest Catch"},"image":"https://eu2-prod-images.disco-api.com/2021/05/18/324001de-81f1-3bcd-8d91-cde896c0d3e6.png"}</script>

It should be possible to gather the episode details from these, and the _search_json_ld extractor method is meant to do so, but as Discovery has chosen not to send actual web pages but just chimeras that may with luck appear as web pages if their JS runs correctly, this approach won't work.
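For reference, this is roughly what reading those elements would look like if they were present in the HTML as served. It is a standard-library sketch, not the extractor's _search_json_ld method; the function name and the shape of the returned dicts are assumptions made for illustration.

```python
import json
import re

JSON_LD_RE = r'<script[^>]+type="application/ld\+json"[^>]*>(.*?)</script>'


def episodes_from_json_ld(html):
    """Yield one dict per TVEpisode JSON-LD block found in the page source."""
    for raw in re.findall(JSON_LD_RE, html, flags=re.DOTALL):
        try:
            data = json.loads(raw)
        except ValueError:
            continue  # skip blocks that are not valid JSON
        if not isinstance(data, dict) or data.get('@type') != 'TVEpisode':
            continue
        yield {
            'url': data.get('url'),
            'title': (data.get('name') or '').strip(),
            'season': (data.get('partOfSeason') or {}).get('seasonNumber'),
            'episode': data.get('episodeNumber'),
            'series': (data.get('partOfSeries') or {}).get('name'),
        }
```

This only helps when the script elements are in the raw response; since the page is built by JS after loading, the downloaded HTML will not contain them.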

Instead, we can replicate the calls to the server API made by the JS:

At the same time, _VALID_URL can be extended to support .co.uk, the dplay.xx URL formats can be removed (assuming all are obsolete), and dplay.co.uk can be removed from DiscoveryNetworksDeIE, which could itself be moved from discoverynetworks.py into the dplay.py source file.

But Discovery really wants you to have registered and logged in, which I haven't bothered to do.

martijngoorman commented 3 years ago

Euhm, I'm sorry, but line 26 of what? I couldn't extract any subs in any way, so I don't know whether the generic extractor knows how to extract subtitles.

Also, I do have an account on discoveryplus.nl, so that is not the issue (for me) :)

If there is no other option than to search for the M3U8 file in the browser, so be it, but subs would be nice!

dirkf commented 3 years ago

As no-one enlightened us regarding the generic extractor, the answer appears to be that the extraction result has to have an automatic captions item, or a subtitles item, for subtitles to be listed with --list-subs. The generic extractor doesn't implement the methods used to extract either item, though it may redirect to other extractors that do when it finds certain embedded media links. You might, though, see the subtitles listed with the -F option for the M3U8 URL (the quoted link doesn't work for me now).
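As a rough illustration of what populating a subtitles item from a master playlist like the one quoted in this issue could involve, here is a standalone sketch. It is not code from youtube-dl; the function names are made up, and the result (a dict mapping a language code to a list of dicts with a url key) is only meant to resemble the shape of a subtitles item.

```python
import re
from urllib.parse import urljoin

MEDIA_TAG = '#EXT-X-MEDIA:'


def parse_attribute_list(text):
    """Parse an HLS attribute list (KEY=VALUE,KEY="quoted, value") into a dict."""
    attrs = {}
    for key, value in re.findall(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)', text):
        attrs[key] = value.strip('"')
    return attrs


def subtitle_renditions(master_playlist, base_url):
    """Collect TYPE=SUBTITLES renditions, keyed by their LANGUAGE attribute."""
    subtitles = {}
    for line in master_playlist.splitlines():
        if not line.startswith(MEDIA_TAG):
            continue
        attrs = parse_attribute_list(line[len(MEDIA_TAG):])
        if attrs.get('TYPE') != 'SUBTITLES' or 'URI' not in attrs:
            continue
        # The URI is resolved relative to the master playlist's own URL.
        lang = attrs.get('LANGUAGE', 'und')
        subtitles.setdefault(lang, []).append(
            {'url': urljoin(base_url, attrs['URI'])})
    return subtitles
```

Note that each URI found this way points to another M3U8 playlist, presumably of WebVTT segments given the group name, rather than to a single subtitle file, so the segments would still need to be downloaded and joined.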

The changes I mentioned would affect the extractor code in youtube_dl/extractor/dplay.py. They are fairly extensive and may not be easy to apply in a Windows installation. I would be happy to offer a Pull Request if there are enough interested registered users to test the country and language variations.

The attached patch text shows the changes implemented as described earlier.

dplay.py.dif.txt