martijngoorman opened 3 years ago
Does the generic extractor know how to extract subtitles?
The DPlay extractor should be able to handle this site, but it doesn't know about the .nl version. When you tell it (add `|nl` after `|no` on line 26), it handles the actual show links (e.g. https://www.discoveryplus.nl/videos/deadliest-catch/season-17-point-of-no-return) to the point of telling you that you need to register and pass your browser cookies from a login session.
Apparently playlist pages aren't yet handled, whether for nl or other countries: they punt to the generic extractor.
The dplus series page has an element like this for each episode:
```html
<script type="application/ld+json">{"@context":["http://schema.org"],"@type":"TVEpisode","@id":"https://www.discoveryplus.nl/videos/deadliest-catch/season-17-point-of-no-return","url":"https://www.discoveryplus.nl/videos/deadliest-catch/season-17-point-of-no-return","name":" Point Of No Return","episodeNumber":3,"partOfSeason":{"@type":"TVSeason","seasonNumber":17},"partOfSeries":{"@type":"TVSeries","@id":"https://www.discoveryplus.nl/programmas/deadliest-catch","name":"Deadliest Catch"},"image":"https://eu2-prod-images.disco-api.com/2021/05/18/324001de-81f1-3bcd-8d91-cde896c0d3e6.png"}</script>
```
It should be possible to gather the episode details from these, and the `_search_json_ld` extractor method is meant to do so, but as Discovery has chosen to send not actual web pages but chimeras that may, with luck, appear as web pages if their JS runs correctly, this approach won't work.
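For reference, if that markup were actually served in the HTML, extracting the episode details would be trivial (a sketch with Python's `json` module over the blob quoted above, mapping roughly onto the fields `_search_json_ld` produces; it fails in practice precisely because the served pages lack the markup until the JS runs):

```python
import json

# The JSON-LD blob quoted above, as the JS-rendered page exposes it.
ld = json.loads('''{"@context":["http://schema.org"],"@type":"TVEpisode",
"@id":"https://www.discoveryplus.nl/videos/deadliest-catch/season-17-point-of-no-return",
"url":"https://www.discoveryplus.nl/videos/deadliest-catch/season-17-point-of-no-return",
"name":" Point Of No Return","episodeNumber":3,
"partOfSeason":{"@type":"TVSeason","seasonNumber":17},
"partOfSeries":{"@type":"TVSeries",
"@id":"https://www.discoveryplus.nl/programmas/deadliest-catch",
"name":"Deadliest Catch"}}''')

# Roughly the fields _search_json_ld would feed into the info_dict:
episode = {
    'title': ld['name'].strip(),
    'episode_number': ld['episodeNumber'],
    'season_number': ld['partOfSeason']['seasonNumber'],
    'series': ld['partOfSeries']['name'],
}
print(episode['title'], episode['episode_number'])  # Point Of No Return 3
```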
Instead, we can replicate the calls to the server API made by the JS:

1. `/cms/routes/` + playlist_id returns a JSON object whose `included` member is a list in which the playlist is the element with `type` `'collection'` and whose `meta.itemsCurrentPage` is 1; stash that element's `id`;
2. `/cms/collections/` + id returns a JSON object whose `included` member is a list in which the playlist items are the elements with `type` `'video'`; stash each such element's `attributes.path` as a `display_id`;
3. the `_get_disco_api_info` method of the `DPlayIE` extractor can be used to extract the data for the `info_dict` for each item.

At the same time the `_VALID_URL` can be extended to support .co.uk, the dplay.xx URL formats can be removed (assuming all are obsolete), and dplay.co.uk can be removed from `DiscoveryNetworksDeIE`, which could also be moved from discoverynetworks.py into the dplay.py source file.
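The two API lookups described above could be replicated along these lines (a sketch only: the JSON literals are hand-made stand-ins shaped per the description, and `find_collection_id`/`video_paths` are hypothetical helpers, not youtube-dl methods):

```python
# Hand-made stand-ins for the JSON the /cms/routes/ and /cms/collections/
# endpoints are described as returning; field names follow the text above.
routes_response = {
    'included': [
        {'type': 'page', 'id': '111'},
        {'type': 'collection', 'id': '222',
         'meta': {'itemsCurrentPage': 1}},
    ],
}
collection_response = {
    'included': [
        {'type': 'video',
         'attributes': {'path': 'deadliest-catch/season-17-point-of-no-return'}},
    ],
}

def find_collection_id(routes):
    """Step 1: the playlist is the 'collection' element whose
    meta.itemsCurrentPage is 1; stash its id."""
    for item in routes['included']:
        if (item.get('type') == 'collection'
                and item.get('meta', {}).get('itemsCurrentPage') == 1):
            return item['id']

def video_paths(collection):
    """Step 2: the playlist items are the 'video' elements; stash each
    attributes.path as a display_id."""
    return [item['attributes']['path']
            for item in collection['included']
            if item.get('type') == 'video']

print(find_collection_id(routes_response))  # 222
print(video_paths(collection_response))
```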
But Discovery really wants you to have registered and logged in, which I haven't bothered to do.
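For illustration, the `_VALID_URL` extension mentioned above might look roughly like this (a hedged sketch: this simplified pattern is not the actual one on line 26 of dplay.py):

```python
import re

# Simplified stand-in for the dplay/discoveryplus _VALID_URL; the real
# pattern in dplay.py is more involved.
VALID_URL = re.compile(
    r'https?://(?:www\.)?(?:dplay\.(?:dk|fi|jp|se|no)|'
    r'discoveryplus\.(?:no|nl|co\.uk))'   # '|nl' and 'co\.uk' added
    r'/(?:videos?|programmas?)/(?P<id>[^/?#]+/[^/?#]+)')

m = VALID_URL.match(
    'https://www.discoveryplus.nl/videos/deadliest-catch/season-17-point-of-no-return')
print(m.group('id'))  # deadliest-catch/season-17-point-of-no-return
```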
Euhm, I'm sorry, but line 26 where? I couldn't extract any subs in any way, so I don't know whether the generic extractor knows how to extract subtitles.
Also, I do have an account on discoveryplus.nl, so that is not the issue (for me) :)
If there is no other option but to search for the M3U8 file in the browser, so be it, but subs would be nice!
As no-one enlightened us regarding the generic extractor, the answer appears to be that the extraction result has to have an `automatic_captions` item or a `subtitles` item for subtitles to be listed with `--list-subs`. The generic extractor doesn't implement the methods used to extract either item, though it may redirect to other extractors that do when it finds certain embedded media links. You might, though, see the subtitles listed with the `-F` option for the M3U8 URL (the quoted link doesn't work for me now).
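For context, `--list-subs` only lists anything when the extractor's result carries such entries; in youtube-dl an extractor's `info_dict` declares them in roughly this shape (the URL below is a placeholder, not a real discoveryplus endpoint):

```python
# Sketch of the info_dict fields that --list-subs reads.
info_dict = {
    'id': 'season-17-point-of-no-return',
    'title': 'Point Of No Return',
    'subtitles': {                     # manually authored subtitles
        'nl': [{'url': 'https://example.com/subs.nl.vtt', 'ext': 'vtt'}],
    },
    'automatic_captions': {},          # ASR captions, if the site has any
}

def listed_sub_languages(info):
    """Roughly what --list-subs enumerates: the language keys."""
    return (sorted(info.get('subtitles', {}))
            + sorted(info.get('automatic_captions', {})))

print(listed_sub_languages(info_dict))  # ['nl']
```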
The changes I mentioned would affect the extractor code `youtube_dl/extractor/dplay.py`, and may not be easy to apply in a Windows installation, besides being fairly extensive. I would be happy to offer a Pull Request if there are enough interested registered users to test the country and language variations.
The attached patch text shows the changes implemented as described earlier.
Description
Links directly from discoveryplus.nl don't work. When providing the m3u8 URL to YTDL, no subs are shown:
But when looking into the M3U8:
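For anyone poking at the manifest by hand: subtitle tracks in an HLS master playlist are declared with `#EXT-X-MEDIA:TYPE=SUBTITLES` tags, and can be scraped out along these lines (the sample playlist below is invented for illustration, not the actual discoveryplus one):

```python
import re

# Hypothetical HLS master playlist; the URIs and group names are invented.
master = """#EXTM3U
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="sub",NAME="Nederlands",LANGUAGE="nl",URI="subs_nl.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=2000000,SUBTITLES="sub"
video_1080.m3u8
"""

def subtitle_renditions(m3u8_text):
    """Collect (LANGUAGE, URI) pairs from EXT-X-MEDIA subtitle lines."""
    out = []
    for line in m3u8_text.splitlines():
        if line.startswith('#EXT-X-MEDIA:') and 'TYPE=SUBTITLES' in line:
            lang = re.search(r'LANGUAGE="([^"]+)"', line)
            uri = re.search(r'URI="([^"]+)"', line)
            out.append((lang.group(1) if lang else None,
                        uri.group(1) if uri else None))
    return out

print(subtitle_renditions(master))  # [('nl', 'subs_nl.m3u8')]
```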