ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

HiDive Playlist Support (I have it coded) #28394

Open KiyoshiStar opened 3 years ago

KiyoshiStar commented 3 years ago

Description

I'm fairly new to Python and not deeply familiar with youtube-dl's coding conventions, but I decided to try extending the current HiDive support with playlist support, so that you could feed it a show page and download in batch if desired.

I didn't really see anything on this (apologies if I missed it), so I copied the code chunk from Crunchyroll and tried modifying it accordingly, but when I tested with --flat-playlist, I just got an unsupported URL error. I was hoping someone could carry on from where I'm stuck and get playlist support implemented.

My Attempt/Code Block: (See my more updated progress comment below)

KiyoshiStar commented 3 years ago

Playing around some more on my own, I realized the issue is I'm just pepega and missed an important tidbit from the docs about adding the new extractor to extractors.py.
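For anyone hitting the same "unsupported URL" wall, here is a toy sketch of why registration matters. It is not youtube-dl's real dispatch code: `ToyHiDiveIE`, `ToyHiDivePlaylistIE`, and `pick_extractor` are hypothetical names made up for illustration, and the `/stream/` pattern is simplified. The idea it models is that a URL is only tested against extractor classes that have been registered, which in the real tree means, as far as I understand it, importing the class in youtube_dl/extractor/extractors.py.

```python
import re


# Toy model of extractor dispatch; NOT youtube-dl's real implementation.
class ToyHiDiveIE:
    # Simplified stand-in for the single-episode extractor's URL pattern.
    _VALID_URL = r'https?://(?:www\.)?hidive\.com/stream/(?P<id>[^/?#&]+)'

    @classmethod
    def suitable(cls, url):
        # An extractor claims a URL only if its own pattern matches.
        return re.match(cls._VALID_URL, url) is not None


class ToyHiDivePlaylistIE(ToyHiDiveIE):
    # Show pages live under /tv/ or /movies/.
    _VALID_URL = r'https?://(?:www\.)?hidive\.com/(?:tv|movies)/(?P<id>[^/?#&]+)'


def pick_extractor(url, registry):
    """Return the name of the first registered class that accepts the URL."""
    for ie in registry:
        if ie.suitable(url):
            return ie.__name__
    return 'Unsupported URL'


show_url = 'https://www.hidive.com/tv/the-comic-artist-and-his-assistants'

# The playlist class exists but is not registered, so it is never consulted.
print(pick_extractor(show_url, [ToyHiDiveIE]))

# Once the class is registered, the show URL is claimed by it.
print(pick_extractor(show_url, [ToyHiDiveIE, ToyHiDivePlaylistIE]))
```

In this toy version the registry is an explicit list; the point is only that writing the class in hidive.py is not enough on its own.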

Now that I was able to actually get errors and thus make progress bit by bit, I managed to get playlist support working... almost. It's grabbing the item count/URLs properly, but when you go to actually download a playlist item, it returns ERROR: no suitable InfoExtractor for URL https://www.hidive.comjavascript:void(0);

I'm lost as to why it's returning javascript:void(0) instead of the /stream/... group that's captured by the regex.

EDIT 3/12: I figured it out. I was just continuing from the previous code block wrong... I thought I had to add a comma after the }, but it seems I wasn't supposed to add anything at all. I'm still fairly new to Python as mentioned earlier, so I'm still learning to deal with things slowly~

Final and fully working playlist support code:

# Goes in youtube_dl/extractor/hidive.py after HiDiveIE; assumes `re` is
# imported at the top of that module.
class HiDiveShowPlaylistIE(HiDiveIE):
    IE_NAME = 'hidive:playlist'
    # Capture only the show slug so the id matches the 'id' values in _TESTS.
    _VALID_URL = r'https?://(?:www\.)?hidive\.com/(?:tv|movies)/(?P<id>[^/?#&]+)'

    _TESTS = [{
        'url': 'https://www.hidive.com/tv/the-comic-artist-and-his-assistants',
        'info_dict': {
            'id': 'the-comic-artist-and-his-assistants',
            'title': 'The Comic Artist and His Assistants',
        },
        'playlist_count': 12,
    }, {
        'url': 'https://www.hidive.com/movies/armored-trooper-votoms-genei-phantom-arc',
        'info_dict': {
            'id': 'armored-trooper-votoms-genei-phantom-arc',
            'title': 'Armored Trooper VOTOMS: Genei ~ Phantom Arc',
        },
        'playlist_count': 6,
    }]

    def _real_extract(self, url):
        title_path = self._match_id(url)

        webpage = self._download_webpage(url, title_path)
        title = self._html_search_regex(
            r'<h1><a href="[^"]+">([^<]+)</a></h1>', webpage, 'title')

        # Each match is an (episode key, "/stream/..." path) pair.
        episode_paths = re.findall(
            r'(?s)id="ekey([^"]+)">.*?"(/stream[^"]+)"',
            webpage)
        # Hand every episode URL over to the single-episode HiDive extractor.
        entries = [
            self.url_result('https://www.hidive.com' + ep, 'HiDive', ep_id)
            for ep_id, ep in episode_paths
        ]

        return {
            '_type': 'playlist',
            'id': title_path,
            'title': title,
            'entries': entries,
        }
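
To show what the episode regex in `_real_extract` picks up, here is a small standalone sketch. The HTML is made up for illustration: the `data-playurl` attribute, the `some-show` slug, and the episode keys are assumptions, and the real HiDive show page may be laid out differently. What it demonstrates is that the second group only accepts a quoted value starting with /stream, so a placeholder such as javascript:void(0); sitting between the id and the stream path is skipped.

```python
import re

# Hypothetical markup; the real HiDive show page may differ.
SAMPLE_HTML = '''
<div id="ekeys01e001">
  <a href="javascript:void(0);" data-playurl="/stream/some-show/s01e001">Episode 1</a>
</div>
<div id="ekeys01e002">
  <a href="javascript:void(0);" data-playurl="/stream/some-show/s01e002">Episode 2</a>
</div>
'''

# Same pattern as in the extractor: group 1 is the text after "ekey",
# group 2 is the first quoted value after it that starts with /stream.
EPISODE_RE = r'(?s)id="ekey([^"]+)">.*?"(/stream[^"]+)"'

# With two groups, re.findall returns a list of (group 1, group 2) tuples.
episode_paths = re.findall(EPISODE_RE, SAMPLE_HTML)

# Build absolute episode URLs the same way the extractor does.
urls = ['https://www.hidive.com' + path for _ep_id, path in episode_paths]

for (ep_id, _path), full_url in zip(episode_paths, urls):
    print(ep_id, full_url)
```

Note that the tuple order is (key, path), which is why the list comprehension in the extractor unpacks `ep_id, ep` in that order.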

dnbknlol commented 3 years ago

Hope this gets implemented.