ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense
131.87k stars, 10k forks

[Generic] Add support for playlists if more than one video is found #5587

Open snipem opened 9 years ago

snipem commented 9 years ago

Treat a URL as a playlist if more than one video URL is found on the page. This should apply to every URL that is handled by the generic video extractor.
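
A minimal sketch of the behaviour requested above. It assumes a hypothetical helper, make_result, that is not part of youtube-dl and that receives the list of video URLs found on a page. The '_type' values 'url' and 'playlist' are the result types youtube-dl extractors return; using the page URL as the playlist id is only an assumption for illustration.

```python
def make_result(page_url, video_urls):
    """Hypothetical helper: wrap the video URLs found on one page in a single result."""
    # Drop duplicates while keeping the order in which the URLs were found.
    seen = set()
    unique_urls = []
    for video_url in video_urls:
        if video_url not in seen:
            seen.add(video_url)
            unique_urls.append(video_url)

    if not unique_urls:
        raise ValueError('no video found on %s' % page_url)

    entries = [{'_type': 'url', 'url': video_url} for video_url in unique_urls]
    if len(entries) == 1:
        # A single video stays a plain URL result.
        return entries[0]
    # More than one video: treat the page as a playlist.
    return {'_type': 'playlist', 'id': page_url, 'entries': entries}
```

With one distinct URL the helper returns a plain 'url' result; with two or more it returns a playlist whose entries keep the order in which the URLs were found.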

jaimeMF commented 9 years ago

Post an example URL.

yan12125 commented 9 years ago

Here is one: http://www.mmafighting.com/2014/2/2/5370376/ufc-169-post-fight-show

This page contains both YouTube and Ooyala videos. youtube-dl detects the YouTube video first, so the Ooyala video is not downloaded at all.

jnbdz commented 7 years ago

In youtube-dl/youtube_dl/extractor/generic.py, I removed some of the return statements in the _real_extract method. It was then able to extract more videos from different video services, but I ran into this error:

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'https://tifrib.com/said-rageah/']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.04.26
[debug] Git HEAD: e8bfe2a
[debug] Python version 2.7.12 - Linux-4.4.0-72-generic-x86_64-with-Ubuntu-16.04-xenial
[debug] exe versions: ffmpeg 2.8.11-0ubuntu0.16.04.1, ffprobe 2.8.11-0ubuntu0.16.04.1
[debug] Proxy map: {}
 --- self._real_extract
 --- Called _real_extract for embeded URLs
 --- https://tifrib.com/said-rageah/
[generic] said-rageah: Requesting header
WARNING: Falling back on generic information extractor.
[generic] said-rageah: Downloading webpage
[generic] said-rageah: Extracting information
 --- Look for embedded YouTube player
 --- Found embedded Youtube video
[u'https://videopress.com/embed/4BajuZCH', u'https://videopress.com/embed/X1is4uyi', u'https://videopress.com/embed/aJlE15aE', u'https://videopress.com/embed/SV3AWSeV']
ERROR: Unsupported URL: https://tifrib.com/said-rageah/
Traceback (most recent call last):
  File "youtube_dl/extractor/generic.py", line 1916, in _real_extract
    doc = compat_etree_fromstring(webpage.encode('utf-8'))
  File "youtube_dl/compat.py", line 2526, in compat_etree_fromstring
    doc = _XML(text, parser=etree.XMLParser(target=_TreeBuilder(element_factory=_element_factory)))
  File "youtube_dl/compat.py", line 2515, in _XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1653, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1517, in _raiseerror
    raise err
ParseError: not well-formed (invalid token): line 42, column 344
Traceback (most recent call last):
  File "youtube_dl/YoutubeDL.py", line 760, in extract_info
    ie_result = ie.extract(url)
  File "youtube_dl/extractor/common.py", line 430, in extract
    ie_result = self._real_extract(url)
  File "youtube_dl/extractor/generic.py", line 2786, in _real_extract
    raise UnsupportedError(url)

I think this happens because I removed too many return statements, so youtube-dl falls through to a default extraction path that does not recognize anything. I don't think it will be hard for me to find a solution to this.

I am posting this here because I would like your feedback on the strategy I have chosen to resolve this issue.
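
To make the strategy concrete, here is a rough sketch of the difference between returning on the first match and collecting every match. It is not the code in generic.py: EMBED_PATTERNS, extract_first and extract_all are hypothetical names, the two regular expressions are heavily simplified, the ie_key strings are illustrative, and UnsupportedError is a local stand-in for the youtube-dl exception of the same name.

```python
import re


class UnsupportedError(Exception):
    """Local stand-in for youtube-dl's UnsupportedError."""


# Hypothetical, simplified embed patterns (assumption, not the real ones).
EMBED_PATTERNS = [
    ('Youtube', r'<iframe[^>]+src="(https?://www\.youtube\.com/embed/[^"]+)"'),
    ('VideoPress', r'<iframe[^>]+src="(https?://videopress\.com/embed/[^"]+)"'),
]


def extract_first(webpage, url):
    """Early-return style: the first pattern that matches wins."""
    for ie_key, pattern in EMBED_PATTERNS:
        found = re.findall(pattern, webpage)
        if found:
            return [{'_type': 'url', 'url': u, 'ie_key': ie_key} for u in found]
    raise UnsupportedError(url)


def extract_all(webpage, url):
    """Accumulating style: every pattern is tried before giving up."""
    entries = []
    for ie_key, pattern in EMBED_PATTERNS:
        for u in re.findall(pattern, webpage):
            entries.append({'_type': 'url', 'url': u, 'ie_key': ie_key})
    if not entries:
        # Only unsupported when no pattern matched at all.
        raise UnsupportedError(url)
    return entries
```

The point of extract_all is that the error is raised only after all patterns have been tried and nothing was collected, instead of being reached simply because the early returns were removed.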

jnbdz commented 7 years ago

@yan12125 I tried your URL (http://www.mmafighting.com/2014/2/2/5370376/ufc-169-post-fight-show). I was only able to download one of the videos (the one from Ooyala). I am not sure why yet.

jnbdz commented 7 years ago

@yan12125 I just looked at the log in my terminal. It seems it found the YouTube video, but it's not downloading it for some reason.

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'http://www.mmafighting.com/2014/2/2/5370376/ufc-169-post-fight-show']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.04.26
[debug] Git HEAD: e8bfe2a
[debug] Python version 2.7.12 - Linux-4.4.0-72-generic-x86_64-with-Ubuntu-16.04-xenial
[debug] exe versions: ffmpeg 2.8.11-0ubuntu0.16.04.1, ffprobe 2.8.11-0ubuntu0.16.04.1
[debug] Proxy map: {}
 --- self._real_extract
 --- Called _real_extract for embeded URLs
 --- http://www.mmafighting.com/2014/2/2/5370376/ufc-169-post-fight-show
[generic] ufc-169-post-fight-show: Requesting header
WARNING: Falling back on generic information extractor.
[generic] ufc-169-post-fight-show: Downloading webpage
[generic] ufc-169-post-fight-show: Extracting information
 --- Look for embedded YouTube player
 --- Found embedded Youtube video
[]
 --- self._real_extract
[Ooyala] 5mdXVoazrZPFMEwA751Q-TJ5NH0KAz2j: Downloading JSON metadata
[Ooyala] 5mdXVoazrZPFMEwA751Q-TJ5NH0KAz2j: Downloading JSON metadata
[Ooyala] 5mdXVoazrZPFMEwA751Q-TJ5NH0KAz2j: Downloading m3u8 information
[debug] Invoking downloader on u'http://player.ooyala.com/player/all/5mdXVoazrZPFMEwA751Q-TJ5NH0KAz2j_4000.m3u8'
[download] UFC 169 post-fight show-5mdXVoazrZPFMEwA751Q-TJ5NH0KAz2j.mp4 has already been downloaded
[download] 100% of 386.05MiB
[debug] ffmpeg command line: ffprobe -show_streams 'file:UFC 169 post-fight show-5mdXVoazrZPFMEwA751Q-TJ5NH0KAz2j.mp4'
[ffmpeg] Fixing malformated aac bitstream in "UFC 169 post-fight show-5mdXVoazrZPFMEwA751Q-TJ5NH0KAz2j.mp4"
[debug] ffmpeg command line: ffmpeg -y -i 'file:UFC 169 post-fight show-5mdXVoazrZPFMEwA751Q-TJ5NH0KAz2j.mp4' -c copy -f mp4 -bsf:a aac_adtstoasc 'file:UFC 169 post-fight show-5mdXVoazrZPFMEwA751Q-TJ5NH0KAz2j.temp.mp4'

yan12125 commented 7 years ago

Removing returns is not enough. generic.py needs a generic approach for combining the URLs found by different extractors.

jnbdz commented 7 years ago

"combine different URLs from different extractors in generic.py" - How? I am willing to do it but I am unsure of what you mean.

yan12125 commented 7 years ago

For example, pages with Brightcove videos yield a playlist:

            return {
                '_type': 'playlist',
                'title': video_title,
                'id': video_id,
                'entries': entries,
            }

And Wistia videos give a transparent URL:

            return {
                '_type': 'url_transparent',
                'url': embed_url,
                'ie_key': 'Wistia',
                'uploader': video_uploader,
            }

The overall result can be a playlist of them: (I'm not sure whether this approach can handle all possible cases or not)


        return {
            '_type': 'playlist',
            'entries': [{
                '_type': 'playlist',
                'title': video_title,
                'id': video_id,
                'entries': entries,
            }, {
                '_type': 'url_transparent',
                'url': embed_url,
                'ie_key': 'Wistia',
                'uploader': video_uploader,
            }]
        }
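
A rough sketch of how such a combined result could be assembled and inspected. Both function names, combine_results and iter_leaf_entries, are hypothetical and do not exist in youtube-dl; the sketch only assumes that each detector hands back either a result dict like the ones above or None when it found nothing.

```python
def combine_results(page_id, page_title, results):
    """Hypothetical helper: wrap per-extractor results in one outer playlist."""
    # Skip detectors that found nothing (assumed to return None).
    entries = [result for result in results if result]
    return {
        '_type': 'playlist',
        'id': page_id,
        'title': page_title,
        'entries': entries,
    }


def iter_leaf_entries(result):
    """Walk nested playlists and yield every non-playlist entry."""
    if result.get('_type') == 'playlist':
        for entry in result['entries']:
            for leaf in iter_leaf_entries(entry):
                yield leaf
    else:
        yield result
```

Walking the nested structure with iter_leaf_entries is one way to check which videos the outer playlist actually covers, for example to confirm that both the Brightcove entries and the Wistia URL are present.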

jnbdz commented 7 years ago

Let me try it out. It might not be perfect but over time we can correct the code.