ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Add support for nationalgeographic.com #4960

Closed TheGr33k closed 9 years ago

TheGr33k commented 9 years ago

Sample link : http://video.nationalgeographic.com/video/news/150210-news-crab-mating-vin?source=featuredvideo

Thanks

rrooij commented 9 years ago

http://feed.theplatform.com/f/ngs/dCCn2isYZ9N9

Found this feed on the page with all the video URLs.

It has a GET parameter called byGuid. If you supply the correct GUID, you will get the download links. For example:

http://feed.theplatform.com/f/ngs/dCCn2isYZ9N9?byGuid=0000014b-70a1-dd8c-af7f-f7b559330001

The GUID is found in:

<section id="player-container" class="" data-permalink="/video/news/150210-news-crab-mating-vin" data-video-guid="0000014b-70a1-dd8c-af7f-f7b559330001"
         data-feed-url="http://feed.theplatform.com/f/ngs/dCCn2isYZ9N9">
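
To make the lookup above concrete, here is a minimal sketch that pulls both attributes out of the markup and builds the `byGuid` query. The helper names `get_attribute` and `build_feed_query` are made up for illustration, and the regex assumes the attributes are double-quoted as in the snippet.

```python
import re
from urllib.parse import urlencode

# Markup copied from the page snippet quoted above.
HTML = '''<section id="player-container" class="" data-permalink="/video/news/150210-news-crab-mating-vin" data-video-guid="0000014b-70a1-dd8c-af7f-f7b559330001"
         data-feed-url="http://feed.theplatform.com/f/ngs/dCCn2isYZ9N9">'''


def get_attribute(html, name):
    """Return the value of the first double-quoted name="..." attribute (hypothetical helper)."""
    match = re.search(r'\b%s\s*=\s*"([^"]+)"' % re.escape(name), html)
    if match is None:
        raise ValueError('attribute %r not found' % name)
    return match.group(1)


def build_feed_query(html):
    """Combine data-feed-url and data-video-guid into a byGuid feed query (hypothetical helper)."""
    guid = get_attribute(html, 'data-video-guid')
    feed_url = get_attribute(html, 'data-feed-url')
    return feed_url + '?' + urlencode({'byGuid': guid})


feed_query = build_feed_query(HTML)
print(feed_query)
```

For this snippet the result should be the same URL as the `byGuid` example given earlier in this comment.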

Can I pick this one up? :smile:

rrooij commented 9 years ago

I tried this regex for _VALID_URL:

https?://video\.nationalgeographic\.com/video/(?P<category>\w+)/(?P<id>[\w\d-]+)

But youtube-dl doesn't pick my extractor. Did I make a mistake in my regex? The only thing missing is ignoring the GET parameters.

phihag commented 9 years ago

@robin007bond Works for me:

>>> re.match(r'https?://video\.nationalgeographic\.com/video/(?P<category>\w+)/(?P<id>[\w\d-]+)', 'http://video.nationalgeographic.com/video/news/150210-news-crab-mating-vin?source=featuredvideo')
<_sre.SRE_Match object at 0x7f533515b3e8>
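
A slightly longer sketch of the same check, showing why the query string does not need special handling: `re.match` anchors only at the start of the string, and the `id` group stops at the `?` because that character is not in its character class. The class is written as `[\w-]` here on the assumption that it is equivalent to `[\w\d-]`, since `\w` already includes digits.

```python
import re

# Same pattern as above, with the redundant \d dropped from the id class.
VALID_URL = r'https?://video\.nationalgeographic\.com/video/(?P<category>\w+)/(?P<id>[\w-]+)'

url = ('http://video.nationalgeographic.com/video/news/'
       '150210-news-crab-mating-vin?source=featuredvideo')

# re.match only requires a match at the beginning of the string,
# so the trailing "?source=..." part is simply left unmatched.
m = re.match(VALID_URL, url)
category = m.group('category')
video_id = m.group('id')
print(category, video_id)
```

So the pattern itself looks fine. If the extractor is still not picked, the cause is probably elsewhere.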

rrooij commented 9 years ago

@phihag

Ok, thanks for trying it out.

rrooij commented 9 years ago

Hmm, I'm having a hard time figuring out what to do. The videos are in a very weird format that mpv doesn't even recognize. Those are SPI files. The XML says that they are regular MP4 files, but they aren't playable with mpv. VLC doesn't play them either.

Anyone is free to pick up where I left off.

My code so far:

# coding: utf-8
from __future__ import unicode_literals

from .common import InfoExtractor
from ..utils import (
    xpath_text,
    xpath_with_ns
)

class NationalGeographicIE(InfoExtractor):
    _VALID_URL = r'https?://video\.nationalgeographic\.com/video/(?P<category>\w+)/(?P<id>[\w\d-]+)'
    _TEST = {
        'url': 'http://video.nationalgeographic.com/video/news/150210-news-crab-mating-vin',
        'md5': 'TODO: md5 sum of the first 10241 bytes of the video file (use --test)',
        'info_dict': {
            'id': '42',
            'ext': 'mp4',
            'title': 'Video title goes here',
            'thumbnail': r're:^https?://.*\.jpg$',
            # TODO more properties, either as:
            # * A value
            # * MD5 checksum; start the string with md5:
            # * A regular expression; start the string with re:
            # * Any Python type (for example int or float)
        }
    }

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)

        # TODO more code goes here, for example ...
        title = self._html_search_regex(r'<h2 class="title">(.*?)</h2>', webpage, 'title')

        video_guid = self._html_search_regex(
                self._html_get_attribute_regex('data-video-guid'),
                webpage, 'guid')

        feed_url = self._html_search_regex(
                self._html_get_attribute_regex('data-feed-url'),
                webpage, 'feed url')

        feed = self._download_xml(feed_url + '?byGuid=' + video_guid, video_id)

        NS_MAP = {
                'media': 'http://search.yahoo.com/mrss/'
        }

        item = feed.find('./channel/item')

        return {
            'id': video_id,
            'title': self._og_search_title(webpage),
            'description': self._og_search_description(webpage),
            # TODO more properties (see youtube_dl/extractor/common.py)
        }

    @staticmethod
    def _html_get_attribute_regex(attribute):

        return r'{0}\s*=\s*\"([^\"]+)\"'.format(attribute)
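
The code above defines `NS_MAP` but never uses it. As a rough sketch of where it would come in, the snippet below reads `media:content` URLs from an MRSS-style feed with the standard library. The sample XML is invented for illustration and is not the real National Geographic feed, and `list_media_urls` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS_MAP = {'media': 'http://search.yahoo.com/mrss/'}

# Invented sample; the real feed layout may differ.
SAMPLE_FEED = '''<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title>Example title</title>
      <media:group>
        <media:content url="http://example.com/low.mp4" bitrate="400" type="video/mp4"/>
        <media:content url="http://example.com/high.mp4" bitrate="1200" type="video/mp4"/>
      </media:group>
    </item>
  </channel>
</rss>'''


def list_media_urls(feed_xml):
    """Return the url attribute of every media:content under the first item."""
    root = ET.fromstring(feed_xml)
    item = root.find('./channel/item')
    if item is None:
        return []
    # The prefix in the path is resolved through NS_MAP.
    return [c.get('url') for c in item.findall('.//media:content', NS_MAP)]


urls = list_media_urls(SAMPLE_FEED)
print(urls)
```

Inside the extractor, `xpath_with_ns` from `..utils` (already imported above) is presumably meant for the same job.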

phihag commented 9 years ago

Aren't these guys using thePlatform? The video doesn't play for me in the browser, but what I see in the console is output from thePlatform PDK. Also, you already extracted some thePlatform links. So why not return a self.url_result(..., 'ThePlatform')?
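
Roughly, the idea is that the extractor hands the URL to another extractor instead of resolving formats itself. The stand-in below only imitates the shape of the returned dictionary so it can run without youtube-dl. The keys shown are an assumption about what `url_result` produces, `delegate_to_theplatform` is a hypothetical name, and the example URL is a shortened thePlatform link.

```python
def url_result(url, ie_key=None):
    """Stand-in for InfoExtractor.url_result (assumed result shape, not the real method)."""
    return {'_type': 'url', 'url': url, 'ie_key': ie_key}


def delegate_to_theplatform(theplatform_url):
    """Ask the downloader to process the URL with the ThePlatform extractor."""
    return url_result(theplatform_url, 'ThePlatform')


result = delegate_to_theplatform('http://link.theplatform.com/s/ngs/uSPxGbU_SRpJ?mbr=true')
print(result)
```

In a real extractor this would just be the return value of `_real_extract`.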

rrooij commented 9 years ago

Thanks for the tip, I didn't know that there was already a ThePlatform extractor.

Unfortunately, the extractor doesn't work on the URLs:

 youtube-dl -v --test 'http://link.theplatform.com/s/ngs/uSPxGbU_SRpJ?mbr=true&amp;feed=NG%20Video'                 
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['-v', '--test', 'http://link.theplatform.com/s/ngs/uSPxGbU_SRpJ?mbr=true&amp;feed=NG%20Video']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2015.02.19.1
[debug] Python version 3.4.2 - Linux-3.18.6-1-ARCH-x86_64-with-arch
[debug] exe versions: ffmpeg 2.5.4, ffprobe 2.5.4, rtmpdump 2.4
[debug] Proxy map: {}
[ThePlatform] uSPxGbU_SRpJ: Downloading XML
[ThePlatform] uSPxGbU_SRpJ: Downloading webpage
[debug] Invoking downloader on 'http://ngs-vh.akamaihd.net/z/NG_Video/244/691/150210-news-crab-mating-vin__384166.mp4'
[download] Got server HTTP error. Retrying (attempt 1 of 10)...
[download] Got server HTTP error. Retrying (attempt 2 of 10)...
[download] Got server HTTP error. Retrying (attempt 3 of 10)...
[download] Got server HTTP error. Retrying (attempt 4 of 10)...
[download] Got server HTTP error. Retrying (attempt 5 of 10)...
[download] Got server HTTP error. Retrying (attempt 6 of 10)...
[download] Got server HTTP error. Retrying (attempt 7 of 10)...
[download] Got server HTTP error. Retrying (attempt 8 of 10)...
[download] Got server HTTP error. Retrying (attempt 9 of 10)...
[download] Got server HTTP error. Retrying (attempt 10 of 10)...
ERROR: giving up after 10 retries
  File "/usr/bin/youtube-dl", line 9, in <module>
    load_entry_point('youtube-dl==2015.2.19.1', 'console_scripts', 'youtube-dl')()
  File "/usr/lib/python3.4/site-packages/youtube_dl/__init__.py", line 390, in main
    _real_main(argv)
  File "/usr/lib/python3.4/site-packages/youtube_dl/__init__.py", line 380, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/lib/python3.4/site-packages/youtube_dl/YoutubeDL.py", line 1376, in download
    res = self.extract_info(url)
  File "/usr/lib/python3.4/site-packages/youtube_dl/YoutubeDL.py", line 654, in extract_info
    return self.process_ie_result(ie_result, download, extra_info)
  File "/usr/lib/python3.4/site-packages/youtube_dl/YoutubeDL.py", line 700, in process_ie_result
    return self.process_video_result(ie_result, download=download)
  File "/usr/lib/python3.4/site-packages/youtube_dl/YoutubeDL.py", line 1134, in process_video_result
    self.process_info(new_info)
  File "/usr/lib/python3.4/site-packages/youtube_dl/YoutubeDL.py", line 1309, in process_info
    success = dl(filename, info_dict)
  File "/usr/lib/python3.4/site-packages/youtube_dl/YoutubeDL.py", line 1284, in dl
    return fd.download(name, info)
  File "/usr/lib/python3.4/site-packages/youtube_dl/downloader/common.py", line 339, in download
    return self.real_download(filename, info_dict)
  File "/usr/lib/python3.4/site-packages/youtube_dl/downloader/http.py", line 114, in real_download
    self.report_error('giving up after %s retries' % retries)
  File "/usr/lib/python3.4/site-packages/youtube_dl/downloader/common.py", line 152, in report_error
    self.ydl.report_error(*args, **kargs)
  File "/usr/lib/python3.4/site-packages/youtube_dl/YoutubeDL.py", line 521, in report_error
    self.trouble(error_message, tb)
  File "/usr/lib/python3.4/site-packages/youtube_dl/YoutubeDL.py", line 483, in trouble
    tb_data = traceback.format_list(traceback.extract_stack())

When accessing the video URL (the one that yt-dl found) directly from Firefox:

Service Unavailable - DNS failure
The server is temporarily unable to service your request. Please try again later.

Reference #11.5623e17.1424362806.de791727 

jaimeMF commented 9 years ago

Will be supported in the next version, thanks for the report.

I don't know why those links fail. Since their webpage uses the f4m manifest, I just forced the extractor to use it.

phihag commented 9 years ago

Support for nationalgeographic has been added in youtube-dl 2015.02.19.3. See our FAQ if you need help updating.