Closed TheGr33k closed 9 years ago
http://feed.theplatform.com/f/ngs/dCCn2isYZ9N9
Found this feed on the page with all the video URLs.
It has a GET parameter called byGuid. If you supply the correct GUID, you will get the download links. For example:
http://feed.theplatform.com/f/ngs/dCCn2isYZ9N9?byGuid=0000014b-70a1-dd8c-af7f-f7b559330001
The GUID is found in:
<section id="player-container" class="" data-permalink="/video/news/150210-news-crab-mating-vin" data-video-guid="0000014b-70a1-dd8c-af7f-f7b559330001"
data-feed-url="http://feed.theplatform.com/f/ngs/dCCn2isYZ9N9">
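A minimal sketch of those two steps, assuming the page markup looks like the snippet above. The helper names `get_attribute` and `build_feed_url` are made up for illustration and are not part of youtube-dl:

```python
import re
from urllib.parse import urlencode

# Markup copied from the page snippet above.
PAGE = (
    '<section id="player-container" class="" '
    'data-permalink="/video/news/150210-news-crab-mating-vin" '
    'data-video-guid="0000014b-70a1-dd8c-af7f-f7b559330001"\n'
    'data-feed-url="http://feed.theplatform.com/f/ngs/dCCn2isYZ9N9">'
)


def get_attribute(html, name):
    """Return the value of the first double-quoted attribute called `name`."""
    match = re.search(r'\b%s\s*=\s*"([^"]+)"' % re.escape(name), html)
    if match is None:
        raise ValueError('attribute %r not found' % name)
    return match.group(1)


def build_feed_url(html):
    """Combine data-feed-url and data-video-guid into a byGuid feed request."""
    feed_url = get_attribute(html, 'data-feed-url')
    guid = get_attribute(html, 'data-video-guid')
    return feed_url + '?' + urlencode({'byGuid': guid})


print(build_feed_url(PAGE))
```

For the markup above this prints the same URL as the byGuid example.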
Can I pick this one up? :smile:
I tried this regex for _VALID_URL:
https?://video\.nationalgeographic\.com/video/(?P<category>\w+)/(?P<id>[\w\d-]+)
But youtube-dl doesn't pick my extractor. Did I make a mistake in my regex? The only thing missing is handling for the GET parameters.
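A quick way to check the pattern in isolation, including a URL that carries GET parameters. This is only a sketch; `parse_video_url` is a hypothetical helper, not something youtube-dl provides:

```python
import re

VALID_URL = r'https?://video\.nationalgeographic\.com/video/(?P<category>\w+)/(?P<id>[\w\d-]+)'


def parse_video_url(url):
    """Return (category, id) for a matching URL, or None if it does not match."""
    # re.match anchors only at the start, so a trailing query string is ignored.
    match = re.match(VALID_URL, url)
    if match is None:
        return None
    return match.group('category'), match.group('id')


print(parse_video_url(
    'http://video.nationalgeographic.com/video/news/150210-news-crab-mating-vin?source=featuredvideo'))
# -> ('news', '150210-news-crab-mating-vin')
```

The `id` group stops at `?` because that character is not in `[\w\d-]`. If the pattern matches here but the extractor still is not selected, the cause is probably elsewhere, for example the class not being imported in `youtube_dl/extractor/__init__.py`.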
@robin007bond Works for me:
>>> re.match(r'https?://video\.nationalgeographic\.com/video/(?P<category>\w+)/(?P<id>[\w\d-]+)', 'http://video.nationalgeographic.com/video/news/150210-news-crab-mating-vin?source=featuredvideo')
<_sre.SRE_Match object at 0x7f533515b3e8>
@phihag
Ok, thanks for trying it out.
Hmm, I'm having a hard time figuring out what to do. The video files are in a very odd format that mpv doesn't even recognize. Those are SPI files. The XML says that they are regular MP4 files, but they aren't playable with mpv. VLC doesn't play them either.
Anyone is free to pick up where I left off.
My code so far:
# coding: utf-8
from __future__ import unicode_literals

from .common import InfoExtractor
from ..utils import (
    xpath_text,
    xpath_with_ns
)


class NationalGeographicIE(InfoExtractor):
    _VALID_URL = r'https?://video\.nationalgeographic\.com/video/(?P<category>\w+)/(?P<id>[\w\d-]+)'
    _TEST = {
        'url': 'http://video.nationalgeographic.com/video/news/150210-news-crab-mating-vin',
        'md5': 'TODO: md5 sum of the first 10241 bytes of the video file (use --test)',
        'info_dict': {
            'id': '42',
            'ext': 'mp4',
            'title': 'Video title goes here',
            'thumbnail': 're:^https?://.*\.jpg$',
            # TODO more properties, either as:
            # * A value
            # * MD5 checksum; start the string with md5:
            # * A regular expression; start the string with re:
            # * Any Python type (for example int or float)
        }
    }

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)

        # TODO more code goes here, for example ...
        title = self._html_search_regex(r'<h2 class="title">(.*?)</h2>', webpage, 'title')

        video_guid = self._html_search_regex(
            self._html_get_attribute_regex('data-video-guid'),
            webpage, 'guid')
        feed_url = self._html_search_regex(
            self._html_get_attribute_regex('data-feed-url'),
            webpage, 'feed url')

        feed = self._download_xml(feed_url + '?byGuid=' + video_guid, video_id)
        NS_MAP = {
            'media': 'http://search.yahoo.com/mrss/'
        }
        item = feed.find('./channel/item')

        return {
            'id': video_id,
            'title': self._og_search_title(webpage),
            'description': self._og_search_description(webpage),
            # TODO more properties (see youtube_dl/extractor/common.py)
        }

    @staticmethod
    def _html_get_attribute_regex(attribute):
        return r'{0}\s*=\s*\"([^\"]+)\"'.format(attribute)
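The draft stops right after locating the feed item. One possible next step, sketched with the standard library only, is to read the `media:content` entries using the namespace from `NS_MAP`. The sample feed, the attribute names read from it, and the `extract_formats` helper are assumptions for illustration; the real feed may be laid out differently:

```python
import xml.etree.ElementTree as ET

# Namespace taken from NS_MAP in the extractor draft.
NS_MAP = {'media': 'http://search.yahoo.com/mrss/'}

# Hypothetical feed body, not copied from the real feed.
SAMPLE_FEED = """<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title>Example title</title>
      <media:content url="http://example.com/low.mp4" type="video/mp4" bitrate="400"/>
      <media:content url="http://example.com/high.mp4" type="video/mp4" bitrate="1200"/>
    </item>
  </channel>
</rss>"""


def extract_formats(feed_xml):
    """Collect url, type and bitrate from every media:content under the first item."""
    root = ET.fromstring(feed_xml)
    item = root.find('./channel/item')
    if item is None:
        return []
    formats = []
    for content in item.findall('media:content', NS_MAP):
        formats.append({
            'url': content.get('url'),
            'mime_type': content.get('type'),
            'tbr': float(content.get('bitrate', 0)),
        })
    # Lowest bitrate first.
    return sorted(formats, key=lambda f: f['tbr'])


for fmt in extract_formats(SAMPLE_FEED):
    print(fmt['tbr'], fmt['url'])
```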
Aren't these guys using thePlatform? The video doesn't play for me in the browser, but I do see thePlatform PDK output in the console. Also, you already extracted some thePlatform links, so why not return self.url_result(..., 'ThePlatform')?
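To illustrate the idea of delegating: an extractor can return a result of type `url` that names another extractor through `ie_key`, and youtube-dl then runs that extractor on the given URL. The stand-in below only mimics the shape of such a result so the snippet runs on its own. It is not youtube-dl's code, and `url_result_sketch` and `real_extract_sketch` are hypothetical names:

```python
def url_result_sketch(url, ie_key=None):
    """Stand-in for the dict an extractor returns to hand a URL to another extractor."""
    return {'_type': 'url', 'url': url, 'ie_key': ie_key}


def real_extract_sketch(theplatform_url):
    # Instead of building the formats here, pass the link on to ThePlatform.
    return url_result_sketch(theplatform_url, 'ThePlatform')


result = real_extract_sketch(
    'http://link.theplatform.com/s/ngs/uSPxGbU_SRpJ?mbr=true&feed=NG%20Video')
print(result['ie_key'], result['url'])
```

Inside a real extractor this would be a single `return self.url_result(link_url, 'ThePlatform')`, where `link_url` is whatever thePlatform link was found in the feed.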
Thanks for the tip, I didn't know that there was already a ThePlatform extractor.
Unfortunately, the extractor doesn't work on the URLs:
youtube-dl -v --test 'http://link.theplatform.com/s/ngs/uSPxGbU_SRpJ?mbr=true&feed=NG%20Video'
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['-v', '--test', 'http://link.theplatform.com/s/ngs/uSPxGbU_SRpJ?mbr=true&feed=NG%20Video']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2015.02.19.1
[debug] Python version 3.4.2 - Linux-3.18.6-1-ARCH-x86_64-with-arch
[debug] exe versions: ffmpeg 2.5.4, ffprobe 2.5.4, rtmpdump 2.4
[debug] Proxy map: {}
[ThePlatform] uSPxGbU_SRpJ: Downloading XML
[ThePlatform] uSPxGbU_SRpJ: Downloading webpage
[debug] Invoking downloader on 'http://ngs-vh.akamaihd.net/z/NG_Video/244/691/150210-news-crab-mating-vin__384166.mp4'
[download] Got server HTTP error. Retrying (attempt 1 of 10)...
[download] Got server HTTP error. Retrying (attempt 2 of 10)...
[download] Got server HTTP error. Retrying (attempt 3 of 10)...
[download] Got server HTTP error. Retrying (attempt 4 of 10)...
[download] Got server HTTP error. Retrying (attempt 5 of 10)...
[download] Got server HTTP error. Retrying (attempt 6 of 10)...
[download] Got server HTTP error. Retrying (attempt 7 of 10)...
[download] Got server HTTP error. Retrying (attempt 8 of 10)...
[download] Got server HTTP error. Retrying (attempt 9 of 10)...
[download] Got server HTTP error. Retrying (attempt 10 of 10)...
ERROR: giving up after 10 retries
File "/usr/bin/youtube-dl", line 9, in <module>
load_entry_point('youtube-dl==2015.2.19.1', 'console_scripts', 'youtube-dl')()
File "/usr/lib/python3.4/site-packages/youtube_dl/__init__.py", line 390, in main
_real_main(argv)
File "/usr/lib/python3.4/site-packages/youtube_dl/__init__.py", line 380, in _real_main
retcode = ydl.download(all_urls)
File "/usr/lib/python3.4/site-packages/youtube_dl/YoutubeDL.py", line 1376, in download
res = self.extract_info(url)
File "/usr/lib/python3.4/site-packages/youtube_dl/YoutubeDL.py", line 654, in extract_info
return self.process_ie_result(ie_result, download, extra_info)
File "/usr/lib/python3.4/site-packages/youtube_dl/YoutubeDL.py", line 700, in process_ie_result
return self.process_video_result(ie_result, download=download)
File "/usr/lib/python3.4/site-packages/youtube_dl/YoutubeDL.py", line 1134, in process_video_result
self.process_info(new_info)
File "/usr/lib/python3.4/site-packages/youtube_dl/YoutubeDL.py", line 1309, in process_info
success = dl(filename, info_dict)
File "/usr/lib/python3.4/site-packages/youtube_dl/YoutubeDL.py", line 1284, in dl
return fd.download(name, info)
File "/usr/lib/python3.4/site-packages/youtube_dl/downloader/common.py", line 339, in download
return self.real_download(filename, info_dict)
File "/usr/lib/python3.4/site-packages/youtube_dl/downloader/http.py", line 114, in real_download
self.report_error('giving up after %s retries' % retries)
File "/usr/lib/python3.4/site-packages/youtube_dl/downloader/common.py", line 152, in report_error
self.ydl.report_error(*args, **kargs)
File "/usr/lib/python3.4/site-packages/youtube_dl/YoutubeDL.py", line 521, in report_error
self.trouble(error_message, tb)
File "/usr/lib/python3.4/site-packages/youtube_dl/YoutubeDL.py", line 483, in trouble
tb_data = traceback.format_list(traceback.extract_stack())
When accessing the video URL (the one that yt-dl found) directly from Firefox:
Service Unavailable - DNS failure
The server is temporarily unable to service your request. Please try again later.
Reference #11.5623e17.1424362806.de791727
Will be supported in the next version, thanks for the report.
I don't know why those links fail. Since their webpage uses the f4m manifest, I just forced the extractor to use it as well.
Sample link: http://video.nationalgeographic.com/video/news/150210-news-crab-mating-vin?source=featuredvideo
Thanks