ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

[CNN] Add support for money.cnn.com #13666

Open parmjitv opened 7 years ago

parmjitv commented 7 years ago


$ youtube-dl --verbose 'http://money.cnn.com/mostly-human/silicon-valleys-secret/'
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'http://money.cnn.com/mostly-human/silicon-valleys-secret/']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.07.15
[debug] Python version 2.7.9 - Linux-3.16.0-4-686-pae-i686-with-debian-8.7
[debug] exe versions: ffmpeg 3.2.4-1, ffprobe 3.2.4-1
[debug] Proxy map: {}
[generic] silicon-valleys-secret: Requesting header
WARNING: Falling back on generic information extractor.
[generic] silicon-valleys-secret: Downloading webpage
[generic] silicon-valleys-secret: Extracting information
ERROR: Unsupported URL: http://money.cnn.com/mostly-human/silicon-valleys-secret/
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/generic.py", line 2060, in _real_extract
    doc = compat_etree_fromstring(webpage.encode('utf-8'))
  File "/usr/local/bin/youtube-dl/youtube_dl/compat.py", line 2539, in compat_etree_fromstring
    doc = _XML(text, parser=etree.XMLParser(target=_TreeBuilder(element_factory=_element_factory)))
  File "/usr/local/bin/youtube-dl/youtube_dl/compat.py", line 2528, in _XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
ParseError: mismatched tag: line 75, column 4
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 776, in extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 433, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/generic.py", line 2915, in _real_extract
    raise UnsupportedError(url)
UnsupportedError: Unsupported URL: http://money.cnn.com/mostly-human/silicon-valleys-secret/



The extractor for the CNN website should be updated to also detect URLs such as:

http://money.cnn.com/mostly-human/silicon-valleys-secret/
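To make the request concrete, here is a rough sketch of what "detecting" such a URL could look like. The pattern and the function name `money_cnn_path` are made up for illustration and are not taken from the youtube-dl CNN extractor; it is written for Python 3, while the log above comes from Python 2.7.

```python
import re

# Hypothetical pattern, for illustration only: it is NOT the pattern
# used by youtube-dl's CNN extractor.
_MONEY_CNN_RE = re.compile(r'https?://money\.cnn\.com/(?P<path>[^?#]+)')


def money_cnn_path(url):
    """Return the page path of a money.cnn.com URL, or None if it does not match."""
    match = _MONEY_CNN_RE.match(url)
    if match is None:
        return None
    # Drop the trailing slash so the path can serve as a display id.
    return match.group('path').rstrip('/')


print(money_cnn_path('http://money.cnn.com/mostly-human/silicon-valleys-secret/'))
```

A real extractor would still need to locate the video data on the page; this only covers the URL-matching step.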

Cheers,

Parmjit V.

rpvcg commented 7 years ago

page url: http://money.cnn.com/mostly-human/silicon-valleys-secret/

xml url: http://fave.api.cnn.io/v1/video?id=cnnmoney/2017/03/08/mostly-human-silicon-valleys-secret.cnnmoney&customer=cnn&edition=international&env=prod

best video url: http://ht.cdn.turner.com/cnn/big/cnnmoney/2017/03/08/mostly-human-silicon-valleys-secret.cnnmoney_1283468_ios_5500.mp4

The XML also lists HDS and HLS formats, in case someone wants to add these to the extractor.
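For reference, the API URL above can be rebuilt from the video id and the three query parameters it carries. This is only a sketch in Python 3: the function name `build_fave_url` and the default parameter values are assumptions copied from that single URL, not a documented API contract.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the URL quoted above.
FAVE_API_URL = 'http://fave.api.cnn.io/v1/video'


def build_fave_url(video_id, customer='cnn', edition='international', env='prod'):
    """Rebuild the API URL for a video id (hypothetical helper).

    The defaults mirror the one URL quoted in this thread; whether other
    values are accepted is unknown.
    """
    query = [
        ('id', video_id),
        ('customer', customer),
        ('edition', edition),
        ('env', env),
    ]
    # safe='/' keeps the slashes in the id unescaped, as in the quoted URL.
    return FAVE_API_URL + '?' + urlencode(query, safe='/')


print(build_fave_url('cnnmoney/2017/03/08/mostly-human-silicon-valleys-secret.cnnmoney'))
```

Run on the id from this page, it reproduces the URL given above character for character.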

parmjitv commented 7 years ago

Thanks for the info! I believe the link to the Fave API references a JSON-formatted data stream, rather than XML.

How did you deduce the best video URL? I do not see this link referenced anywhere in the output for this API call.
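One thing that can be checked from the two URLs quoted in this thread alone: the `id` parameter of the API URL appears verbatim inside the path of the video URL. The sketch below (Python 3, with a made-up helper name `split_media_path`) only splits the path around that id. It does not explain where the `_1283468_ios_5500.mp4` suffix comes from.

```python
from urllib.parse import urlparse, parse_qs


def split_media_path(api_url, media_url):
    """Split the media URL path around the API video id.

    Returns (prefix, video_id, suffix), or None when the id does not occur
    in the media path. Hypothetical helper, for illustration only.
    """
    video_id = parse_qs(urlparse(api_url).query)['id'][0]
    path = urlparse(media_url).path
    index = path.find(video_id)
    if index == -1:
        return None
    return path[:index], video_id, path[index + len(video_id):]


api_url = ('http://fave.api.cnn.io/v1/video'
           '?id=cnnmoney/2017/03/08/mostly-human-silicon-valleys-secret.cnnmoney'
           '&customer=cnn&edition=international&env=prod')
media_url = ('http://ht.cdn.turner.com/cnn/big/cnnmoney/2017/03/08/'
             'mostly-human-silicon-valleys-secret.cnnmoney_1283468_ios_5500.mp4')

print(split_media_path(api_url, media_url))
```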

parmjitv commented 7 years ago

Another URL to consider is:

http://money.cnn.com/2017/08/20/media/trump-carl-bernstein-reliable-sources/index.html

Apparently this URL format does not use the Fave API for the video source.
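Since the two page URLs have different shapes, an extractor would probably have to tell them apart before choosing how to look up the video. A minimal sketch in Python 3, assuming the dated form is always `/YYYY/MM/DD/<section>/<slug>/`; the pattern, the function name `classify_money_url` and the labels `dated-article` and `other` are invented for illustration.

```python
import re

# Hypothetical pattern for the dated article form seen in the URL above.
_DATED_ARTICLE_RE = re.compile(
    r'https?://money\.cnn\.com/'
    r'(?P<date>\d{4}/\d{2}/\d{2})/(?P<section>[^/]+)/(?P<slug>[^/]+)/'
    r'(?:index\.html)?$'
)
_MONEY_CNN_RE = re.compile(r'https?://money\.cnn\.com/')


def classify_money_url(url):
    """Return 'dated-article', 'other', or None for non-money.cnn.com URLs."""
    if _DATED_ARTICLE_RE.match(url):
        return 'dated-article'
    if _MONEY_CNN_RE.match(url):
        return 'other'
    return None


for url in (
    'http://money.cnn.com/2017/08/20/media/trump-carl-bernstein-reliable-sources/index.html',
    'http://money.cnn.com/mostly-human/silicon-valleys-secret/',
):
    print(classify_money_url(url), url)
```

Whether every dated article avoids the Fave API is not established here; the split only reflects the two URLs reported in this issue.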