[CNN] Add support for money.cnn.com

parmjitv commented 7 years ago

Please follow the guide below

You will be asked some questions and requested to provide some information. Please read them carefully and answer honestly.
Put an x into all the boxes [ ] relevant to your issue (like this: [x])
Use the Preview tab to see how your issue will actually look

Make sure you are using the latest version: run `youtube-dl --version` and ensure your version is 2017.07.15. If it's not, read this FAQ entry and update. Issues with outdated versions will be rejected.

[x] I've verified and I assure that I'm running youtube-dl 2017.07.15

Before submitting an issue make sure you have:

[x] At least skimmed through README and most notably FAQ and BUGS sections
[x] Searched the bugtracker for similar issues including closed ones

What is the purpose of your issue?

[x] Bug report (encountered problems with youtube-dl)
[ ] Site support request (request for adding support for a new site)
[ ] Feature request (request for a new functionality)
[ ] Question
[ ] Other

$ youtube-dl --verbose 'http://money.cnn.com/mostly-human/silicon-valleys-secret/'
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'http://money.cnn.com/mostly-human/silicon-valleys-secret/']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.07.15
[debug] Python version 2.7.9 - Linux-3.16.0-4-686-pae-i686-with-debian-8.7
[debug] exe versions: ffmpeg 3.2.4-1, ffprobe 3.2.4-1
[debug] Proxy map: {}
[generic] silicon-valleys-secret: Requesting header
WARNING: Falling back on generic information extractor.
[generic] silicon-valleys-secret: Downloading webpage
[generic] silicon-valleys-secret: Extracting information
ERROR: Unsupported URL: http://money.cnn.com/mostly-human/silicon-valleys-secret/
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/generic.py", line 2060, in _real_extract
    doc = compat_etree_fromstring(webpage.encode('utf-8'))
  File "/usr/local/bin/youtube-dl/youtube_dl/compat.py", line 2539, in compat_etree_fromstring
    doc = _XML(text, parser=etree.XMLParser(target=_TreeBuilder(element_factory=_element_factory)))
  File "/usr/local/bin/youtube-dl/youtube_dl/compat.py", line 2528, in _XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
ParseError: mismatched tag: line 75, column 4
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 776, in extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 433, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/generic.py", line 2915, in _real_extract
    raise UnsupportedError(url)
UnsupportedError: Unsupported URL: http://money.cnn.com/mostly-human/silicon-valleys-secret/

Single video: http://money.cnn.com/mostly-human/silicon-valleys-secret/

The extractor for the CNN website should be updated to also detect URLs such as:

http://money.cnn.com/mostly-human/silicon-valleys-secret/
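
For illustration, a pattern along the following lines would recognize the URL above. This is only a rough sketch: the names `MONEY_CNN_URL_RE` and `match_money_cnn` are made up here and are not taken from youtube-dl's existing CNN extractor, and the pattern has only been checked against this one URL shape.

```python
import re

# Hypothetical pattern, not the one used by youtube-dl's CNN extractor.
# It accepts any path under money.cnn.com and treats the last path
# segment as a display id.
MONEY_CNN_URL_RE = re.compile(
    r'https?://money\.cnn\.com/(?:[^/?#]+/)*(?P<display_id>[^/?#]+)/?$'
)


def match_money_cnn(url):
    """Return the last path segment of a money.cnn.com URL, or None."""
    m = MONEY_CNN_URL_RE.match(url)
    return m.group('display_id') if m else None


if __name__ == '__main__':
    print(match_money_cnn(
        'http://money.cnn.com/mostly-human/silicon-valleys-secret/'))
```

Matching the URL is only the first step; the extractor would still need to locate the video source on the page.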

Cheers,

Parmjit V.

rpvcg commented 7 years ago

page url: http://money.cnn.com/mostly-human/silicon-valleys-secret/

xml url: http://fave.api.cnn.io/v1/video?id=cnnmoney/2017/03/08/mostly-human-silicon-valleys-secret.cnnmoney&customer=cnn&edition=international&env=prod

best video url: http://ht.cdn.turner.com/cnn/big/cnnmoney/2017/03/08/mostly-human-silicon-valleys-secret.cnnmoney_1283468_ios_5500.mp4

HDS and HLS streams are also listed in the XML, if someone wants to add these to the extractor.
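
As a minimal sketch (Python 3), the API URL above can be rebuilt from the video id like this. The endpoint and the `customer`, `edition` and `env` values are copied from the URL in this comment; the function name `build_fave_url` is made up for illustration, and how the video id is obtained from the page is not covered here.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the URL quoted in this thread.
FAVE_ENDPOINT = 'http://fave.api.cnn.io/v1/video'


def build_fave_url(video_id, customer='cnn', edition='international',
                   env='prod'):
    """Build the API URL for a video id (hypothetical helper).

    The default parameter values are the ones seen in this thread;
    whether other values are accepted is not known.
    """
    query = urlencode(
        {'id': video_id, 'customer': customer,
         'edition': edition, 'env': env},
        safe='/',  # keep the slashes in the id unescaped
    )
    return FAVE_ENDPOINT + '?' + query


if __name__ == '__main__':
    print(build_fave_url(
        'cnnmoney/2017/03/08/mostly-human-silicon-valleys-secret.cnnmoney'))
```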

parmjitv commented 7 years ago

Thanks for the info! I believe the link to the Fave API references a JSON-formatted data stream, rather than XML.

How did you deduce the best video URL? I do not see this link referenced anywhere in the output for this API call.

parmjitv commented 7 years ago

Another URL to consider is:

http://money.cnn.com/2017/08/20/media/trump-carl-bernstein-reliable-sources/index.html

Apparently this URL format does not use the Fave API for the video source.
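
Since the two URL shapes seem to need different handling, an extractor might first tell them apart. The sketch below is only a guess at how that could look: both patterns and the name `classify_money_url` are hypothetical, and they are derived from just the two example URLs in this issue.

```python
import re

# Hypothetical patterns based only on the two URLs in this issue.
# Dated article: /YYYY/MM/DD/<section>/<slug>/index.html
DATED_ARTICLE_RE = re.compile(
    r'https?://money\.cnn\.com/(?P<date>\d{4}/\d{2}/\d{2})/'
    r'(?P<section>[^/?#]+)/(?P<slug>[^/?#]+)/index\.html'
)
# Section page: /<section>/<slug>/ where the section does not start
# with a digit
SECTION_PAGE_RE = re.compile(
    r'https?://money\.cnn\.com/(?P<section>[^/?#\d][^/?#]*)/'
    r'(?P<slug>[^/?#]+)/?$'
)


def classify_money_url(url):
    """Return 'dated_article', 'section_page', or None."""
    if DATED_ARTICLE_RE.match(url):
        return 'dated_article'
    if SECTION_PAGE_RE.match(url):
        return 'section_page'
    return None


if __name__ == '__main__':
    for u in (
        'http://money.cnn.com/mostly-human/silicon-valleys-secret/',
        'http://money.cnn.com/2017/08/20/media/'
        'trump-carl-bernstein-reliable-sources/index.html',
    ):
        print(classify_money_url(u), u)
```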
