ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

add support for metrolyrics.com #8251

Open Siddhant opened 8 years ago

Siddhant commented 8 years ago
$ youtube-dl --proxy '' --verbose http://www.metrolyrics.com/news-story-watch-adele-absolutely-crush-her-carpool-karaoke-appearance.html
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'--proxy', u'', u'--verbose', u'http://www.metrolyrics.com/news-story-watch-adele-absolutely-crush-her-carpool-karaoke-appearance.html']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.01.15
[debug] Python version 2.7.10 - CYGWIN_NT-6.1-WOW-2.2.1-0.289-5-3-i686-32bit
[debug] exe versions: none
[debug] Proxy map: {}
[generic] news-story-watch-adele-absolutely-crush-her-carpool-karaoke-appearance: Requesting header
WARNING: Falling back on generic information extractor.
[generic] news-story-watch-adele-absolutely-crush-her-carpool-karaoke-appearance: Downloading webpage
[generic] news-story-watch-adele-absolutely-crush-her-carpool-karaoke-appearance: Extracting information
ERROR: Unsupported URL: http://www.metrolyrics.com/news-story-watch-adele-absolutely-crush-her-carpool-karaoke-appearance.html
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/youtube_dl/extractor/generic.py", line 1289, in _real_extract
    doc = compat_etree_fromstring(webpage.encode('utf-8'))
  File "/usr/lib/python2.7/site-packages/youtube_dl/compat.py", line 248, in compat_etree_fromstring
    doc = _XML(text, parser=etree.XMLParser(target=etree.TreeBuilder(element_factory=_element_factory)))
  File "/usr/lib/python2.7/site-packages/youtube_dl/compat.py", line 237, in _XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
ParseError: not well-formed (invalid token): line 21, column 1218
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/youtube_dl/YoutubeDL.py", line 665, in extract_info
    ie_result = ie.extract(url)
  File "/usr/lib/python2.7/site-packages/youtube_dl/extractor/common.py", line 312, in extract
    return self._real_extract(url)
  File "/usr/lib/python2.7/site-packages/youtube_dl/extractor/generic.py", line 1908, in _real_extract
    raise UnsupportedError(url)
UnsupportedError: Unsupported URL: http://www.metrolyrics.com/news-story-watch-adele-absolutely-crush-her-carpool-karaoke-appearance.html
davidjameshowell commented 8 years ago

Reference docs:

The page embeds SWF player links. For the example URL above, the embed is:

<object width="480" height="270"><param name="movie" value="http://canstatic.cbs.com/chrome/canplayer.swf?pid=V3LVbloTnOY_&partner=metrolyrics&gen=1"><param name="allowFullScreen" value="true"><param name="allowScriptAccess" value="always"><embed width="480" height="270" src="http://canstatic.cbs.com/chrome/canplayer.swf?pid=V3LVbloTnOY_&partner=metrolyrics&gen=1" allowfullscreen="true" allowscriptaccess="always" type="application/x-shockwave-flash"></object>

The pid value is the media item key, in this case V3LVbloTnOY_ (the trailing underscore is part of the key). When playback is invoked, the player calls out to The Platform (their CMS) to get the video src, which is an Akamai HD stream protected by a token with a short TTL.
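As a rough illustration of the first step, the pid can be pulled out of the embed markup with a regular expression and the standard URL parser. This is only a sketch in Python 3: the function name `extract_pids` and the assumption that every player URL lives at canstatic.cbs.com/chrome/canplayer.swf are mine, and it is not youtube-dl's extractor API.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumption: player URLs always look like the one in the embed above.
CANPLAYER_RE = re.compile(
    r'https?://canstatic\.cbs\.com/chrome/canplayer\.swf\?[^"\s>]+')


def extract_pids(html):
    """Return the unique pid values found in canplayer.swf URLs, in page order."""
    pids = []
    for match in CANPLAYER_RE.finditer(html):
        # Some pages escape the ampersands, so normalise them first.
        swf_url = match.group(0).replace('&amp;', '&')
        query = parse_qs(urlsplit(swf_url).query)
        for pid in query.get('pid', []):
            if pid not in pids:
                pids.append(pid)
    return pids


# Trimmed copy of the embed from the page (attributes shortened).
sample = (
    '<object><param name="movie" value="http://canstatic.cbs.com/chrome/'
    'canplayer.swf?pid=V3LVbloTnOY_&partner=metrolyrics&gen=1">'
    '<embed src="http://canstatic.cbs.com/chrome/canplayer.swf'
    '?pid=V3LVbloTnOY_&partner=metrolyrics&gen=1"></object>'
)
print(extract_pids(sample))
```

The pid appears twice in the markup (once in the param, once in the embed), which is why the sketch de-duplicates.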

Example response from MPX/The Platform:

<smil xmlns="http://www.w3.org/2005/SMIL21/Language">
<head>
    <meta name="refreshToken" content="0152822f62d5674e187ec1d4a37f0e6b5df8eec39ccf00710087fa7e04f17963b5e2bb84668f"/>
    <metadata>
    <seq>
    <ref src="http://ocp.cbs.com/pacific/Request.jsp?/videos.can.com/PARTNER/%2Fent%2Fln%2Flls%2Fcp%3Bsite%3Dentertainment%3Bdpart%3Dlatenight%3Bshow%3Dlate_late_show%3Bfeat%3Dclips;partner=PARTNER;vid=91E34AAD-D71F-46CF-D339-3E33FC10327A;pid=V3LVbloTnOY_;noAd=false;format=video/mp4;length=880000;pos=1;ord=1000" no-skip="true" tags="midroll">
    </ref>
    </seq>
    </metadata>
</head>
<body>
<seq>
    <ref src="http://ocp.cbs.com/pacific/Request.jsp?/videos.can.com/PARTNER/%2Fent%2Fln%2Flls%2Fcp%3Bsite%3Dentertainment%3Bdpart%3Dlatenight%3Bshow%3Dlate_late_show%3Bfeat%3Dclips;partner=PARTNER;vid=91E34AAD-D71F-46CF-D339-3E33FC10327A;pid=V3LVbloTnOY_;noAd=false;format=video/mp4;length=880000;pos=1;ord=1000" no-skip="true" tags="preroll">
    </ref>
    <video src="http://cbsent-vh.akamaihd.net/z/temp_hd_gallery_video/CBS_Production_Outlet_VMS/video_robot/CBS_Production_Entertainment/2016/01/14/601521219643/CORDEN_0132_CLIP1_ADELE_CIAN_723076_,796,1296,496,364,.mp4.csmil/manifest.f4m?hdnea=acl=/z/temp_hd_gallery_video/CBS_Production_Outlet_VMS/video_robot/CBS_Production_Entertainment/2016/01/14/601521219643/CORDEN_0132_CLIP1_ADELE_CIAN_723076_*~exp=1453881889~hmac=c357b57d5b2a4ece08f6b03b175573f8dcc7fd9994602e071b1d22a2fbbd8ba9" title="Adele Carpool Karaoke" abstract="While home in London for the holidays, James Corden picks up his friend Adele for a drive around the city singing some of her classic songs before Adele raps Nicki Minaj&apos;s &quot;Monster.&quot;" copyright="CBS Corp" dur="880000ms" guid="91E34AAD-D71F-46CF-D339-3E33FC10327A" categories="Corden_5min" keywords="&quot;James Corden&quot;, &quot;The Late Late Show&quot;,&quot;Reggie Watts&quot;,&quot;Letterman&quot;, &quot;Colbert&quot;,&quot;late night&quot;,&quot;late night show&quot;,&quot;David Letterman&quot;, &quot;Stephen Colbert&quot; , &quot;The Ellen Show&quot;, &quot;The Tonight Show&quot;,&quot;Comedy&quot;,&quot;monologue&quot;,&quot;sketches&quot;,&quot;comedian&quot;,&quot;celebrity interviews&quot;,&quot;impressions&quot;,&quot;tweet mail&quot;,&quot;celebrities&quot;,&quot;carpool&quot;,&quot;karaoke&quot;,&quot;take a break&quot;,&quot;meming of life&quot;,&quot;emoji news&quot;,&quot;CBS&quot;" provider="CBS Production Entertainment VMS" type="application/f4m+xml" height="360" width="640">
        <param name="Aired" value="false"/>
        <param name="CBSGenre" value="Talk"/>
        <param name="ClosedCaptionURL" value="http://www.cbsstatic.com/closedcaption/Current/LateNight_Clips/DFXP/CBS_CORDEN_0132_CLIP1_caption_DFXP.xml"/>
        <param name="DynamicStreaming" value="true"/>
        <param name="Embeddable" value="true"/>
        <param name="EpisodeFlag" value="false"/>
        <param name="EpisodeNumber" value="132"/>
        <param name="Internal" value="false"/>
        <param name="IsLive" value="false"/>
        <param name="IsLiveCDN" value="akamai"/>
        <param name="NoAd" value="false"/>
        <param name="PrimaryCategory" value="23392979"/>
        <param name="PrimaryCategoryName" value="Late Night/Late Late Show/Clips"/>
        <param name="RepeatFlag" value="false"/>
        <param name="SeasonNumber" value="1"/>
        <param name="SeriesTitle" value="The Late Late Show with James Corden"/>
        <param name="SourcePartner" value=""/>
        <param name="TVRating" value="false"/>
        <param name="VTAG" value="/ent/ln/lls/cp;site=entertainment;dpart=latenight;show=late_late_show;feat=clips"/>
        <param name="WatermarkPlayback" value="false"/>
        <param name="artist" value=""/>
        <param name="dPRAired" value="true"/>
        <param name="sMPTE-TTCCURL" value="http://www.cbsstatic.com/closedcaption/Current/LateNight_Clips/SMPTE/CBS_CORDEN_0132_CLIP1_caption_SMPTE.xml"/>
        <param name="trackingData" value="aid=2198311517|b=796000|bc=CBSI-NEW|ci=1|cid=601561667535|d=1453881769619|l=880000|mediaPid=P5xSTuNQiS1z|pd=1452751500000|pid=V3LVbloTnOY_|prid=10373|pvid=21847452|rid=601561667787"/>
    </video>
    <ref src="http://ocp.cbs.com/pacific/Request.jsp?/videos.can.com/PARTNER/%2Fent%2Fln%2Flls%2Fcp%3Bsite%3Dentertainment%3Bdpart%3Dlatenight%3Bshow%3Dlate_late_show%3Bfeat%3Dclips;partner=PARTNER;vid=91E34AAD-D71F-46CF-D339-3E33FC10327A;pid=V3LVbloTnOY_;noAd=false;format=video/mp4;length=880000;pos=1;ord=1000" no-skip="true" tags="postroll">
    </ref>
</seq>
</body>
</smil>
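A sketch of how the SMIL above might be read, assuming Python 3 and the standard library only. The helper name `parse_smil` and the returned dictionary layout are hypothetical, and the sample document is a trimmed stand-in with a placeholder manifest URL, not the real response.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <smil> root element in the response above.
SMIL_NS = {'smil': 'http://www.w3.org/2005/SMIL21/Language'}


def parse_smil(smil_text):
    """Return the main <video> entry as a dict, or None if there is none."""
    root = ET.fromstring(smil_text)
    # Skip the <ref> ad entries (preroll/postroll); only <video> is the content.
    video = root.find('./smil:body/smil:seq/smil:video', SMIL_NS)
    if video is None:
        return None
    dur = video.get('dur')
    duration = None
    if dur and dur.endswith('ms'):
        duration = int(dur[:-2]) / 1000.0  # milliseconds to seconds
    params = {p.get('name'): p.get('value')
              for p in video.findall('smil:param', SMIL_NS)}
    return {
        'url': video.get('src'),
        'title': video.get('title'),
        'duration': duration,
        'params': params,
    }


# Trimmed sample; the src is a placeholder, not the real tokenized URL.
SAMPLE = '''<smil xmlns="http://www.w3.org/2005/SMIL21/Language">
<body><seq>
  <ref src="http://example.invalid/ad" tags="preroll"></ref>
  <video src="http://example.invalid/manifest.f4m" title="Adele Carpool Karaoke"
         dur="880000ms" type="application/f4m+xml">
    <param name="IsLive" value="false"/>
    <param name="EpisodeNumber" value="132"/>
  </video>
</seq></body>
</smil>'''

info = parse_smil(SAMPLE)
print(info['title'], info['duration'], info['params']['EpisodeNumber'])
```

Child elements inherit the default namespace from the root, so the lookups need the namespace map, while attributes such as src and title are read without a prefix.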

Tokenized URL:

http://cbsent-vh.akamaihd.net/z/temp_hd_gallery_video/CBS_Production_Outlet_VMS/video_robot/CBS_Production_Entertainment/2016/01/14/601521219643/CORDEN_0132_CLIP1_ADELE_CIAN_723076_,796,1296,496,364,.mp4.csmil/manifest.f4m?hdnea=acl=/z/temp_hd_gallery_video/CBS_Production_Outlet_VMS/video_robot/CBS_Production_Entertainment/2016/01/14/601521219643/CORDEN_0132_CLIP1_ADELE_CIAN_723076_*~exp=1453881889~hmac=c357b57d5b2a4ece08f6b03b175573f8dcc7fd9994602e071b1d22a2fbbd8ba9
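The hdnea parameter in the URL above carries `~`-separated fields (acl, exp, hmac). A minimal sketch for splitting them and reading the expiry, assuming exp is a Unix timestamp in seconds; the function names are hypothetical and the sample URL uses PATH and HMAC as placeholders for the long path and signature.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit


def parse_hdnea(url):
    """Split the hdnea token of a manifest URL into its key/value fields."""
    query = urlsplit(url).query
    _, found, token = query.partition('hdnea=')
    if not found:
        return {}
    fields = {}
    for part in token.split('~'):
        key, _, value = part.partition('=')
        fields[key] = value
    return fields


def token_expiry(url):
    """Return the token expiry as an aware UTC datetime, or None if absent."""
    exp = parse_hdnea(url).get('exp')
    if not exp:
        return None
    return datetime.fromtimestamp(int(exp), tz=timezone.utc)


# Shortened form of the tokenized URL; PATH and HMAC are placeholders.
url = ('http://cbsent-vh.akamaihd.net/z/PATH/manifest.f4m'
       '?hdnea=acl=/z/PATH/*~exp=1453881889~hmac=HMAC')
print(parse_hdnea(url))
print(token_expiry(url))  # 2016-01-27 08:04:49+00:00
```

The trackingData param in the SMIL carries d=1453881769619 (milliseconds), which is 120 seconds before this exp value, so the token appears to be valid for roughly two minutes. An extractor would therefore need to request the SMIL and start the download in quick succession.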