ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

[ceskatelevize.cz] Getting ERROR: unable to download video data: HTTP Error 403: Forbidden #12119

Closed msourada closed 7 years ago

msourada commented 7 years ago

What is the purpose of your issue?

$ youtube-dl -v http://www.ceskatelevize.cz/ivysilani/1097147804-az-kviz/317291310010031/
[debug] System config: [u'--prefer-free-formats']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'http://www.ceskatelevize.cz/ivysilani/1097147804-az-kviz/317291310010031/']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.02.14
[debug] Python version 2.7.5 - Linux-3.10.0-514.6.1.el7.x86_64-x86_64-with-centos-7.3.1611-Core
[debug] exe versions: ffmpeg 3.2.2, ffprobe 3.2.2, rtmpdump 2.4
[debug] Proxy map: {}
[CeskaTelevize] 317291310010031: Downloading webpage
[CeskaTelevize] 317291310010031: Downloading JSON metadata
[CeskaTelevize] 317291310010031: Downloading JSON metadata
[CeskaTelevize] 317291310010031: Downloading m3u8 information
[download] Downloading playlist: AZ-kvíz
[CeskaTelevize] playlist AZ-kvíz: Collected 1 video ids (downloading 1 of them)
[download] Downloading video 1 of 1
[debug] Invoking downloader on u'http://80.188.78.176:80/atip/ddeedc21f2151e0adc2f68a4d8702bc2/1487012374399/<S t="0" d="180000" r="745"/>'
[hlsnative] Downloading m3u8 manifest
ERROR: unable to download video data: HTTP Error 403: Forbidden
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/youtube_dl/YoutubeDL.py", line 1703, in process_info
    success = dl(filename, info_dict)
  File "/usr/lib/python2.7/site-packages/youtube_dl/YoutubeDL.py", line 1645, in dl
    return fd.download(name, info)
  File "/usr/lib/python2.7/site-packages/youtube_dl/downloader/common.py", line 353, in download
    return self.real_download(filename, info_dict)
  File "/usr/lib/python2.7/site-packages/youtube_dl/downloader/hls.py", line 63, in real_download
    manifest = self.ydl.urlopen(self._prepare_url(info_dict, man_url)).read()
  File "/usr/lib/python2.7/site-packages/youtube_dl/YoutubeDL.py", line 2005, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 437, in open
    response = meth(req, response)
  File "/usr/lib64/python2.7/urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python2.7/urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden
...
<end of log>

Description of your issue, suggested solution and other information

This looks like the result of a change on ceskatelevize.cz's side. It worked yesterday (with a different video, but I've verified that the same error now occurs on that one too). Today it fails, and their web player looks different.

tatassan commented 7 years ago

Hi, same problem here. Two days ago everything worked; now I get Forbidden. ČT (ceskatelevize.cz) changed its player and its protection, if it can be called that. I tried every fix suggested in posts going back to 2014, and none of them work. It has not worked for two days now.

$ youtube-dl -v http://www.ceskatelevize.cz/porady/10116288585-archiv-ct24/217411058210006/
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.ceskatelevize.cz/porady/10116288585-archiv-ct24/217411058210006/']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.12.09
[debug] Python version 2.7.6 - Linux-4.4.0-62-generic-x86_64-with-Ubuntu-14.04-trusty
[debug] exe versions: avconv 11.3-6, avprobe 11.3-6, rtmpdump 2.4
[debug] Proxy map: {}
[CeskaTelevize] 217411058210006: Downloading webpage
[CeskaTelevize] 217411058210006: Downloading JSON metadata
[CeskaTelevize] 217411058210006: Downloading JSON metadata
[CeskaTelevize] 217411058210006: Downloading m3u8 information
[download] Downloading playlist: Archiv ČT24: Před sedmdesáti lety – rok 1947
[CeskaTelevize] playlist Archiv ČT24: Před sedmdesáti lety – rok 1947: Collected 1 video ids (downloading 1 of them)
[download] Downloading video 1 of 1
[debug] Invoking downloader on u'http://80.188.78.176:80/atip/f451564d1f0b74c20e774e955d4be167/1487059984323/<S t="0" d="180000" r="783"/>'
[hlsnative] Downloading m3u8 manifest
ERROR: unable to download video data: HTTP Error 403: Forbidden
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 1699, in process_info
    success = dl(filename, info_dict)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 1641, in dl
    return fd.download(name, info)
  File "/usr/local/bin/youtube-dl/youtube_dl/downloader/common.py", line 353, in download
    return self.real_download(filename, info_dict)
  File "/usr/local/bin/youtube-dl/youtube_dl/downloader/hls.py", line 63, in real_download
    manifest = self.ydl.urlopen(self._prepare_url(info_dict, man_url)).read()
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2001, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden

Maybe someone can help us, thanks.

oskar456 commented 7 years ago

They switched to MPEG-DASH. Replacing self._extract_m3u8_formats with self._extract_mpd_formats in ceskatelevize.py seems to do the trick, although there may be some more polishing needed.
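
A possible way to see the switch for yourself: the downloader URLs in the logs above end in `<S t="0" d="180000" r="745"/>`, which looks like a DASH SegmentTimeline entry and would be consistent with an MPD being fed to the HLS parser. The sketch below is not youtube-dl code; `sniff_manifest` and the two sample strings are hypothetical names made up for illustration. It only shows how an HLS playlist and a DASH MPD can be told apart from the manifest body.

```python
import xml.etree.ElementTree as ET


def sniff_manifest(text):
    """Return 'hls', 'dash', or 'unknown' for a manifest body (illustrative helper)."""
    stripped = text.lstrip()
    # HLS playlists start with the #EXTM3U tag.
    if stripped.startswith('#EXTM3U'):
        return 'hls'
    # A DASH MPD is an XML document whose root element is <MPD>.
    try:
        root = ET.fromstring(stripped)
    except ET.ParseError:
        return 'unknown'
    # Drop a namespace prefix such as {urn:mpeg:dash:schema:mpd:2011}.
    local_name = root.tag.rsplit('}', 1)[-1]
    return 'dash' if local_name == 'MPD' else 'unknown'


# Made-up minimal samples, not real ceskatelevize.cz manifests.
HLS_SAMPLE = '#EXTM3U\n#EXT-X-STREAM-INF:BANDWIDTH=500000\nlow.m3u8\n'
DASH_SAMPLE = (
    '<?xml version="1.0"?>'
    '<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">'
    '<Period><AdaptationSet><SegmentTemplate><SegmentTimeline>'
    '<S t="0" d="180000" r="745"/>'
    '</SegmentTimeline></SegmentTemplate></AdaptationSet></Period>'
    '</MPD>'
)

if __name__ == '__main__':
    print(sniff_manifest(HLS_SAMPLE))
    print(sniff_manifest(DASH_SAMPLE))
```

If the body fetched from the stream URL is classified as DASH, that would support calling the MPD extraction path instead of the m3u8 one.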

mkubecek commented 7 years ago

That seems to do the trick. However, with "-f best" I now get the error "ExtractorError: requested format not available". The cause may be either that only separate video-only and audio-only formats are offered, or that iVysilani offers two audio formats with identical reported parameters:

format code     extension  resolution note
dash-1002-1502  m4a        audio only DASH audio  128k , mp4a.40.5 (48000Hz)
dash-1002-1507  m4a        audio only DASH audio  128k , mp4a.40.5 (48000Hz)
dash-1001-1502  mp4        512x288    DASH video  372k , avc1.4d4015, video only
dash-1001-1507  mp4        720x410    DASH video  896k , avc1.4d401e, video only
dash-1001-1503  mp4        720x410    DASH video  904k , avc1.4d401e, video only
dash-1001-1504  mp4        1024x576   DASH video 1920k , avc1.4d401f, video only
dash-1001-1505  mp4        1280x720   DASH video 3456k , avc1.4d401f, video only (best)
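
To illustrate the first of the two guesses above, here is a simplified model of format selection, assuming that "best" only considers formats carrying both audio and video. It is a sketch and not youtube-dl's actual selector; the function names are hypothetical, and the format entries are abbreviated from the listing above. Whether this is the real cause of the error in this thread is not confirmed.

```python
def has_video(fmt):
    return fmt.get('vcodec') != 'none'


def has_audio(fmt):
    return fmt.get('acodec') != 'none'


def select_best(formats):
    """Simplified 'best': highest-ranked format with both audio and video."""
    muxed = [f for f in formats if has_video(f) and has_audio(f)]
    return muxed[-1] if muxed else None


def select_bestvideo_plus_bestaudio(formats):
    """Simplified 'bestvideo+bestaudio': pair the best of each kind."""
    videos = [f for f in formats if has_video(f) and not has_audio(f)]
    audios = [f for f in formats if has_audio(f) and not has_video(f)]
    if not videos or not audios:
        return None
    return videos[-1], audios[-1]


# Ordered worst to best, as in the listing; only a few fields are kept.
FORMATS = [
    {'format_id': 'dash-1002-1502', 'vcodec': 'none', 'acodec': 'mp4a.40.5'},
    {'format_id': 'dash-1002-1507', 'vcodec': 'none', 'acodec': 'mp4a.40.5'},
    {'format_id': 'dash-1001-1504', 'vcodec': 'avc1.4d401f', 'acodec': 'none'},
    {'format_id': 'dash-1001-1505', 'vcodec': 'avc1.4d401f', 'acodec': 'none'},
]

if __name__ == '__main__':
    print(select_best(FORMATS))
    video, audio = select_bestvideo_plus_bestaudio(FORMATS)
    print(video['format_id'], audio['format_id'])
```

Under this model nothing matches "best", while a combined selector such as `-f bestvideo+bestaudio` would still find a pair to merge.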

oskar456 commented 7 years ago

There is actually a loophole that restores the old HLS-based streaming when a special User-Agent is provided. I've made a quick fix that restores yesterday's functionality until someone does a proper port to MPEG-DASH.
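
For readers unfamiliar with the mechanism, this is roughly how a custom User-Agent header is attached to a request. It is a minimal sketch using Python 3's `urllib.request` (the logs above are from Python 2's `urllib2`). The User-Agent string is a hypothetical placeholder, since the thread does not say which value triggers the HLS fallback, and `build_request` is an illustrative helper, not part of youtube-dl.

```python
import urllib.request

# Hypothetical placeholder: the thread does not name the real User-Agent.
PLACEHOLDER_USER_AGENT = 'ExampleAgent/1.0'

# Page URL taken from the first report in this thread.
PAGE_URL = 'http://www.ceskatelevize.cz/ivysilani/1097147804-az-kviz/317291310010031/'


def build_request(url, user_agent=PLACEHOLDER_USER_AGENT):
    """Build a request that overrides the default User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


if __name__ == '__main__':
    req = build_request(PAGE_URL)
    # urllib stores header names capitalized, hence 'User-agent'.
    print(req.get_header('User-agent'))
```

No request is actually sent here. On the command line, youtube-dl's `--user-agent` option sets the same header for its own requests.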

oskar456 commented 7 years ago

Thank you @dstftw for the update! There is, however, one problem with programmes that have audio description; their stream list looks like this:

$ python -m youtube_dl http://www.ceskatelevize.cz/ivysilani/1097181328-udalosti/217411000100214 -F
[CeskaTelevize] 217411000100214: Downloading webpage
[CeskaTelevize] 217411000100214: Downloading JSON metadata
[CeskaTelevize] 217411000100214: Downloading JSON metadata
[CeskaTelevize] 217411000100214: Downloading MPD manifest
[CeskaTelevize] 217411000100214: Downloading MPD manifest
[CeskaTelevize] 217411000100214: Downloading JSON metadata
[CeskaTelevize] 217411000100214: Downloading JSON metadata
[CeskaTelevize] 217411000100214: Downloading m3u8 information
[CeskaTelevize] 217411000100214: Downloading m3u8 information
[download] Downloading playlist: Události
[CeskaTelevize] playlist Události: Collected 1 video ids (downloading 1 of them)
[download] Downloading video 1 of 1
[info] Available formats for 61924494877262517:
format code  extension  resolution note
meta-0       mp4        multiple   Quality selection URL 
meta-1       mp4        multiple   Quality selection URL 
1102-1502    m4a        audio only DASH audio  128k , mp4a.40.5 (48000Hz)
1102-1507    m4a        audio only DASH audio  128k , mp4a.40.5 (48000Hz)
1101-1502    mp4        512x288    DASH video  372k , avc1.42c015, video only
1101-1507    mp4        720x410    DASH video  896k , avc1.4d401e, video only
1101-1503    mp4        720x410    DASH video  904k , avc1.4d401e, video only
1101-1504    mp4        1024x576   DASH video 1920k , avc1.4d401f, video only
1101-1505    mp4        1280x720   DASH video 3456k , avc1.4d401f, video only
500          mp4        unknown     500k 
1024         mp4        unknown    1024k 
1032         mp4        unknown    1032k 
2048         mp4        unknown    2048k 
3584         mp4        unknown    3584k  (best)
[download] Finished downloading playlist: Události

The formats 1102-1507 and 1101-1507 are part of the Audio Description version, which is delivered as a separate DASH stream referenced from the JSON playlist like this:

        "streamUrls": {
            "audioDescription": "http://80.188.65.18:80/cdn/uri/get/?token=d8319571eb8b67373d07aeac06f89e1fbea265d6&contentType=vod&expiry=1487112270&id=61924494877253135&playerType=flash&quality=ad&region=1&skipIpAddressCheck=false&userId=53519efc-9381-43c8-8291-a479a7a47a51",
            "main": "http://80.188.65.18:80/cdn/uri/get/?token=db612401772b39d044c93f7a92c32f59c56a58ee&contentType=vod&expiry=1487112270&id=61924494877253135&playerType=flash&quality=web&region=1&skipIpAddressCheck=false&userId=3136b961-a717-4417-b9d9-f05514ea4b59"
        },

Unfortunately, format_id is lost when iterating over the streams, so AD streams get mixed in with the other formats and are even preferred for the default download. Would it be possible to de-prefer the formats obtained from the audioDescription stream?
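
One way this could look, as a sketch only: keep the `streamUrls` key in each format's `format_id` and give formats from the `audioDescription` stream a lower `source_preference`. The helper name, the penalty value, and the simplified sort key are assumptions for illustration and not youtube-dl's real sorting. The two audio entries come from the listing above, and attributing 1102-1502 to the `main` stream is also an assumption.

```python
# Assumed penalty; any value below the default of 0 would do in this model.
AD_SOURCE_PREFERENCE = -10


def merge_stream_formats(formats_by_stream):
    """Merge per-stream format lists, ranking audio description last.

    formats_by_stream maps a streamUrls key ('main', 'audioDescription')
    to the formats already extracted from that stream.
    """
    merged = []
    for stream_key, stream_formats in formats_by_stream.items():
        for fmt in stream_formats:
            tagged = dict(fmt)
            # Keep the stream key so the origin of each format stays visible.
            tagged['format_id'] = '%s-%s' % (stream_key, fmt['format_id'])
            tagged['source_preference'] = (
                AD_SOURCE_PREFERENCE if stream_key == 'audioDescription' else 0)
            merged.append(tagged)
    # Simplified ranking, worst first: source preference, then bitrate.
    merged.sort(key=lambda f: (f['source_preference'], f['tbr']))
    return merged


STREAMS = {
    'main': [{'format_id': '1102-1502', 'tbr': 128}],
    'audioDescription': [{'format_id': '1102-1507', 'tbr': 128}],
}

if __name__ == '__main__':
    for fmt in merge_stream_formats(STREAMS):
        print(fmt['format_id'], fmt['source_preference'])
```

With equal bitrates, the penalty is what keeps the audio description track from ending up last in the list, which is the position treated as best in this model.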