ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

[ceskatelevize] Why doesn't youtube-dl recognize this URL? #5482

Closed: mcepl closed this issue 8 years ago

mcepl commented 9 years ago

On other Czech TV URLs (e.g., http://www.ceskatelevize.cz/ivysilani/1096911352-objektiv/210411030401003/obsah/127997-ukrajina-lvov/ ), youtube-dl correctly recognizes the site and uses the appropriate extractor ([CeskaTelevize]), but not here:

matej@mitmanek: ~$ youtube-dl -v 'http://www.ceskatelevize.cz/ct24/regiony/306179-prahou-jezdi-chytre-tramvaje-samy-brzdi-do-zatacek-a-ukazuji-polohu/'
[debug] System config: ['--prefer-free-formats']
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.ceskatelevize.cz/ct24/regiony/306179-prahou-jezdi-chytre-tramvaje-samy-brzdi-do-zatacek-a-ukazuji-polohu/']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2015.04.03
[debug] Python version 2.7.5 - Linux-3.10.0-229.1.2.el7.x86_64-x86_64-with-redhat-7.1-Maipo
[debug] exe versions: ffmpeg 2.3.4, ffprobe 2.3.4, rtmpdump 2.4
[debug] Proxy map: {}
[generic] 306179-prahou-jezdi-chytre-tramvaje-samy-brzdi-do-zatacek-a-ukazuji-polohu: Requesting header
WARNING: Falling back on generic information extractor.
[generic] 306179-prahou-jezdi-chytre-tramvaje-samy-brzdi-do-zatacek-a-ukazuji-polohu: Downloading webpage
[generic] 306179-prahou-jezdi-chytre-tramvaje-samy-brzdi-do-zatacek-a-ukazuji-polohu: Extracting information
ERROR: Unsupported URL: http://www.ceskatelevize.cz/ct24/regiony/306179-prahou-jezdi-chytre-tramvaje-samy-brzdi-do-zatacek-a-ukazuji-polohu/
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/youtube_dl/extractor/generic.py", line 846, in _real_extract
    doc = parse_xml(webpage)
  File "/usr/lib/python2.7/site-packages/youtube_dl/utils.py", line 1528, in parse_xml
    tree = xml.etree.ElementTree.XML(s.encode('utf-8'), **kwargs)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1300, in XML
    parser.feed(text)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
ParseError: undefined entity &mdash;: line 21, column 79
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/youtube_dl/YoutubeDL.py", line 651, in extract_info
    ie_result = ie.extract(url)
  File "/usr/lib/python2.7/site-packages/youtube_dl/extractor/common.py", line 275, in extract
    return self._real_extract(url)
  File "/usr/lib/python2.7/site-packages/youtube_dl/extractor/generic.py", line 1345, in _real_extract
    raise UnsupportedError(url)
UnsupportedError: Unsupported URL: http://www.ceskatelevize.cz/ct24/regiony/306179-prahou-jezdi-chytre-tramvaje-samy-brzdi-do-zatacek-a-ukazuji-polohu/

matej@mitmanek: ~$ 

fstirlitz commented 9 years ago

You can look into youtube_dl/extractor/ceskatelevize.py and see for yourself: this URL format is simply not recognised.
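To illustrate why one URL is picked up and the other falls through to the generic extractor, here is a minimal sketch of matching a URL against an extractor pattern. The pattern below is a made-up stand-in that only accepts /ivysilani/ paths; it is not copied from ceskatelevize.py, and the helper name `is_recognised` is hypothetical.

```python
import re

# Hypothetical pattern for illustration only; the real _VALID_URL in
# youtube_dl/extractor/ceskatelevize.py may look different.
ILLUSTRATIVE_VALID_URL = (
    r'https?://(?:www\.)?ceskatelevize\.cz/ivysilani/(?P<id>[^?#]+)'
)


def is_recognised(url, pattern=ILLUSTRATIVE_VALID_URL):
    """Return True if the URL matches the pattern from its beginning."""
    return re.match(pattern, url) is not None


ivysilani_url = (
    'http://www.ceskatelevize.cz/ivysilani/1096911352-objektiv/'
    '210411030401003/obsah/127997-ukrajina-lvov/'
)
ct24_url = (
    'http://www.ceskatelevize.cz/ct24/regiony/306179-prahou-jezdi-chytre-'
    'tramvaje-samy-brzdi-do-zatacek-a-ukazuji-polohu/'
)

print(is_recognised(ivysilani_url))  # matches the /ivysilani/ path
print(is_recognised(ct24_url))       # /ct24/ path is not covered
```

With a pattern like this, no site-specific extractor claims the /ct24/ URL, which is consistent with the "Falling back on generic information extractor" warning in the log above.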

Adapting the extractor to handle this case is easy, but there is a problem: on /ct24/ pages the web player is configured to play back a fixed time slice of a larger video. I'm not sure what the best way to handle this in youtube-dl is. My WIP code emits a warning and downloads the whole video. One could pass -ss and -t to ffmpeg instead.
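As a rough sketch of the -ss/-t idea, the snippet below builds an ffmpeg argument list that cuts a slice out of an already downloaded file. The function name, file names, and times are placeholders chosen for illustration, and this is not how youtube-dl itself invokes ffmpeg.

```python
def build_ffmpeg_slice_args(source, output, start_seconds, duration_seconds):
    """Build an ffmpeg command that copies one time slice without re-encoding.

    -ss before -i seeks in the input; -t limits the duration of the output.
    """
    return [
        'ffmpeg',
        '-ss', str(start_seconds),
        '-i', source,
        '-t', str(duration_seconds),
        '-c', 'copy',
        output,
    ]


# Placeholder values: a 90-second slice starting 5 minutes into the video.
args = build_ffmpeg_slice_args('whole_video.mp4', 'slice.mp4', 300, 90)
print(' '.join(args))
```

The list can be handed to `subprocess.call(args)`. Because the streams are copied rather than re-encoded, the cut points may snap to the nearest keyframes instead of the exact times.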

dstftw commented 8 years ago

Works fine with the latest version (2016.05.16).