ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

radio-canada.ca site support request #4020

Open anarcat opened 9 years ago

anarcat commented 9 years ago

hello

it would be great if this would work:

anarcat@angela:Downloads$ python ./youtube-dl --verbose http://ici.radio-canada.ca/widgets/mediaconsole/medianet/7184272
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['--verbose', 'http://ici.radio-canada.ca/widgets/mediaconsole/medianet/7184272']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2014.10.24
[debug] Python version 2.7.3 - Linux-3.2.0-4-amd64-x86_64-with-debian-7.6
[debug] Proxy map: {}
[generic] 7184272: Requesting header
WARNING: Falling back on generic information extractor.
[generic] 7184272: Downloading webpage
[generic] 7184272: Extracting information
ERROR: Unsupported URL: http://ici.radio-canada.ca/widgets/mediaconsole/medianet/7184272; please report this issue on https://yt-dl.org/bug . Be sure to call youtube-dl with the --verbose flag and include its complete output. Make sure you are using the latest version; type  youtube-dl -U  to update.
Traceback (most recent call last):
  File "./youtube-dl/youtube_dl/extractor/generic.py", line 553, in _real_extract
    doc = parse_xml(webpage)
  File "./youtube-dl/youtube_dl/utils.py", line 1550, in parse_xml
    tree = xml.etree.ElementTree.XML(s.encode('utf-8'), **kwargs)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1301, in XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1643, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1507, in _raiseerror
    raise err
ParseError: syntax error: line 1, column 0
Traceback (most recent call last):
  File "./youtube-dl/youtube_dl/YoutubeDL.py", line 526, in extract_info
    ie_result = ie.extract(url)
  File "./youtube-dl/youtube_dl/extractor/common.py", line 193, in extract
    return self._real_extract(url)
  File "./youtube-dl/youtube_dl/extractor/generic.py", line 933, in _real_extract
    raise ExtractorError('Unsupported URL: %s' % url)
ExtractorError: Unsupported URL: http://ici.radio-canada.ca/widgets/mediaconsole/medianet/7184272; please report this issue on https://yt-dl.org/bug . Be sure to call youtube-dl with the --verbose flag and include its complete output. Make sure you are using the latest version; type  youtube-dl -U  to update.

this is with the latest version.
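The first traceback in the log comes from `parse_xml(webpage)` in `generic.py`: the generic extractor apparently tries to read the downloaded page as XML and the page is not well-formed XML. A minimal sketch of that failure mode with the standard library only (`parse_xml_or_none` and the sample strings are made up for illustration, they are not youtube-dl code):

```python
import xml.etree.ElementTree as ET


def parse_xml_or_none(text):
    """Return the parsed root element, or None if `text` is not well-formed XML."""
    try:
        return ET.fromstring(text.encode("utf-8"))
    except ET.ParseError:
        # An ordinary HTML page (unclosed tags, etc.) ends up here.
        return None


# Hypothetical inputs: an HTML page with an unclosed <br>, and a tiny feed.
html_page = "<!DOCTYPE html><html><body><p>video page<br></body></html>"
feed = "<rss><channel><title>feed</title></channel></rss>"

print(parse_xml_or_none(html_page) is None)
print(parse_xml_or_none(feed).tag)
```

So the `ParseError` itself only says the page is HTML rather than a feed; the real problem is the `Unsupported URL` error that follows it.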

anarcat commented 9 years ago

another example:

anarcat@marcos:youtube-dl-2014.10.30$ ./youtube-dl --verbose https://ici.radio-canada.ca/nouvelles/societe/2014/12/05/001-entrevue-mere-monique-marc-lepine-tueur-polytechnique-25-ans.shtml?isAutoPlay=1
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['--verbose', 'https://ici.radio-canada.ca/nouvelles/societe/2014/12/05/001-entrevue-mere-monique-marc-lepine-tueur-polytechnique-25-ans.shtml?isAutoPlay=1']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2014.10.30
[debug] Python version 2.7.8 - Linux-3.16.0-4-amd64-x86_64-with-debian-jessie-sid
[debug] exe versions: avconv 11-6, avprobe 11-6
[debug] Proxy map: {}
[generic] 001-entrevue-mere-monique-marc-lepine-tueur-polytechnique-25-ans: Requesting header
WARNING: Falling back on generic information extractor.
[generic] 001-entrevue-mere-monique-marc-lepine-tueur-polytechnique-25-ans: Downloading webpage
[generic] 001-entrevue-mere-monique-marc-lepine-tueur-polytechnique-25-ans: Extracting information
ERROR: Unsupported URL: https://ici.radio-canada.ca/nouvelles/societe/2014/12/05/001-entrevue-mere-monique-marc-lepine-tueur-polytechnique-25-ans.shtml?isAutoPlay=1; please report this issue on https://yt-dl.org/bug . Be sure to call youtube-dl with the --verbose flag and include its complete output. Make sure you are using the latest version; type  youtube-dl -U  to update.
Traceback (most recent call last):
  File "./youtube-dl/youtube_dl/extractor/generic.py", line 581, in _real_extract
    doc = parse_xml(webpage)
  File "./youtube-dl/youtube_dl/utils.py", line 1629, in parse_xml
    tree = xml.etree.ElementTree.XML(s.encode('utf-8'), **kwargs)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
ParseError: syntax error: line 2, column 0
Traceback (most recent call last):
  File "./youtube-dl/youtube_dl/YoutubeDL.py", line 533, in extract_info
    ie_result = ie.extract(url)
  File "./youtube-dl/youtube_dl/extractor/common.py", line 198, in extract
    return self._real_extract(url)
  File "./youtube-dl/youtube_dl/extractor/generic.py", line 962, in _real_extract
    raise ExtractorError('Unsupported URL: %s' % url)
ExtractorError: Unsupported URL: https://ici.radio-canada.ca/nouvelles/societe/2014/12/05/001-entrevue-mere-monique-marc-lepine-tueur-polytechnique-25-ans.shtml?isAutoPlay=1; please report this issue on https://yt-dl.org/bug . Be sure to call youtube-dl with the --verbose flag and include its complete output. Make sure you are using the latest version; type  youtube-dl -U  to update.

baldurmen commented 8 years ago

@remitamine even with the latest git clone, this still fails.

Commit is indeed there:

$ git log 444417edb55a5bf471697a3b2353fdbfb6f7e26d
commit 444417edb55a5bf471697a3b2353fdbfb6f7e26d
Author: remitamine <remitamine@gmail.com>
Date:   Tue May 24 15:58:27 2016 +0100

    [radiocanada] Add new extractor(#4020)

I'm not familiar with youtube-dl development. Do you need to add something else for this to work? From the trace, it does not seem to use your extractor...

$ youtube-dl --verbose http://ici.radio-canada.ca/widgets/mediaconsole/medianet/7554948
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'--verbose', u'http://ici.radio-canada.ca/widgets/mediaconsole/medianet/7554948']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.02.22
[debug] Python version 2.7.11+ - Linux-4.4.0-1-amd64-x86_64-with-debian-stretch-sid
[debug] exe versions: avconv 2.8.6-1, avprobe 2.8.6-1, ffmpeg 2.8.6-1, ffprobe 2.8.6-1, rtmpdump 2.4
[debug] Proxy map: {}
[generic] 7554948: Requesting header
WARNING: Falling back on generic information extractor.
[generic] 7554948: Downloading webpage
[generic] 7554948: Extracting information
ERROR: Unsupported URL: http://ici.radio-canada.ca/widgets/mediaconsole/medianet/7554948
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/youtube_dl/extractor/generic.py", line 1308, in _real_extract
    doc = compat_etree_fromstring(webpage.encode('utf-8'))
  File "/usr/lib/python2.7/dist-packages/youtube_dl/compat.py", line 248, in compat_etree_fromstring
    doc = _XML(text, parser=etree.XMLParser(target=etree.TreeBuilder(element_factory=_element_factory)))
  File "/usr/lib/python2.7/dist-packages/youtube_dl/compat.py", line 237, in _XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1653, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1517, in _raiseerror
    raise err
ParseError: syntax error: line 1, column 0
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/youtube_dl/YoutubeDL.py", line 666, in extract_info
    ie_result = ie.extract(url)
  File "/usr/lib/python2.7/dist-packages/youtube_dl/extractor/common.py", line 316, in extract
    return self._real_extract(url)
  File "/usr/lib/python2.7/dist-packages/youtube_dl/extractor/generic.py", line 1950, in _real_extract
    raise UnsupportedError(url)
UnsupportedError: Unsupported URL: http://ici.radio-canada.ca/widgets/mediaconsole/medianet/7554948
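One thing worth checking in a trace like this is the reported version against the date of the commit. A small sketch, assuming youtube-dl version strings are release dates in `YYYY.MM.DD` form with an optional extra numeric suffix (the helper name `version_to_date` is made up for illustration):

```python
from datetime import date


def version_to_date(version):
    """Turn a version string such as '2016.02.22' or '2016.06.19.1' into a date."""
    year, month, day = (int(part) for part in version.split(".")[:3])
    return date(year, month, day)


# Values taken from the log and the `git log` output in this thread.
commit_date = date(2016, 5, 24)
running = version_to_date("2016.02.22")

# True means the running copy predates the commit and cannot contain it.
print(running < commit_date)
```

Here the version in the `[debug]` line is older than the commit, which would explain why the `[generic]` extractor is used instead of `[radiocanada]`.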

yan12125 commented 8 years ago

[debug] youtube-dl version 2016.02.22

Seems you have multiple youtube-dl versions installed. However, the latest version is also broken:

$ youtube-dl --verbose http://ici.radio-canada.ca/widgets/mediaconsole/medianet/7554948 
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['--verbose', 'http://ici.radio-canada.ca/widgets/mediaconsole/medianet/7554948']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.06.19.1
[debug] Git HEAD: 8197079
[debug] Python version 3.5.1 - Linux-4.6.2-1-ARCH-x86_64-with-arch
[debug] exe versions: avconv v12_dev0-2785-g1e9c5bf, avprobe v12_dev0-2785-g1e9c5bf, ffmpeg 3.0.2, ffprobe 3.0.2, rtmpdump 2.4
[debug] Proxy map: {}
[radiocanada] 7554948: Downloading flash XML
[radiocanada] 7554948: Downloading metadata XML
Traceback (most recent call last):
  File "<string>", line 23, in <module>
  File "/home/yen/Projects/youtube-dl/youtube_dl/__init__.py", line 420, in main
    _real_main(argv)
  File "/home/yen/Projects/youtube-dl/youtube_dl/__init__.py", line 410, in _real_main
    retcode = ydl.download(all_urls)
  File "/home/yen/Projects/youtube-dl/youtube_dl/YoutubeDL.py", line 1740, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "/home/yen/Projects/youtube-dl/youtube_dl/YoutubeDL.py", line 687, in extract_info
    return self.process_ie_result(ie_result, download, extra_info)
  File "/home/yen/Projects/youtube-dl/youtube_dl/YoutubeDL.py", line 733, in process_ie_result
    return self.process_video_result(ie_result, download=download)
  File "/home/yen/Projects/youtube-dl/youtube_dl/YoutubeDL.py", line 1386, in process_video_result
    self.process_info(new_info)
  File "/home/yen/Projects/youtube-dl/youtube_dl/YoutubeDL.py", line 1451, in process_info
    if len(info_dict['title']) > 200:
TypeError: object of type 'NoneType' has no len()

baldurmen commented 8 years ago

@yan12125 oh damn my bad, I wasn't using the right exec path -_-'
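Running the wrong copy is easy when more than one youtube-dl is installed. A small sketch, standard library only, that lists every executable of a given name on `PATH` in lookup order (`find_executables` is a made-up helper, not part of youtube-dl):

```python
import os


def find_executables(name, path=None):
    """Return every executable file called `name` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        # An empty PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or os.curdir, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


for exe in find_executables("youtube-dl"):
    print(exe)
```

The first entry printed is the one the shell would normally run; any further entries are shadowed copies.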

But your trace is accurate; I get the same thing. It does work fine for other URLs though (http://ici.radio-canada.ca/widgets/mediaconsole/medianet/7184272)... Maybe it's an issue on the website side.

remitamine commented 8 years ago

The extractor works for most of the content at the supported URLs, but it didn't work for this URL because the data returned from the API doesn't contain a title:

<Meta name="AV-nomEmission">lasoireeestencorejeune</Meta>
<Meta name="Title" />
<Meta name="TitleID" />
<Meta name="Author" />
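To illustrate how an empty `<Meta name="Title" />` turns into the `TypeError` above, here is a minimal sketch using the standard library. The `<Metas>` wrapper element and the `get_meta` helper are assumptions made so the snippet parses on its own; the real API response and the real extractor code may look different.

```python
import xml.etree.ElementTree as ET

# The four Meta lines are from the API response quoted above;
# the <Metas> root is a hypothetical wrapper.
SAMPLE = """<Metas>
<Meta name="AV-nomEmission">lasoireeestencorejeune</Meta>
<Meta name="Title" />
<Meta name="TitleID" />
<Meta name="Author" />
</Metas>"""


def get_meta(doc, name):
    """Return the stripped text of the named Meta element, or None if absent or empty."""
    for meta in doc.iter("Meta"):
        if meta.get("name") == name:
            text = (meta.text or "").strip()
            return text or None
    return None


doc = ET.fromstring(SAMPLE)
title = get_meta(doc, "Title")          # None: the element has no text
show = get_meta(doc, "AV-nomEmission")  # 'lasoireeestencorejeune'

# len(None) raises TypeError, so a caller needs some fallback value.
safe_title = title or show or "untitled"
print(safe_title)
```

Falling back to the show name is only one possible choice; it keeps the download working but gives every untitled episode of a show the same title.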

yan12125 commented 8 years ago

I guess the title can be extracted from the webpage in such cases.
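A rough sketch of that fallback, assuming the page exposes its title through an `og:title` meta tag or a plain `<title>` element (not verified against the actual radio-canada.ca markup; `title_from_webpage` and `pick_title` are made-up names):

```python
import re
from html import unescape

# Tried in order: Open Graph title first, then the document <title>.
TITLE_PATTERNS = (
    r'<meta[^>]+property=(["\'])og:title\1[^>]*\scontent=(["\'])(?P<title>.*?)\2',
    r'<title[^>]*>(?P<title>[^<]+)</title>',
)


def title_from_webpage(html):
    """Return the first title found in the HTML, or None."""
    for pattern in TITLE_PATTERNS:
        match = re.search(pattern, html, flags=re.IGNORECASE | re.DOTALL)
        if match:
            return unescape(match.group("title")).strip()
    return None


def pick_title(api_title, html, video_id):
    """Prefer the API title, then the webpage title, then the video id."""
    return api_title or title_from_webpage(html) or video_id
```

With this, `pick_title(None, webpage, '7554948')` would still return a non-empty string even when both the API and the page lack a title.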

By the way, seems this issue can be closed after all titles are correctly extracted?

remitamine commented 8 years ago

By the way, seems this issue can be closed after all titles are correctly extracted?

I added support for two types of URLs; I didn't check the articles that contain videos (like the second URL in the issue).
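To make the distinction concrete, here is a sketch that sorts the URLs reported in this issue into widget URLs and article URLs. The two regular expressions are guesses derived only from the URLs in this thread; they are not the extractor's actual matching patterns.

```python
import re

# Hypothetical patterns, written from the example URLs in this issue.
WIDGET_RE = re.compile(
    r'https?://ici\.radio-canada\.ca/widgets/mediaconsole/'
    r'(?P<app_code>[^/]+)/(?P<id>\d+)'
)
ARTICLE_RE = re.compile(r'https?://ici\.radio-canada\.ca/nouvelles?/')


def classify(url):
    """Return ('widget', media_id), ('article', None) or ('unknown', None)."""
    match = WIDGET_RE.match(url)
    if match:
        return ("widget", match.group("id"))
    if ARTICLE_RE.match(url):
        return ("article", None)
    return ("unknown", None)


print(classify("http://ici.radio-canada.ca/widgets/mediaconsole/medianet/7184272"))
print(classify("http://ici.radio-canada.ca/nouvelle/780619/pensionnats-autochtones-ottawa-acadie-terre-neuve-et-labrador"))
```

An article page would presumably need a second step that finds the embedded media IDs in the HTML and hands each one to the widget handling.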

nlevitt commented 7 years ago

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', 'http://ici.radio-canada.ca/nouvelle/780619/pensionnats-autochtones-ottawa-acadie-terre-neuve-et-labrador']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.06.23
[debug] Git HEAD: e6b5770f6c
[debug] Python version 3.5.2 - Darwin-15.6.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 3.2.2, ffprobe 3.2.2
[debug] Proxy map: {}
[generic] pensionnats-autochtones-ottawa-acadie-terre-neuve-et-labrador: Requesting header
WARNING: Falling back on generic information extractor.
[generic] pensionnats-autochtones-ottawa-acadie-terre-neuve-et-labrador: Downloading webpage
[generic] pensionnats-autochtones-ottawa-acadie-terre-neuve-et-labrador: Extracting information
ERROR: Unsupported URL: http://ici.radio-canada.ca/nouvelle/780619/pensionnats-autochtones-ottawa-acadie-terre-neuve-et-labrador
Traceback (most recent call last):
  File "/Users/nlevitt/workspace/brozzler/brozzler-ve35/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 762, in extract_info
    ie_result = ie.extract(url)
  File "/Users/nlevitt/workspace/brozzler/brozzler-ve35/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 433, in extract
    ie_result = self._real_extract(url)
  File "/Users/nlevitt/workspace/brozzler/brozzler-ve35/lib/python3.5/site-packages/youtube_dl/extractor/generic.py", line 2796, in _real_extract
    raise UnsupportedError(url)
youtube_dl.utils.UnsupportedError: Unsupported URL: http://ici.radio-canada.ca/nouvelle/780619/pensionnats-autochtones-ottawa-acadie-terre-neuve-et-labrador

jnbdz commented 7 years ago

Here is another example. But this one has two videos:

youtube-dl -v http://ici.radio-canada.ca/nouvelle/1044381/anne-marie-dussault-entrevue-omar-khadr-canada-prison?fromBeta=true
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'http://ici.radio-canada.ca/nouvelle/1044381/anne-marie-dussault-entrevue-omar-khadr-canada-prison?fromBeta=true']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.07.09
[debug] Python version 2.7.12 - Linux-4.4.0-72-generic-x86_64-with-Ubuntu-16.04-xenial
[debug] exe versions: ffmpeg 2.8.11-0ubuntu0.16.04.1, ffprobe 2.8.11-0ubuntu0.16.04.1
[debug] Proxy map: {}
[generic] anne-marie-dussault-entrevue-omar-khadr-canada-prison?fromBeta=true: Requesting header
WARNING: Falling back on generic information extractor.
[generic] anne-marie-dussault-entrevue-omar-khadr-canada-prison?fromBeta=true: Downloading webpage
[generic] anne-marie-dussault-entrevue-omar-khadr-canada-prison?fromBeta=true: Extracting information
ERROR: Unsupported URL: http://ici.radio-canada.ca/nouvelle/1044381/anne-marie-dussault-entrevue-omar-khadr-canada-prison?fromBeta=true
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/generic.py", line 2043, in _real_extract
    doc = compat_etree_fromstring(webpage.encode('utf-8'))
  File "/usr/local/bin/youtube-dl/youtube_dl/compat.py", line 2539, in compat_etree_fromstring
    doc = _XML(text, parser=etree.XMLParser(target=_TreeBuilder(element_factory=_element_factory)))
  File "/usr/local/bin/youtube-dl/youtube_dl/compat.py", line 2528, in _XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1653, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1517, in _raiseerror
    raise err
ParseError: syntax error: line 66, column 0
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 762, in extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 433, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/generic.py", line 2893, in _real_extract
    raise UnsupportedError(url)
UnsupportedError: Unsupported URL: http://ici.radio-canada.ca/nouvelle/1044381/anne-marie-dussault-entrevue-omar-khadr-canada-prison?fromBeta=true