ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Extractor error caused by StopIteration() at epv.elpais.com #12139

Closed dmelladom closed 7 years ago

dmelladom commented 7 years ago

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2017.02.14. If it's not read this FAQ entry and update. Issues with outdated version will be rejected.

If the purpose of this issue is a bug report, site support request or you are not completely sure provide the full verbose output as follows:

Add -v flag to your command line you run youtube-dl with, copy the whole output and insert it here. It should look similar to one below (replace it with your log inserted between triple ```):

youtube-dl -v http://epv.elpais.com/epv/2017/02/14/programa_la_voz_de_inaki/1487062137_075943.html

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', 'http://epv.elpais.com/epv/2017/02/14/programa_la_voz_de_inaki/1487062137_075943.html']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.02.14
[debug] Git HEAD: e3b6c96
[debug] Python version 3.5.2 - Darwin-15.6.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 2.8.5, ffprobe 2.8.5
[debug] Proxy map: {}
[ElPais] 1487062137_075943: Downloading webpage
[ElPais] 1487062137_075943: Downloading JSON metadata
WARNING: unable to extract thumbnail URL; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
ERROR: An extractor error has occurred. (caused by StopIteration()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 369, in extract
    return self._real_extract(url)
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/extractor/elpais.py", line 69, in _real_extract
    webpage, 'title')
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 681, in _html_search_regex
    res = self._search_regex(pattern, string, name, default, fatal, flags, group)
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 666, in _search_regex
    return next(g for g in mobj.groups() if g is not None)
StopIteration
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 369, in extract
    return self._real_extract(url)
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/extractor/elpais.py", line 69, in _real_extract
    webpage, 'title')
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 681, in _html_search_regex
    res = self._search_regex(pattern, string, name, default, fatal, flags, group)
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 666, in _search_regex
    return next(g for g in mobj.groups() if g is not None)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 696, in extract_info
    ie_result = ie.extract(url)
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 375, in extract
    raise ExtractorError('An extractor error has occurred.', cause=e)
youtube_dl.utils.ExtractorError: An extractor error has occurred. (caused by StopIteration()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Description of your issue, suggested solution and other information

It looks like extraction does not work properly for URLs whose path includes a category segment, such as .../programa_la_voz_de_inaki/..., .../seccion_libros/..., .../categoria_tecnologia/..., .../categoria_ocio_y_cultura/..., .../categoria_ciencia/..., .../categoria_estilo_de_vida/... or .../seccion_gastronomia/...

Could the underscores (_) be the problem?

mcerdeira commented 7 years ago

I think the extractor breaks not because of the URL but because the page is missing an expected tag such as "tituloVideo", which is looked up on the exact line where it fails:

elpais.py (line 69):

    title = self._html_search_regex(
        (r"tituloVideo\s*=\s*'([^']+)'", webpage, 'title',
         r'<h2 class="entry-header entry-title.*?>(.*?)</h2>'),
        webpage, 'title')

I'll take a deeper look.

EDIT: Yes, I can confirm it fails when trying to extract the title. I'll try to fix this and send a PR.
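
For reference, here is a minimal standalone sketch of one way the last traceback frame, `next(g for g in mobj.groups() if g is not None)`, can raise StopIteration: a pattern with no capture group matches, so `mobj.groups()` is empty. The helper `first_group`, the sample page and the assumption that patterns are tried in order are all hypothetical and are not taken from youtube-dl's source.

```python
import re


def first_group(patterns, text):
    """Try each pattern in order and return the first non-None capture group.

    Hypothetical stand-in for the lookup seen in the traceback; this is
    not youtube-dl's actual implementation.
    """
    for pattern in patterns:
        mobj = re.search(pattern, text)
        if mobj:
            # Same expression as the last traceback frame. If the pattern
            # that matched has no capture groups, mobj.groups() is empty
            # and next() raises StopIteration.
            return next(g for g in mobj.groups() if g is not None)
    return None


# Made-up page: no tituloVideo variable, but the word "title" appears.
page = '<h2 class="entry-header entry-title">La voz de Inaki</h2>'

good_patterns = (
    r"tituloVideo\s*=\s*'([^']+)'",
    r'<h2 class="entry-header entry-title.*?>(.*?)</h2>',
)
# A stray literal without a capture group, placed ahead of the fallback.
bad_patterns = (good_patterns[0], 'title', good_patterns[1])

print(first_group(good_patterns, page))

try:
    first_group(bad_patterns, page)
except StopIteration:
    print('StopIteration: the matching pattern has no capture group')
```

If the lookup really does try patterns in order, a stray group-less string ahead of the intended fallback would fail this way on any page that contains the word "title" but not "tituloVideo".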

dmelladom commented 7 years ago

Thanks a lot! Diego

On 15 Feb 2017, at 22:40, Martín Cerdeira notifications@github.com wrote:

I think the extractor breaks, not caused by the url but caused for missing tags expected as: "tituloVideo" as is found in the exact line it fails: title = self._html_search_regex( (r"tituloVideo\s*=\s*'([^']+)'", webpage, 'title', r'<h2 class="entry-header entry-title.*?>(.*?)</h2>'), webpage, 'title') I'll take a deeper look.
