Extractor error caused by StopIteration() at epv.elpais.com

dmelladom commented 7 years ago

Please follow the guide below

You will be asked some questions and requested to provide some information, please read them carefully and answer honestly
Put an x into all the boxes [ ] relevant to your issue (like that [x])
Use the Preview tab to see what your issue will actually look like

Make sure you are using the latest version: run `youtube-dl --version` and ensure your version is 2017.02.14. If it's not, read this FAQ entry and update. Issues with an outdated version will be rejected.

[X] I've verified and I assure that I'm running youtube-dl 2017.02.14

Before submitting an issue make sure you have:

[X] At least skimmed through README and most notably FAQ and BUGS sections
[X] Searched the bugtracker for similar issues including closed ones

What is the purpose of your issue?

[X] Bug report (encountered problems with youtube-dl)
[ ] Site support request (request for adding support for a new site)
[ ] Feature request (request for a new functionality)
[ ] Question
[ ] Other

The following sections concretize particular purposed issues, you can erase any section (the contents between triple ---) not applicable to your issue

If the purpose of this issue is a bug report, site support request or you are not completely sure provide the full verbose output as follows:

Add -v flag to your command line you run youtube-dl with, copy the whole output and insert it here. It should look similar to one below (replace it with your log inserted between triple ```):

youtube-dl -v http://epv.elpais.com/epv/2017/02/14/programa_la_voz_de_inaki/1487062137_075943.html

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', 'http://epv.elpais.com/epv/2017/02/14/programa_la_voz_de_inaki/1487062137_075943.html']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.02.14
[debug] Git HEAD: e3b6c96
[debug] Python version 3.5.2 - Darwin-15.6.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 2.8.5, ffprobe 2.8.5
[debug] Proxy map: {}
[ElPais] 1487062137_075943: Downloading webpage
[ElPais] 1487062137_075943: Downloading JSON metadata
WARNING: unable to extract thumbnail URL; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
ERROR: An extractor error has occurred. (caused by StopIteration()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 369, in extract
    return self._real_extract(url)
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/extractor/elpais.py", line 69, in _real_extract
    webpage, 'title')
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 681, in _html_search_regex
    res = self._search_regex(pattern, string, name, default, fatal, flags, group)
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 666, in _search_regex
    return next(g for g in mobj.groups() if g is not None)
StopIteration
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 369, in extract
    return self._real_extract(url)
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/extractor/elpais.py", line 69, in _real_extract
    webpage, 'title')
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 681, in _html_search_regex
    res = self._search_regex(pattern, string, name, default, fatal, flags, group)
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 666, in _search_regex
    return next(g for g in mobj.groups() if g is not None)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 696, in extract_info
    ie_result = ie.extract(url)
  File "/usr/local/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 375, in extract
    raise ExtractorError('An extractor error has occurred.', cause=e)
youtube_dl.utils.ExtractorError: An extractor error has occurred. (caused by StopIteration()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

If the purpose of this issue is a site support request please provide all kinds of example URLs support for which should be included (replace following example URLs by yours):

This subdomain doesn't work: http://epv.elpais.com/epv/2017/02/14/programa_la_voz_de_inaki/1487062137_075943.html
General subdomains work ok:
http://internacional.elpais.com/internacional/2017/02/14/actualidad/1487074962_250923.html
http://ccaa.elpais.com/ccaa/2017/02/14/valencia/1487060076_630475.html
http://economia.elpais.com/economia/2017/02/14/actualidad/1487080813_927706.html
http://politica.elpais.com/politica/2017/02/14/actualidad/1487060769_860194.html
http://cultura.elpais.com/cultura/2017/02/14/actualidad/1487064940_837614.html
http://deportes.elpais.com/deportes/2017/02/13/champions/1487011293_078589.html
http://elpais.com/elpais/2017/02/13/estilo/1487014622_897717.html

Note that youtube-dl does not support sites dedicated to copyright infringement. In order for site support request to be accepted all provided example URLs should not violate any copyrights.

Description of your issue, suggested solution and other information

It looks like extraction does not work properly when the URL path contains a category or section segment, such as .../programa_la_voz_de_inaki/... .../seccion_libros/... .../categoria_tecnologia/... .../categoria_ocio_y_cultura/... .../categoria_ciencia/... .../categoria_estilo_de_vida/... .../seccion_gastronomia/...

Could it be a problem with the underscores (_)?
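As a quick check on the underscore question: the log above already prints `[ElPais] 1487062137_075943`, so the URL was matched and the ID extracted before the error occurred. The sketch below uses a hypothetical URL pattern (`URL_RE` is an assumption for illustration, not copied from elpais.py) to show that underscores in path segments do not get in the way of this kind of match:

```python
import re

# Hypothetical pattern for illustration only; the real _VALID_URL in
# elpais.py may differ.
URL_RE = re.compile(
    r'https?://(?:[^./]+\.)?elpais\.com/.*/(?P<id>[^/#?]+)\.html(?:$|[?#])')

urls = [
    # Failing URL from this report (path segment with underscores).
    'http://epv.elpais.com/epv/2017/02/14/programa_la_voz_de_inaki/1487062137_075943.html',
    # One of the working URLs from this report.
    'http://politica.elpais.com/politica/2017/02/14/actualidad/1487060769_860194.html',
]

# Both URLs match, and the ID is the last path component without ".html".
ids = [URL_RE.match(u).group('id') for u in urls]
print(ids)
```

If a pattern of this shape is what the extractor uses, the underscores are probably not the cause, and the failure happens later, while parsing the page itself.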

mcerdeira commented 7 years ago

I think the extractor breaks not because of the URL but because the page is missing an expected tag, "tituloVideo", which is searched for on the exact line where it fails:

elpais.py (line 69):

    title = self._html_search_regex(
        (r"tituloVideo\s*=\s*'([^']+)'", webpage, 'title',
         r'<h2 class="entry-header entry-title.*?>(.*?)</h2>'),
        webpage, 'title')

I'll take a deeper look.

EDIT: Yes, I can confirm it fails when trying to get the title attribute. I'll try to fix this and send a PR.
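For reference, here is a minimal standalone sketch of one way this StopIteration can arise. It assumes (this is not confirmed in the thread) that a bare string such as 'title' inside the pattern tuple gets tried as a regex. `search_first_group` is a hypothetical stand-in that mimics only the expression shown in the traceback (common.py, line 666), not the real `_search_regex`, and the page contents are made up:

```python
import re


def search_first_group(patterns, string):
    """Hypothetical stand-in for the failing line in the traceback.

    Tries each pattern in turn and returns the first non-None capture
    group of the first pattern that matches.
    """
    for pattern in patterns:
        mobj = re.search(pattern, string)
        if mobj:
            # Same expression as in the traceback: raises StopIteration
            # when the matching pattern has no capture groups.
            return next(g for g in mobj.groups() if g is not None)
    return None


# Made-up pages: neither contains a tituloVideo assignment.
page = '<html><head><title>La voz de Inaki</title></head></html>'
page_with_h2 = '<h2 class="entry-header entry-title">La voz de Inaki</h2>'

# Tuple with a stray 'title' entry, which is then treated as a regex.
buggy_patterns = (
    r"tituloVideo\s*=\s*'([^']+)'",
    'title',
    r'<h2 class="entry-header entry-title.*?>(.*?)</h2>',
)

# Tuple containing only real patterns, each with a capture group.
fixed_patterns = (
    r"tituloVideo\s*=\s*'([^']+)'",
    r'<h2 class="entry-header entry-title.*?>(.*?)</h2>',
)

try:
    search_first_group(buggy_patterns, page)
    outcome = 'no error'
except StopIteration:
    # 'title' matches "<title>" but has no groups, so next() finds nothing.
    outcome = 'StopIteration'

print(outcome)
print(search_first_group(fixed_patterns, page_with_h2))
```

Under that assumption, a fix would remove the stray entries from the tuple so that every pattern tried has a capture group.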

dmelladom commented 7 years ago

Thanks a lot! Diego

On 15 Feb 2017, at 22:40, Martín Cerdeira notifications@github.com wrote:

I think the extractor breaks not because of the URL but because the page is missing an expected tag, "tituloVideo", which is searched for on the exact line where it fails:

    title = self._html_search_regex(
        (r"tituloVideo\s*=\s*'([^']+)'", webpage, 'title',
         r'<h2 class="entry-header entry-title.*?>(.*?)</h2>'),
        webpage, 'title')

I'll take a deeper look.

