mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

E-hentai: TypeError during download. #97

Closed Nippey closed 6 years ago

Nippey commented 6 years ago

Hi, I got an error while downloading from e-hentai. Previous downloads of other galleries succeeded. Below is the verbose output, as requested by the error message.

Regards! Nippey

$ gallery-dl.exe --verbose https://e-hentai.org/g/1266008/eedea8501f/
[gallery-dl][debug] Version 1.5.0
[gallery-dl][debug] Python 3.6.3 - Windows-10-10.0.17134-SP0
[gallery-dl][debug] requests 2.19.1 - urllib3 1.23
[gallery-dl][debug] Starting DownloadJob for 'https://e-hentai.org/g/1266008/eedea8501f/'
[exhentai][debug] Using ExhentaiGalleryExtractor for 'https://e-hentai.org/g/1266008/eedea8501f/'
[exhentai][info] no username given; using e-hentai.org
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): e-hentai.org:443
[urllib3.connectionpool][debug] https://e-hentai.org:443 "GET /g/1266008/eedea8501f/ HTTP/1.1" 200 1579
[exhentai][error] An unexpected error occurred: TypeError - argument of type 'NoneType' is not iterable. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[exhentai][debug] Traceback
Traceback (most recent call last):
  File "c:\python\python36-32\lib\site-packages\gallery_dl\job.py", line 64, in run
    for msg in self.extractor:
  File "c:\python\python36-32\lib\site-packages\gallery_dl\extractor\exhentai.py", line 137, in items
    data = self.get_job_metadata(page)
  File "c:\python\python36-32\lib\site-packages\gallery_dl\extractor\exhentai.py", line 163, in get_job_metadata
    data["title"] = text.unescape(data["title"])
  File "c:\python\python36-32\lib\html\__init__.py", line 130, in unescape
    if '&' not in s:
TypeError: argument of type 'NoneType' is not iterable
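
For reference, the last two frames of the traceback can be reproduced in isolation. This is only a minimal sketch: it assumes, as the traceback suggests, that `text.unescape` ends up in the standard library's `html.unescape`, and that `data["title"]` was `None` because no title could be extracted from the returned page. `safe_unescape` is a hypothetical helper for illustration, not part of gallery-dl.

```python
import html


def safe_unescape(value, default=""):
    """Hypothetical helper: unescape HTML entities, tolerating a missing value."""
    if value is None:
        return default
    return html.unescape(value)


# html.unescape(None) fails on its "'&' not in s" check, as in the traceback.
try:
    html.unescape(None)
except TypeError as exc:
    print("TypeError:", exc)

print(safe_unescape("Tom &amp; Jerry"))  # entities are decoded as usual
print(repr(safe_unescape(None)))         # a missing title falls back to the default
```

A guard like this would only hide the symptom; the page content still has to be checked to find out why the title is missing.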

Nippey commented 6 years ago

Hi, never mind!

I fetched the URL with wget to check what text.unescape(data["title"]) was trying to parse:
This gallery has been flagged as Offensive For Everyone

So, I followed the recommended procedure and created a user account. It is working now!

I will not close this right away, as I have a proposal:

If this kind of banner comes up, I am still able to continue by clicking "View Gallery". Maybe this can be integrated into the extractor.

Original Link: https://e-hentai.org/g/1266008/eedea8501f/
View Link (One session): https://e-hentai.org/g/1266008/eedea8501f/?nw=session
View Link (Always): https://e-hentai.org/g/1266008/eedea8501f/?nw=always

Regards! Nippey

mikf commented 6 years ago

Thanks for reporting this and "investigating" a bit ... makes writing a solution a lot easier.

Nippey commented 6 years ago

If I make a proposal for free software, I might as well provide some ideas in code ;)

Basically, I am thinking of two methods:
A: Warn the user to use credentials (not tested)
B: Try again by skipping the warning (tested, working for me!)

You can find both below.

@staticmethod
def _is_offensive(response):
    """Return True if the response object contains a 'flagged as offensive' warning"""
    return "Content Warning" in response.text

# Option A: Just warn about it (not tested)
class OffensiveError(ExtractionError):
    """Gallery is flagged offensive and should be downloaded with login credentials"""

def request(self, *args, **kwargs):
    response = Extractor.request(self, *args, **kwargs)
    if self._is_sadpanda(response):
        self.log.info("sadpanda.jpg")
        raise exception.AuthorizationError()
    if self._is_offensive(response):
        self.log.info("Offensive Gallery: Please provide credentials to continue")
        raise exception.OffensiveError()
    return response

# Option B: Try to skip it (tested)
def request(self, *args, **kwargs):
    response = Extractor.request(self, *args, **kwargs)
    if self._is_sadpanda(response):
        self.log.info("sadpanda.jpg")
        raise exception.AuthorizationError()
    if self._is_offensive(response):
        self.log.info("Offensive Gallery: Skipping Warning")
        # Add the request flag "?nw=always" to the <url> parameter
        args = list(args)
        args[0] = args[0] + "?nw=always"
        args = tuple(args)
        # Is it okay this way??
        response = Extractor.request(self, *args, **kwargs)
    return response
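
On the "Is it okay this way??" question in Option B: plain string concatenation works for the gallery URL above, but it would produce a second "?" if the URL already carried a query string. A possible alternative, sketched with the standard library only, is to rebuild the query. `with_query_param` is a hypothetical helper name, and the `?p=1` URL is just an illustrative input with an existing parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url, key, value):
    """Hypothetical helper: return url with key=value set in its query string."""
    parts = urlsplit(url)
    # Keep existing parameters, dropping any previous value for the same key.
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != key
    ]
    query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


base = "https://e-hentai.org/g/1266008/eedea8501f/"
print(with_query_param(base, "nw", "always"))
print(with_query_param(base + "?p=1", "nw", "always"))
```

In Option B this would replace the `args[0] + "?nw=always"` line, assuming the URL is always passed as the first positional argument.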

Nippey commented 6 years ago

Already solved, that was fast o.o Thanks!!