potatoeggy / mandown

Comic/manga/webtoon downloader and CBZ/EPUB/MOBI/PDF converter
GNU Affero General Public License v3.0

IndexError when querying a Webtoons comic #59

Closed: karatsuh closed this issue 1 year ago

karatsuh commented 1 year ago

I was able to fetch a manga from mangadex.org with mandown, but ran into an issue with Webtoons.com.

Command entered: mandown get 'https://www.webtoons.com/en/slice-of-life/batman-wayne-family-adventures/list?title_no=3180'

Error traceback below:

Searching sources for https://www.webtoons.com/en/slice-of-life/batman-wayne-family-adventures/list?title_no=3180
Traceback (most recent call last):

  File "/Library/Frameworks/Python.framework/Versions/3.11/bin/mandown", line 8, in <module>
    sys.exit(main())
             ^^^^^^

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/mandown/cli.py", line 524, in main
    app()

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/mandown/cli.py", line 385, in get
    comic = cli_query(url)
            ^^^^^^^^^^^^^^

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/mandown/cli.py", line 146, in cli_query
    comic = api.query(url)
            ^^^^^^^^^^^^^^

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/mandown/api.py", line 33, in query
    return BaseComic(adapter.metadata, adapter.chapters)
                     ^^^^^^^^^^^^^^^^

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/mandown/sources/base_source.py", line 27, in metadata
    self._metadata = self.fetch_metadata()
                     ^^^^^^^^^^^^^^^^^^^^^

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/mandown/sources/source_webtoons.py", line 42, in fetch_metadata
    authors: list[str] = feed["entries"][0].author.split("/")
                         ~~~~~~~~~~~~~~~^^^

IndexError: list index out of range

potatoeggy commented 1 year ago

Thanks for reporting your issue!

However, I can't reproduce your issue; it all loads fine for me. Can you run the following in a Python shell or script? The expected output is "entries in feed: True", and none of the assertions should fail.

If you do get that result, then try running your mandown command again (maybe Webtoons updated something).

import feedparser

# Fetch and parse the comic's RSS feed, which mandown reads for Webtoons metadata.
feed = feedparser.parse("https://www.webtoons.com/en/slice-of-life/batman-wayne-family-adventures/rss?title_no=3180")

print("entries in feed:", "entries" in feed)

# mandown takes the author from the first entry, so the feed needs at least one entry.
assert len(feed["entries"]) > 0, "Webtoons reports that this comic does not have chapters"
assert feed["entries"][0].author, "There are no authors for this comic"
assert "channel" in feed, "Malformed RSS"
karatsuh commented 1 year ago

Thanks for the quick response! Here's my output from running that script. Strange: I was able to use another Webtoons downloader.

    assert len(feed["entries"]) > 0, "Webtoons reports that this comic does not have chapters"
           ^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Webtoons reports that this comic does not have chapters

potatoeggy commented 1 year ago

Glad you found a solution! But if you're willing, I'm curious about this issue now...

It's strange indeed; I guess my parsing is a little funky. Mandown reads Webtoons and MangaSee through their RSS feeds rather than scraping their web pages the conventional way, because I got lazy. I've been meaning to switch to scraping entirely, and now I guess I have a push to do so :)
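The first traceback fails at `feed["entries"][0].author.split("/")` because the parsed feed has no entries. A minimal sketch of a guarded version follows. It swaps feedparser for the standard library's `xml.etree.ElementTree`, assumes each `<item>` carries an `<author>` element whose names are separated by `/` (mirroring the original `split("/")`), and uses hypothetical sample feeds and a hypothetical `authors_from_rss` helper, not mandown's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feeds; real Webtoons RSS carries many more fields.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Example comic</title>
<item><title>Episode 2</title><author>Writer A / Artist B</author></item>
<item><title>Episode 1</title><author>Writer A / Artist B</author></item>
</channel></rss>"""
EMPTY_RSS = """<rss version="2.0"><channel><title>Example comic</title></channel></rss>"""


def authors_from_rss(xml_text: str) -> list[str]:
    """Return the author names on the first <item>, or [] if the feed has no items."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        raise ValueError("Malformed RSS: no <channel> element")
    items = channel.findall("item")
    if not items:
        # Indexing items[0] unconditionally here is what raises
        # "IndexError: list index out of range" on an empty feed.
        return []
    # Assumption: names are separated by "/", as in the original split("/").
    raw = items[0].findtext("author", default="")
    return [name.strip() for name in raw.split("/") if name.strip()]
```

Returning an empty list lets metadata loading continue; raising a clearer error such as "this comic has no chapters" would be an equally reasonable choice.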

I'll let you know later today when it's fixed! In the meantime, can you do me a favour and test whether your machine can actually see the RSS file?

import requests

# Download the raw RSS feed and print its body.
res = requests.get("https://www.webtoons.com/en/slice-of-life/batman-wayne-family-adventures/rss?title_no=3180")
print(res.text)

You should see a bunch of XML.
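To separate "the server sent nothing usable" from "the parser found no entries", the feed can be fetched and its `<item>` elements counted directly. This is a sketch under stated assumptions: it uses the standard library's `urllib.request` in place of `requests` so it has no third-party dependencies, and `fetch_text` and `summarize_rss` are hypothetical helper names.

```python
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://www.webtoons.com/en/slice-of-life/batman-wayne-family-adventures/rss?title_no=3180"


def fetch_text(url: str, timeout: float = 30.0) -> tuple[int, str]:
    """Return (HTTP status, decoded body); urlopen raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(url, timeout=timeout) as res:
        charset = res.headers.get_content_charset("utf-8")
        return res.status, res.read().decode(charset, errors="replace")


def summarize_rss(body: str) -> str:
    """Describe an RSS body: how many <item> entries it has, or why it is unusable."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        return f"not well-formed XML ({exc})"
    channel = root.find("channel")
    if channel is None:
        return "XML without a <channel> element (malformed RSS)"
    return f"RSS with {len(channel.findall('item'))} item(s)"
```

Usage: `status, body = fetch_text(FEED_URL)` followed by `print(status, summarize_rss(body))`. A 200 status with "RSS with 0 item(s)" would point at the feed's contents rather than the network.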

karatsuh commented 1 year ago

Great! Thanks for looking into it. It'd be great if I could use your tool; I've been looking for one that downloads comics and packages them into CBZ files in one go.

I got the XML from your code snippet; attaching it here in case you need it:

batman_xml.txt

potatoeggy commented 1 year ago

Okay, so the 1.3.1 release now scrapes the Webtoons web page instead of its RSS feed. Give that a shot?

karatsuh commented 1 year ago

Hmm... now I'm getting a 403 error.


Searching sources for https://www.webtoons.com/en/slice-of-life/batman-wayne-family-adventures/list?title_no=3180
Found item from source Webtoons
Downloading...
  [------------------------------------]    0%
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/mandown/io.py", line 35, in async_download_image
    res.raise_for_status()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://webtoon-phinf.pstatic.net/20210908_32/1631043278314jN45V_JPEG/3BatmanFamilyAdven_desktop_thumbnail.jpg?type=a306
"""

The above exception was the direct cause of the following exception:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/mandown/cli.py:4 │
│ 06 in get                                                                                        │
│                                                                                                  │
│   403 │   │   api.download_progress(comic, dest, threads=maxthreads),                            │
│   404 │   │   length=len(comic.chapters),                                                        │
│   405 │   ) as progress:                                                                         │
│ ❱ 406 │   │   for title in progress:                                                             │
│   407 │   │   │   progress.label = title                                                         │
│   408 │   typer.secho(                                                                           │
│   409 │   │   f"Successfully downloaded {end_chapter - start_chapter} chapters.",                │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │              comic = <mandown.comic.BaseComic object at 0x108f579d0>                         │ │
│ │         convert_to = <ConvertFormats.NONE: 'none'>                                           │ │
│ │               dest = PosixPath('/Users/karafinch')                                           │ │
│ │                end = None                                                                    │ │
│ │        end_chapter = 76                                                                      │ │
│ │         maxthreads = 4                                                                       │ │
│ │ processing_options = []                                                                      │ │
│ │           progress = <click._termui_impl.ProgressBar object at 0x108cd4ad0>                  │ │
│ │       remove_after = False                                                                   │ │
│ │       size_profile = None                                                                    │ │
│ │              start = None                                                                    │ │
│ │      start_chapter = 0                                                                       │ │
│ │        target_size = None                                                                    │ │
│ │                url = 'https://www.webtoons.com/en/slice-of-life/batman-wayne-family-adventu… │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/click/_termui_im │
│ pl.py:328 in generator                                                                           │
│                                                                                                  │
│ /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/mandown/api.py:2 │
│ 55 in download_progress                                                                          │
│                                                                                                  │
│   252 │                                                                                          │
│   253 │   # cover                                                                                │
│   254 │   if comic.metadata.cover_art:                                                           │
│ ❱ 255 │   │   for _ in io.download_images(                                                       │
│   256 │   │   │   [comic.metadata.cover_art], full_path, filestems=["cover"]                     │
│   257 │   │   ):                                                                                 │
│   258 │   │   │   pass                                                                           │
│                                                                                                  │
│ ╭─────────────────────────────────────── locals ────────────────────────────────────────╮        │
│ │                 comic = <mandown.comic.BaseComic object at 0x108f579d0>               │        │
│ │                   end = None                                                          │        │
│ │             full_path = PosixPath('/Users/karafinch/Batman: Wayne Family Adventures') │        │
│ │ only_download_missing = True                                                          │        │
│ │                  path = PosixPath('/Users/karafinch')                                 │        │
│ │                 start = None                                                          │        │
│ │               threads = 4                                                             │        │
│ ╰───────────────────────────────────────────────────────────────────────────────────────╯        │
│                                                                                                  │
│ /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/mandown/io.py:81 │
│ in download_images                                                                               │
│                                                                                                  │
│    78 │   │   map_pool.append((url, dest_folder, f"{stem}{ext}", headers))                       │
│    79 │                                                                                          │
│    80 │   with mp.Pool(threads) as pool:                                                         │
│ ❱  81 │   │   yield from pool.imap_unordered(async_download_image, map_pool)                     │
│    82                                                                                            │
│    83                                                                                            │
│    84 def read_comic(path: Path | str) -> BaseComic:                                             │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │           _ = '/20210908_32/1631043278314jN45V_JPEG/3BatmanFamilyAdven_desktop_thumbnail'    │ │
│ │ dest_folder = PosixPath('/Users/karafinch/Batman: Wayne Family Adventures')                  │ │
│ │         ext = '.jpg'                                                                         │ │
│ │   filestems = ['cover']                                                                      │ │
│ │     headers = None                                                                           │ │
│ │    map_pool = [                                                                              │ │
│ │               │   (                                                                          │ │
│ │               │   │                                                                          │ │
│ │               'https://webtoon-phinf.pstatic.net/20210908_32/1631043278314jN45V_JPEG/3Batma… │ │
│ │               │   │   PosixPath('/Users/karafinch/Batman: Wayne Family Adventures'),         │ │
│ │               │   │   'cover.jpg',                                                           │ │
│ │               │   │   None                                                                   │ │
│ │               │   )                                                                          │ │
│ │               ]                                                                              │ │
│ │        pool = <multiprocessing.pool.Pool state=TERMINATE pool_size=1>                        │ │
│ │        stem = 'cover'                                                                        │ │
│ │     threads = 1                                                                              │ │
│ │         url = 'https://webtoon-phinf.pstatic.net/20210908_32/1631043278314jN45V_JPEG/3Batma… │ │
│ │        urls = [                                                                              │ │
│ │               │                                                                              │ │
│ │               'https://webtoon-phinf.pstatic.net/20210908_32/1631043278314jN45V_JPEG/3Batma… │ │
│ │               ]                                                                              │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/pool.py:873 in │
│ next                                                                                             │
│                                                                                                  │
│   870 │   │   success, value = item                                                              │
│   871 │   │   if success:                                                                        │
│   872 │   │   │   return value                                                                   │
│ ❱ 873 │   │   raise value                                                                        │
│   874 │                                                                                          │
│   875 │   __next__ = next                    # XXX                                               │
│   876                                                                                            │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │    item = (                                                                                  │ │
│ │           │   False,                                                                         │ │
│ │           │   HTTPError('403 Client Error: Forbidden for url:                                │ │
│ │           https://webtoon-phinf.pstatic.net/20210908_32/1631043278314jN45V_JPEG/3BatmanFami… │ │
│ │           )                                                                                  │ │
│ │    self = <multiprocessing.pool.IMapUnorderedIterator object at 0x10450fed0>                 │ │
│ │ success = False                                                                              │ │
│ │ timeout = None                                                                               │ │
│ │   value = HTTPError('403 Client Error: Forbidden for url:                                    │ │
│ │           https://webtoon-phinf.pstatic.net/20210908_32/1631043278314jN45V_JPEG/3BatmanFami… │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
HTTPError: 403 Client Error: Forbidden for url: https://webtoon-phinf.pstatic.net/20210908_32/1631043278314jN45V_JPEG/3BatmanFamilyAdven_desktop_thumbnail.jpg?type=a306
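In the traceback above, a single 403 on the cover image ended the whole download: `pool.imap_unordered` re-raises a worker's exception in the parent (the `raise value` line in `pool.py`). One way to keep going is sketched below, under these assumptions: the worker catches its own exception and returns it as data, `fetch` stands for any hypothetical single-URL download function, and the sketch uses `multiprocessing.pool.ThreadPool` (same `imap_unordered` interface as `mp.Pool`) so nothing needs to be pickled. `safe_fetch` and `download_all` are hypothetical names, not mandown's API.

```python
from functools import partial
from multiprocessing.pool import ThreadPool
from typing import Callable, Optional


def safe_fetch(fetch: Callable[[str], bytes], url: str) -> tuple[str, Optional[bytes], Optional[str]]:
    """Call fetch(url); on failure return an error message instead of raising."""
    try:
        return url, fetch(url), None
    except Exception as exc:  # deliberately broad: report any per-URL failure
        return url, None, f"{type(exc).__name__}: {exc}"


def download_all(
    fetch: Callable[[str], bytes], urls: list[str], threads: int = 4
) -> tuple[dict[str, bytes], dict[str, str]]:
    """Download every URL, collecting successes and failures separately."""
    ok: dict[str, bytes] = {}
    failed: dict[str, str] = {}
    with ThreadPool(threads) as pool:
        # Results arrive in completion order; a failure no longer stops the loop.
        for url, data, error in pool.imap_unordered(partial(safe_fetch, fetch), urls):
            if error is None:
                ok[url] = data
            else:
                failed[url] = error
    return ok, failed
```

Whether a missing cover should be fatal is a judgment call; this version downloads everything it can and leaves the caller to report `failed` at the end.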
potatoeggy commented 1 year ago

Oof. That's what I get for not testing properly. This one I could reproduce, and it should be fixed now. Works On My Machine™.
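The thread doesn't describe the fix. One plausible cause, labeled here as an assumption: the image host webtoon-phinf.pstatic.net may refuse requests that lack a `Referer` header from www.webtoons.com, and the locals in the traceback show `headers = None` for the cover download. A minimal sketch of attaching such a header with the standard library's `urllib.request`; `headers_for` and `fetch_image` are hypothetical names, not mandown's API.

```python
import urllib.request
from urllib.parse import urlparse

# Assumption: this CDN host (taken from the traceback) checks the Referer header.
CDN_HOST = "webtoon-phinf.pstatic.net"
REFERER = "https://www.webtoons.com/"


def headers_for(url: str) -> dict[str, str]:
    """Return extra request headers for an image URL (empty for other hosts)."""
    if urlparse(url).hostname == CDN_HOST:
        return {"Referer": REFERER}
    return {}


def fetch_image(url: str, timeout: float = 30.0) -> bytes:
    """Download one image, sending a Referer when the host is the Webtoons CDN."""
    req = urllib.request.Request(url, headers=headers_for(url))
    with urllib.request.urlopen(req, timeout=timeout) as res:  # raises HTTPError on 403
        return res.read()
```

Keeping the header choice in one per-host function means other sources keep sending plain requests.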

karatsuh commented 1 year ago

There we go! Works great now. Fantastic!