jklmli / manga_downloader

Cross-platform, multi-site, multi-threaded manga downloader with over 5000 distinct mangas. Includes support for automated downloading via external .xml file and conversion for viewing on the Kindle.
MIT License

Mangafox not working #87

Closed Rishabh4275 closed 7 years ago

Rishabh4275 commented 7 years ago

It downloads but there are no images.

CharlieCorner commented 7 years ago

Hi @Rishabh4275 can you please elaborate on what you are seeing and under what conditions you are seeing this? What OS/Python version/Branch/Commit are you using? Is there any exception being printed to the console?

Also, if you could provide the test case with which you are having problems that would help document the issue better.

I haven't checked this in full detail, but I tried Chapter 670 of Bleach and I'm seeing long download times. I'm also getting an exception saying that manga_downloader couldn't download the images, so I'm guessing the download is timing out.

It is worth mentioning that in MangaFox's forums it looks like even when using their web interface people are having problems with loading images.

Here's an example of a chapter breaking on their site

CharlieCorner commented 7 years ago

After taking a deeper look at this, I can confirm two separate things. Mangafox is indeed experiencing problems with loading images in some chapters and some mangas. But even for valid chapters and valid mangas (that is, those that do load on their website), manga_downloader is not downloading the images; the attempt to fetch them times out.

This has to do with how we look for the image URL in the HTML source of the page. We use a regex to find it, and it looks like the pattern in the mangafox.py plugin is no longer valid.

The current pattern we have for Mangafox is:

re_getImage = re.compile('"><img src="([^"]*)"')

But on the actual page, this is what the tag for the page image looks like. Notice the newline between the closing > of the a tag and the < of the img tag:

<div class="read_img"><a href="7.html" onclick="return enlarge()">
    <img src="http://h.mfcdn.net/store/manga/9/73-670.0/compressed/s001.jpg?token=372bb2d203787196b834b3c04d819077&ttl=1482973200" width="728" id="image" alt="Bleach 670: The Perfect Crimson at MangaFox.me"/>
            </a></div>              <div id="MarketGid9463" class="news-block-magick"><center><a href="http://mgid.com/" target="_blank">Loading...</a>
    </center></div>
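
To make the mismatch concrete, here is a minimal sketch that runs the current pattern against a trimmed copy of the markup above. The second pattern, `re_getImage_ws`, is a hypothetical whitespace-tolerant variant added only for illustration; it is an assumption and not necessarily the change that will go into the Pull Request.

```python
import re

# Pattern currently used by the mangafox.py plugin (quoted from this issue).
re_getImage = re.compile('"><img src="([^"]*)"')

# Hypothetical variant that allows whitespace (including newlines) between
# the closing > of the a tag and the img tag. An assumption for illustration.
re_getImage_ws = re.compile(r'">\s*<img src="([^"]*)"')

# Trimmed copy of the markup above; the token query string and the alt
# attribute are left out to keep the example short.
html = (
    '<div class="read_img"><a href="7.html" onclick="return enlarge()">\n'
    '    <img src="http://h.mfcdn.net/store/manga/9/73-670.0/compressed/s001.jpg" '
    'width="728" id="image"/>\n'
    '            </a></div>'
)

old_match = re_getImage.search(html)     # fails because of the newline
new_match = re_getImage_ws.search(html)  # tolerates the newline
image_url = new_match.group(1) if new_match else None

print("current pattern matched:", old_match is not None)
print("tolerant pattern found:", image_url)
```

With the current pattern no image URL is extracted, which would be consistent with the image fetch failing even for chapters that load fine on the website.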

This should be marked as a BUG. I have a solution ready and will be submitting a Pull Request to fix it.