Fixes #87: Update the re_getImage pattern to fetch the image URL for MangaFox

CharlieCorner commented 7 years ago

This fixes #87

The current pattern we have for Mangafox is:

re_getImage = re.compile('"><img src="([^"]*)"')

But on the actual page, this is what the tag for the page image looks like; notice the newline between the closing > of the a tag and the < of the img tag:

<div class="read_img"><a href="7.html" onclick="return enlarge()">
    <img src="http://h.mfcdn.net/store/manga/9/73-670.0/compressed/s001.jpg?token=372bb2d203787196b834b3c04d819077&ttl=1482973200" width="728" id="image" alt="Bleach 670: The Perfect Crimson at MangaFox.me"/>
            </a></div>              <div id="MarketGid9463" class="news-block-magick"><center><a href="http://mgid.com/" target="_blank">Loading...</a>
    </center></div>
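To make the failure concrete, here is a minimal sketch that runs the old pattern against an abbreviated copy of the markup above (the token query string is dropped for brevity). The `re_tolerant` variant is only an illustration that the whitespace is the culprit; it is not the pattern used in this Pull Request.

```python
import re

# Old pattern quoted above: it expects <img to follow the closing "> directly.
re_getImage = re.compile('"><img src="([^"]*)"')

# Abbreviated copy of the page markup (query string omitted).
html = (
    '<div class="read_img"><a href="7.html" onclick="return enlarge()">\n'
    '    <img src="http://h.mfcdn.net/store/manga/9/73-670.0/compressed/s001.jpg" '
    'width="728" id="image" alt="Bleach 670: The Perfect Crimson at MangaFox.me"/>\n'
    '            </a></div>'
)

# The newline and indentation between "> and <img make the old pattern miss.
old_match = re_getImage.search(html)

# Illustrative variant (an assumption, not the PR's fix): tolerate whitespace.
re_tolerant = re.compile(r'">\s*<img src="([^"]*)"')
tolerant_match = re_tolerant.search(html)

print(old_match)
print(tolerant_match.group(1))
```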

We're now searching for the img tag that has id="image", which is what Mangafox uses to identify the page image on their website.
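The exact pattern from the commit isn't quoted in this thread, so the following is only a sketch of what an id-based match could look like. The name `re_getImageById` and the pattern itself are hypothetical, and it assumes src comes before id inside the tag, as in the markup above.

```python
import re

# Hypothetical pattern (not copied from the commit): capture src of the
# img tag whose attributes include id="image". Assumes src precedes id.
re_getImageById = re.compile(r'<img src="([^"]*)"[^>]*\bid="image"')

# Abbreviated copy of the page markup (query string omitted).
html = (
    '<div class="read_img"><a href="7.html" onclick="return enlarge()">\n'
    '    <img src="http://h.mfcdn.net/store/manga/9/73-670.0/compressed/s001.jpg" '
    'width="728" id="image" alt="Bleach 670: The Perfect Crimson at MangaFox.me"/>\n'
    '            </a></div>'
)

match = re_getImageById.search(html)
image_url = match.group(1) if match else None
print(image_url)
```

Anchoring on the id rather than on the characters before the tag means whitespace changes around the a tag no longer matter, though the pattern would still break if Mangafox renamed the id or reordered the attributes.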

jklmli commented 7 years ago

Nice! I've long suspected that some of the regexes have grown stale. This is something good (and working >.>) CI would catch - kicking off a run twice a week can easily prevent this.

CharlieCorner commented 7 years ago

By the way, commit 76f7ad4, also included in this Pull Request, fixes #89. I forgot to mention this in the original post.

jklmli / manga_downloader

Fixes #87: Update the re_getImage pattern to fetch the image URL for MangaFox #88