jikan-me / jikan

Unofficial MyAnimeList PHP+REST API which provides functions other than the official API
https://jikan.moe
MIT License

Fix episodes parsing #440

Closed nerg4l closed 2 years ago

nerg4l commented 2 years ago

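The episode pagination on MAL can contain an unescaped < inside the skip span, which causes the crawler to build an incorrect HTML DOM: the page links that follow the span end up nested inside it instead of staying siblings. This PR changes the episodes parser so that it handles both the correct and the incorrect DOM.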

Original DOM from MAL:

<div class="pagination ac">
    <a class="link" href="https://myanimelist.net/anime/516/Keroro_Gunsou/episode?">1 - 100
    </a><span class="skip"><</span>
    <a class="link" href="https://myanimelist.net/anime/516/Keroro_Gunsou/episode?offset=100">101 - 200</a>
    <a class="link" href="https://myanimelist.net/anime/516/Keroro_Gunsou/episode?offset=200">201 - 300</a>
    <a class="link current" href="https://myanimelist.net/anime/516/Keroro_Gunsou/episode?offset=300">301 - 358</a>
</div>

Crawler DOM:

<div class="pagination ac">
    <a class="link" href="https://myanimelist.net/anime/516/Keroro_Gunsou/episode?">1 - 100</a>
    <span class="skip">
        <a class="link" href="https://myanimelist.net/anime/516/Keroro_Gunsou/episode?offset=100">101 - 200</a>
        <a class="link" href="https://myanimelist.net/anime/516/Keroro_Gunsou/episode?offset=200">201 - 300</a>
        <a class="link current" href="https://myanimelist.net/anime/516/Keroro_Gunsou/episode?offset=300">301 - 358</a>
    </span>
</div>
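
One way to tolerate both shapes is to select the page links as descendants of the pagination container rather than as its direct children. The sketch below illustrates that idea only; it is not the PHP change in this PR. It uses Python's standard-library ElementTree as a stand-in for the crawler, `page_links` is a hypothetical helper, the hrefs are shortened, and the well-formed variant assumes the `<` would be escaped as `&lt;`.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed variant: the "<" in the skip marker is escaped.
# hrefs are shortened for readability.
CORRECT_DOM = """
<div class="pagination ac">
    <a class="link" href="?">1 - 100</a>
    <span class="skip">&lt;</span>
    <a class="link" href="?offset=100">101 - 200</a>
    <a class="link" href="?offset=200">201 - 300</a>
    <a class="link current" href="?offset=300">301 - 358</a>
</div>
"""

# Shape the crawler builds: the later links are nested inside the span.
CRAWLER_DOM = """
<div class="pagination ac">
    <a class="link" href="?">1 - 100</a>
    <span class="skip">
        <a class="link" href="?offset=100">101 - 200</a>
        <a class="link" href="?offset=200">201 - 300</a>
        <a class="link current" href="?offset=300">301 - 358</a>
    </span>
</div>
"""


def page_links(markup, descendants=True):
    """Return the labels of all pagination links, in document order.

    descendants=True matches <a> at any depth below the container;
    descendants=False matches only its direct children.
    """
    root = ET.fromstring(markup.strip())
    path = ".//a" if descendants else "./a"
    return [
        a.text.strip()
        for a in root.findall(path)
        if "link" in a.get("class", "").split()
    ]


if __name__ == "__main__":
    # Descendant lookup yields the same four links for both shapes.
    print(page_links(CORRECT_DOM))
    print(page_links(CRAWLER_DOM))
    # Child-only lookup misses the links nested inside the span.
    print(page_links(CRAWLER_DOM, descendants=False))
```

With the descendant lookup both DOM shapes yield the same four page ranges, while the child-only lookup on the crawler DOM finds only the first link.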

Fixes #439