Open Suboms opened 9 months ago
This error happens because AnimePahe has, for some weird reason, decided to put DDoS-Guard in front of EVERY page, including API responses. (The same goes for its other domains too: animepahe.ru, animepahe.org, animepahe.com.)
When Animdl searches for an anime title on the site, it makes an API request to a URL similar to this: https://animepahe.ru/api?q=Oroka%20na%20Tenshi%20wa%20Akuma%20to%20Odoru&m=search
which should return a JSON response like this:
{
"total": 3,
"per_page": 8,
"current_page": 1,
"last_page": 1,
"from": 1,
"to": 3,
"data": [
{
"id": 5442,
"title": "Oroka na Tenshi wa Akuma to Odoru",
"type": "TV",
"episodes": 12,
"status": "Currently Airing",
"season": "Winter",
"year": 2024,
"score": 6.62,
"poster": "https:\/\/i.animepahe.ru\/posters\/dedc73ea139e05bddd50651cb35112806aaf984deaea672d25093def0d2a60aa.jpg",
"session": "f115f686-4214-ee80-a402-6e801f2f6534"
},
...
]
}
Unfortunately, that is not what we get. Since httpx.get() returns the page content IMMEDIATELY once the page is loaded, it gets the content of the fake loading screen instead.
When Animdl then tries to parse the fake loading screen, it fails with the error you've received.
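To make that failure easier to recognize, here is a minimal sketch of a check that could run before parsing. It is not Animdl's actual code: `parse_search_response` and `ChallengePageError` are hypothetical names, and it assumes the loading screen arrives as something other than valid JSON (most likely HTML).

```python
import json


class ChallengePageError(RuntimeError):
    """Raised when the server sent something other than the expected JSON."""


def parse_search_response(content_type, body):
    """Return the "data" list of a search response, or raise a clear error.

    `content_type` is the Content-Type header value and `body` the decoded
    response text; both would come from whatever HTTP client is in use.
    """
    # Assumption: the fake loading screen is not served with a JSON type.
    if "json" not in content_type.lower():
        raise ChallengePageError(
            f"expected JSON but got {content_type!r}; "
            "this is probably the DDoS-Guard loading screen"
        )
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        # The header claimed JSON, but the body cannot be parsed as JSON.
        raise ChallengePageError("response body is not valid JSON") from exc
    if not isinstance(payload, dict):
        raise ChallengePageError("unexpected JSON shape in response")
    # The search response shown above keeps its results under "data".
    return payload.get("data", [])
```

With a check like this, the user would see a message pointing at the protection page instead of a confusing parse error.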
So... what can be done?
I'm not an expert, but from what I know: not much, actually, because DDoS-Guard is made for exactly this.
Unless AnimePahe removes this protection, this is what we can try:
The simplest method is probably to wait for the fake loading screen to go away, because it disappears and shows the real content after a few seconds anyway. If we can somehow send the API request, wait, and read the page content only once the real content is displayed, it could work for a while. It does have some flaws, though. The first is that the fake loading screen may stay up longer than our delay. This can be easily fixed with a headless browser: instead of waiting for a fixed delay, we can wait for the loading-screen elements to disappear.
Cookies saved in the browser can prevent the fake loading screen from appearing a second time. We can attach these cookies to the request to make the server think it comes from a browser that has already passed the DDoS protection. This method is a bit advanced but probably the most efficient way. BUT those cookies have an expiration date, like real cookies, so we might have to keep regenerating them.
No matter what we do, DDoS-Guard is still active on the server. Sooner or later it could mistake our API requests for an attack and throw everyone's favorite puzzle at us: a captcha. The problem is that Animdl can't solve those automatically, so we're kinda stuck. Sucks, right?
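For the cookie idea above, the expiration problem can be handled with a small cache that only regenerates cookies when the old ones have run out. This is just a sketch under assumptions: `ClearanceCookies`, `cookies_for_request`, and the `refresh` callable are hypothetical names, and how the fresh cookies are actually obtained (for example from a headless browser) is left out.

```python
import time
from dataclasses import dataclass


@dataclass
class ClearanceCookies:
    """Cookies copied from a browser session that already passed the check."""

    values: dict       # cookie name -> value (real names are not assumed here)
    expires_at: float  # Unix timestamp after which the cookies are useless

    def is_expired(self, now=None):
        """Return True once the expiry time has been reached."""
        now = time.time() if now is None else now
        return now >= self.expires_at

    def as_header(self):
        """Format the cookies as a Cookie request-header value."""
        return "; ".join(f"{name}={value}" for name, value in self.values.items())


def cookies_for_request(cached, refresh, now=None):
    """Return usable cookies, calling `refresh()` only when needed.

    `refresh` is a hypothetical callable that produces a new
    ClearanceCookies object, however those cookies are obtained.
    """
    if cached is None or cached.is_expired(now):
        cached = refresh()
    return cached
```

The value returned by `as_header()` would then be sent as the `Cookie` header of the API request. None of this helps once a captcha shows up.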
PS: I'm still working on this even though I'm not a contributor, because I still need Animdl for my anime needs ;) A really reliable fix will have to be left to someone with much more neuron power and time than me.
https://github.com/justfoolingaround/animdl/blob/c7c3b79198e66695e0bbbc576f9d9b788616957f/animdl/core/cli/http_client.py#L17-L20
Adding ".animepahe.ru" to this function should fix it:
utils.http_client.integrate_ddg_bypassing(
    client,
    ".marin.moe",
    ".animepahe.ru",
)
I'm extremely busy rn due to university. That said, please refrain from posting (or even mentioning) that part of the codebase. It is a largely unknown method that, if patched, could break a lot of scrapers.
AnimePahe is not working. Below is the error message I receive every time I use the following.