hentai-chan / hentai

Implements a wrapper class around nhentai's RESTful API.
https://www.hentai-chan.dev/projects/hentai
GNU General Public License v3.0
211 stars 20 forks

Module do not work anymore #155

Closed FloRRenn closed 2 years ago

FloRRenn commented 2 years ago

I get an error like this when I try to run my code:

Exception has occurred: RetryError
HTTPSConnectionPool(host='nhentai.net', port=443): Max retries exceeded with url: /random (Caused by ResponseError('too many 503 error responses'))

This is my code

from hentai import Hentai, Format

doujin = Hentai(177013)
print(doujin.url)
hentai-chan commented 2 years ago

Hi, thanks for filing this issue. Can you still access the website through the browser?

Edit: I have now received a couple of reports via mail that mention something similar happening, so this is a confirmed bug. Instead of responding to each and every one of them, I will try to make sense of it, but at the end of the day there's only so much I can do about it since I'm not in control of the backend. If we're lucky, this could also be only a temporary issue. Also, if anyone knows how to reach the site admin, that would be great. Ideally I want to find a solution that works for everyone.

hentai-chan commented 2 years ago

tl;dr nothing I can fix


After digging into the issue, I found out that nhentai added Cloudflare protection to their website. This is not limited to the API: when you open https://nhentai.net for the first time, you will see the Cloudflare check for a few seconds. Pretending to be someone else with a fake UA string or a full request header won't do much here either, since that's not the only thing Cloudflare checks. By the way, we are not the only people facing this issue; pretty much every service that consumes this API will experience something similar [1] [2]. After some more research I tried a few libraries that promised to circumvent Cloudflare protection,

#!/usr/bin/env python3

import cloudscraper

settings = { "browser": "chrome", "platform": "android", "desktop": False }
scraper = cloudscraper.create_scraper(browser = settings)
response = scraper.get("https://nhentai.net/api/gallery/177013").text

print(response) # nope

but even if it did work, I would not have been fond of this solution. In my opinion, playing this game of tricking Cloudflare is a lost cause because the site admin can decide to shut down the API at any time. According to other developers, this protection may be turned off again after a few days. I think the best solution would be to add OAuth2 to the API in combination with a rate limit in order to combat malicious behavior such as scraping the entire catalog, but this is a process the site admin would have to kick off. I think they mentioned a few years ago that they had plans to add authentication to their API, but they never got around to implementing it. In the past, the API has also been shut down occasionally.
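To make the rate-limit idea a bit more concrete, here is a minimal token-bucket sketch. It is purely illustrative: the class name `TokenBucket`, its parameters, and the injectable `clock` are my own assumptions, not anything nhentai or this library implements.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical limiter: bursts of up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the behavior can be checked without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token and return True if the request may proceed."""
        now = self.clock()
        # Refill in proportion to the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A server would keep one bucket per authenticated client (which is where OAuth2 comes in) and reject a request whenever `allow()` returns `False`.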

Long story short, while there probably are short-term solutions to this problem [3], I don't think they are worth the time and effort if they don't address the underlying issues. Most if not all language bindings have the same problem, but I will keep an eye out to see whether the protection gets turned off again or somebody else finds a good solution.

[1] https://github.com/DiamondMiner88/nhentai/issues/14
[2] https://github.com/andy840119/NHentaiAPI/issues/12
[3] https://github.com/FlareSolverr/FlareSolverr

PS: I don't think many users of this library are aware of this feature but there actually is an alternative way to instantiate a Hentai object, namely by feeding the JSON data to the constructor:

from hentai import Hentai

response: dict = None # find a way to obtain the JSON data from the /api/gallery/id endpoint
doujin = Hentai(json=response)

This feature was originally implemented as a form of caching: if you stored the JSON with the export method using options=[Option.Raw], you can regain access to the properties without making unnecessary requests, which in turn makes parsing this data easy again.
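As a rough sketch of that caching workflow, the helper below reads previously saved gallery JSON back from disk. The function name `load_cached_gallery` and the file name `gallery.json` are hypothetical, and I am assuming the saved file may hold either a single gallery object or a list of them, so check what your export actually wrote.

```python
import json
from pathlib import Path


def load_cached_gallery(path: str) -> dict:
    """Read saved gallery JSON from disk and return one gallery dict."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: the file holds either one gallery object or a list of them.
    if isinstance(data, list):
        if not data:
            raise ValueError(f"{path} contains an empty list")
        data = data[0]
    return data


# Usage, assuming the hentai package is installed and gallery.json exists:
# from hentai import Hentai
# doujin = Hentai(json=load_cached_gallery("gallery.json"))
```

Nothing in the helper touches the network, so it keeps working while the API is unreachable, as long as the JSON was saved beforehand.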

FloRRenn commented 2 years ago

Thanks for your support :))

Zack-Bloodshot commented 2 years ago

Now it's working, everyone! Enjoy! Again...

defnotanalt commented 2 years ago

~Now it's working, everyone! Enjoy!~ Again...

Assuming we're still broken and just waiting on a turn-around basically?

Zack-Bloodshot commented 2 years ago

Yep, I don't know whether they are rate limiting or blocking the requests. Once in a blue moon it works, and other modules are using cookies to make it work.

hentai-chan commented 2 years ago

It's not about requests being blocked per se; it's just that ever since they turned on their Cloudflare protection, no REST client works reliably anymore. I don't think cookies are the way to go either. If they want to reduce their server load, they should implement OAuth2 and introduce a rate limit, but that's a decision they have to make.
