p0ody / ff2ebook

WIP.
http://www.ff2ebook.com

Cloudflare Fix (Bypass) to scrape Fanfiction.net (Portable Version) #33

Closed: bastien8060 closed this 6 months ago

bastien8060 commented 3 years ago

The proxies aren't very fast, because they have to be cycled (rather than sticking with one that proved to be fast) to keep them from being banned by Cloudflare. However, a lot of work has gone into optimization and caching to speed up requests.

The curl command has been replaced by a self-made curl-like function in Python that keeps the same cookies and user agent for a given proxy for up to 5 requests, then cycles to the next proxy. It has a timeout of 5 s. If a request exceeds that, or if the IP is banned by Cloudflare, it moves on to the next proxy in the list.
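As a rough illustration of the rotation described above (not the actual code from this pull request), here is a minimal standard-library sketch. The names `ProxySession`, `RotatingFetcher`, and `make_session` are hypothetical; the 5-request limit and 5 s timeout come from the description, and treating any `OSError` (timeouts, connection errors, HTTP errors such as a 403) as "move on to the next proxy" is an assumption.

```python
import itertools
import urllib.request
from http.cookiejar import CookieJar

MAX_USES_PER_PROXY = 5  # from the description: up to 5 requests per proxy
TIMEOUT_SECONDS = 5     # from the description: 5 s timeout


class ProxySession:
    """One proxy paired with its own cookie jar and user agent (hypothetical helper)."""

    def __init__(self, proxy, user_agent):
        self.proxy = proxy
        self.user_agent = user_agent
        self.cookies = CookieJar()
        self.uses = 0

    def fetch(self, url):
        # Route the request through this proxy and reuse this session's cookies.
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": self.proxy, "https": self.proxy}),
            urllib.request.HTTPCookieProcessor(self.cookies),
        )
        request = urllib.request.Request(url, headers={"User-Agent": self.user_agent})
        with opener.open(request, timeout=TIMEOUT_SECONDS) as response:
            return response.read()


class RotatingFetcher:
    """Cycles through proxies, keeping one session for up to MAX_USES_PER_PROXY requests."""

    def __init__(self, proxies, user_agents, make_session=ProxySession):
        if not proxies or not user_agents:
            raise ValueError("need at least one proxy and one user agent")
        self._proxies = itertools.cycle(proxies)
        self._agents = itertools.cycle(user_agents)
        self._proxy_count = len(proxies)
        self._make_session = make_session
        self._session = self._next_session()

    def _next_session(self):
        # A new session starts with an empty cookie jar and a use count of zero.
        return self._make_session(next(self._proxies), next(self._agents))

    def get(self, url):
        # Try at most one full pass over the proxy list per call.
        for _ in range(self._proxy_count):
            if self._session.uses >= MAX_USES_PER_PROXY:
                self._session = self._next_session()
            self._session.uses += 1
            try:
                return self._session.fetch(url)
            except OSError:
                # Assumption: a timeout, connection failure, or HTTP error
                # (for example a ban) all mean "skip to the next proxy".
                self._session = self._next_session()
        raise RuntimeError("all proxies failed for " + url)
```

Passing `make_session` in as a parameter is only there so the rotation logic can be exercised without real proxies; a caller would normally just use `RotatingFetcher(proxy_list, user_agent_list).get(url)`.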

It has headless JavaScript on board (using CloudScraper), which can solve most basic and common Cloudflare puzzles and captchas (e.g. the 5-second wait page).

Requirements

Todo in the future:

Resolved Issues: #31, #30, #27

ninjamouse107 commented 3 years ago

In layman's terms (for those of us who can't code), what do we do?

bastien8060 commented 3 years ago

@ninjamouse107 Well, my fix used to work until Cloudflare (the company FanFiction.net pays to secure their website) changed the way their IUAM (I'm Under Attack Mode) protection works. It is meant to filter out bots, but it blocks good bots as well as bad ones, such as those that PM people for scams.

The problem is that ff2ebook also gets blocked, so people try to find a way around Cloudflare.

Simple answer for now: just wait. I have an idea or two, and I don't yet see why they wouldn't work. I'll try them tomorrow.

ninjamouse107 commented 3 years ago

Ohh, OK. Thanks though! Unfortunately that happens to a lot of really good sites.

jdtrower commented 3 years ago

@bastien8060 I wonder if investigating how the application FanFictionDownloader (current version 0.9.4 - updated 8 January 2021 and available on Windows, Mac, and Linux OSes) resolved the issue surrounding CloudFlare would help lead to a workable solution. The application is able to connect to fanfiction.net and download all of the chapters before stitching the chapters together into a single HTML document. After it stitches together a single long HTML document, it then performs the conversion to the selected output format.
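For illustration only, the stitching step described above could look roughly like the sketch below. This is not FanFictionDownloader's actual code; the function name `stitch_chapters` and the `(chapter_title, body_html)` input shape are assumptions, and downloading and format conversion are left out.

```python
from html import escape


def stitch_chapters(title, chapters):
    """Join already-downloaded chapters into one HTML document.

    `chapters` is a list of (chapter_title, body_html) tuples in reading order.
    Titles are escaped; chapter bodies are assumed to be HTML already.
    """
    parts = [
        "<!DOCTYPE html>",
        "<html><head><meta charset='utf-8'>",
        f"<title>{escape(title)}</title></head><body>",
        f"<h1>{escape(title)}</h1>",
    ]
    for number, (chapter_title, body_html) in enumerate(chapters, start=1):
        # Give every chapter an anchor so a converter can build a table of contents.
        parts.append(f'<h2 id="chapter-{number}">{escape(chapter_title)}</h2>')
        parts.append(body_html)
    parts.append("</body></html>")
    return "\n".join(parts)
```

The resulting single document is what would then be handed to whatever converter produces the selected output format.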

bastien8060 commented 3 years ago

@jdtrower Oh, that's simple. They are just using a "low scale" solution. They are basically doing the same thing (as is FanFicFare), and it's perfect for them because they don't have many users.

This merge request does the same thing; however, ff2ebook has many more users, so the server's IP ends up being banned. Rotating proxies are not a reliable solution either.

bastien8060 commented 3 years ago

@jdtrower I'm also looking into this: https://yawk.at/ffn-android-reversing/

It still wasn't patched when I tried it, though I'm having difficulty finding additional information about the encryption, such as the 16-byte IV and a few other details.

Edit: the API endpoint is undocumented, but it's not served behind Cloudflare!

jdtrower commented 3 years ago

@bastien8060 Gotcha. That makes sense. Just thought I'd point it out in case it helped lead to a solution. I prefer using ff2ebook since I can do it directly on my mobile devices and then download and open the result in my preferred reader, whereas with FanFictionDownloader I have to do it from a computer. An acceptable temporary workaround, but not a preferred long-term solution. Definitely appreciate the work and research you are doing on this.

bastien8060 commented 3 years ago

@jdtrower Have you tried my test fork? It's at https://ff2.theyoungappy.com

This version supports encryption (the server supports HTTPS); however, books are not stored (there is no archive). FanFiction.net does work, with a slight error rate. If it fails, just try again a few times. If it really doesn't work, try again later or tell me on GitHub.

Also check whether your fanfiction is available on https://Fictionhunt.com (a massive archive of fanfiction.net). If so, I might be able to put something together tomorrow that you can use to download stories from FictionHunt.

bastien8060 commented 3 years ago

Edit: just downloaded 11 chapters in 10 seconds.