nianeyna / ao3downloader

Utility for downloading fanfiction in bulk from the Archive of Our Own
GNU General Public License v3.0

Script broken with new Cloudflare protections #96

Closed: Kyther closed this issue 1 year ago

Kyther commented 1 year ago

No idea if this is fixable, given that the protections are there to stop a DDoS attack, but I decided to test with a small fandom to see if the script would be able to download anything.

I get:

'NoneType' object has no attribute 'find'

And I'm bumped back to the main menu of the script.
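
For reference, this error usually means the HTML parser got something other than a normal work page - such as a Cloudflare challenge page - so a lookup that normally succeeds returned None. Below is a minimal sketch of how that happens and how to fail more gracefully, assuming requests and BeautifulSoup are used for fetching and parsing; the selector and function name are illustrative, not the script's actual internals.

```python
# Illustrative only: shows how "'NoneType' object has no attribute 'find'"
# arises when Cloudflare serves a challenge page instead of AO3 HTML.
import requests
from bs4 import BeautifulSoup

def get_download_links(work_url: str) -> list[str]:
    html = requests.get(work_url).text
    soup = BeautifulSoup(html, "html.parser")

    # On a normal work page the download menu exists; on a challenge page
    # it doesn't, so this find() returns None...
    menu = soup.find("li", class_="download")
    if menu is None:
        # ...and any further .find() on it would raise the AttributeError
        # seen above. Fail with a clearer message instead.
        raise RuntimeError(
            f"Unexpected page structure for {work_url} - "
            "possibly a Cloudflare challenge page."
        )
    return [a.get("href") for a in menu.find_all("a")]
```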

nianeyna commented 1 year ago

Yeah, I was wondering if that would be the case. I think I'm going to wait a bit before deciding on a direction for addressing this - I may need to confer with some folks, see what's what. For now I'm going to put a notice in the readme, I think.

WhyDidIHaveToDoThis commented 1 year ago

This is a tragic day ;( Cloudflare on ffnet and now on ao3.

Lagicrus commented 1 year ago

Try testing again? Another script that was broken due to CF seems to be working fine now - I suspect they lowered the protections again. Maybe this will work again with no issues.

nianeyna commented 1 year ago

Yes, I believe they have turned off the bot challenge, but it's still looking pretty broken to me. I am not sure if that has to do with Cloudflare or if it's just general bugginess from the DDoS. I'm going to leave the notice in the readme up for now.

Kyther commented 1 year ago

I'm still having to click "I'm a human" periodically just browsing the site - no captchas, but I have to click the box - so it's definitely going to be borked until things get lowered some. I guess it's wait and see… Adapting the script to the protections would be possible, but the rate limits are so extreme right now that it would be an exercise in frustration, I think.
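
For context, the rate limits in question are AO3 answering too-frequent requests with HTTP 429, usually with a Retry-After header giving the number of seconds to wait. Here is a minimal sketch of honoring that header, assuming requests is used for fetching; this is a generic pattern, not the script's actual retry logic.

```python
# Generic sketch: wait out AO3 rate limiting by honoring the Retry-After
# header on HTTP 429 responses. Not the script's actual retry code.
import time
import requests

def get_with_rate_limit(url: str, max_attempts: int = 5) -> requests.Response:
    for _ in range(max_attempts):
        response = requests.get(url)
        if response.status_code != 429:
            return response
        # Fall back to a conservative default when the header is missing.
        wait_seconds = int(response.headers.get("retry-after", 300))
        print(f"Rate limited; sleeping {wait_seconds}s before retrying {url}")
        time.sleep(wait_seconds)
    raise RuntimeError(f"Still rate limited after {max_attempts} attempts: {url}")
```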

nianeyna commented 1 year ago

Right, I think there's not much point in trying to adapt the script at this stage - things are still changing by the minute; even if I were to fix the problem, it might be broken again before I finished pushing the fix, haha. I'm keeping an eye out for developments, but for now my plan is pretty much to sit tight and wait it out.

Kyther commented 1 year ago

The restrictions are definitely fluctuating. I tested it on a small fandom this evening and it actually worked! The break periods are wildly different, though - I got 17 seconds the first time and 303 the next, lol. But the automation is the important thing, for me - the time it takes isn't so big a deal because I'm not having to manually click through it all.

I'm afraid to use it on the really big one I'd had in my queue next, in case they suddenly increase the restrictions to deal with stuff and it fails mid-script. I'm hoping that if that happens, I'd be able to choose 'y' at the "do you want to start downloading from the page you stopped on last time" prompt. The other thought that occurred to me is that this is where I might do best grabbing all links to a file and then breaking that up, feeding only a chunk at a time to the script so that if it fails, I'll know where. (Or will it tell me which link it was trying to get when it failed? I've not actually tried the links-to-file and download-links-from-file options.) Any ideas for the best way to proceed without risking requesting the same fics from AO3 twice? :)
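
On that last question: one low-tech way to avoid requesting the same fic twice is to keep a plain-text log of links that finished downloading and skip anything already in it on the next run. A rough sketch of that idea follows - the log filename and the download step are placeholders, and this is not how the script tracks progress internally.

```python
# Sketch: keep a plain-text log of finished work URLs so a re-run can
# skip anything already downloaded. Generic idea, not ao3downloader's
# internal progress tracking.
from pathlib import Path

DONE_LOG = Path("downloaded.log")  # hypothetical filename

def already_done(url: str) -> bool:
    return DONE_LOG.exists() and url in DONE_LOG.read_text().splitlines()

def mark_done(url: str) -> None:
    with DONE_LOG.open("a") as log:
        log.write(url + "\n")

# Usage: iterate over a links file, skipping completed entries.
for url in Path("links.txt").read_text().splitlines():
    if not url or already_done(url):
        continue
    # download_work(url)  # placeholder for the actual download step
    mark_done(url)
```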

verotheelf commented 1 year ago

I would be careful - I tried testing it via updates and it stalled at around the 116th fic. I checked in on it about 10 minutes later and it had "finished", but I could see from the tracker that it never actually progressed any further, even though it didn't give an error. Make sure the number of files downloaded matches the expected amount.
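
A quick way to do that check is to compare the number of links fed in against the number of files that actually landed in the downloads folder. A rough sketch, with placeholder paths and a placeholder formats-per-work count:

```python
# Rough sanity check: compare links requested vs. files actually downloaded.
# Paths and the formats-per-work count are placeholders.
from pathlib import Path

links = [line for line in Path("links.txt").read_text().splitlines() if line.strip()]
downloads = [p for p in Path("downloads").iterdir() if p.is_file()]

formats_per_work = 2  # e.g. epub + pdf requested for every work
expected = len(links) * formats_per_work

print(f"Expected about {expected} files, found {len(downloads)}.")
if len(downloads) < expected:
    print("Some works may have stalled or been skipped - worth re-checking.")
```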

Kyther commented 1 year ago

Thanks for the warning! Other than the small fandom - where the number of files definitely matched the expected number - I've been grabbing all links to a file and splitting it up into txt files of 500 links apiece, then running those through the script one at a time. It's stabilized at a 300-second break every time, which is generally every 27-30 downloads. I checked the numbers I should have against what I actually do have, and so far it looks like it was just one fic that didn't download in all the formats I wanted. Easy to find. Will be vigilant about that in future, and maybe drop to fewer than 500 links if I have a lot of trouble with that many, but given the size of the fandom I was going for, I didn't want to go too much smaller than that.
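
For anyone wanting to replicate the splitting step, here is a short sketch that breaks one big links file into 500-link chunks which can then be fed to the script one at a time; the filenames are placeholders.

```python
# Sketch: split a large links file into chunks of 500 links each so they
# can be fed to the downloader one at a time. Filenames are placeholders.
from pathlib import Path

CHUNK_SIZE = 500
links = [line for line in Path("all_links.txt").read_text().splitlines() if line.strip()]

for i in range(0, len(links), CHUNK_SIZE):
    chunk = links[i:i + CHUNK_SIZE]
    out = Path(f"links_part_{i // CHUNK_SIZE + 1:03d}.txt")
    out.write_text("\n".join(chunk) + "\n")
    print(f"Wrote {len(chunk)} links to {out}")
```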