hydrusnetwork / hydrus

A personal booru-style media tagger that can import files and tags from your hard drive and popular websites. Content can be shared with other users via user-run servers.
http://hydrusnetwork.github.io/hydrus/

downloading from sankaku needs to be set on a timer. #75

Closed CuddleBear92 closed 8 years ago

CuddleBear92 commented 8 years ago

Downloading from Sankaku needs to be set on a timer. It's tripping the anti-leeching parts of the site. The site throws an error when it's accessed too many times; this even shows in the Hydrus logs. Two things could be done: either, when the error happens and the program gets locked out of the site, it waits 1-5 min before starting again, or it puts a timer on each file it accesses on that site so it isn't an issue at all. It's annoying to find out that hundreds of files might have failed because of this and then have to be reset, when they should have been done a long time ago.

HASJ commented 8 years ago

Sankaku is one piece of shit site, I'll tell you that... Gelbooru is much smaller and handles bandwidth much better.

sttollgrin commented 8 years ago

And yet gelbooru is the one that is much harder to handle. No API ("Our API is being abused at the moment and is disabled."), random "intermission ads" that must be handled, silently redirecting to front page on 404... boy, I had so much fun trying to scrape anything useful from them...

Content-wise, gelbooru wins. As for how the content is delivered - sankaku bullshit is much easier to handle than gelbooru bullshit.

Anyway, sankaku returns '429 Too Many Requests' on too many requests. From my experience, waiting 15-30s after receiving a 429 seems to do the job.
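The wait-after-429 approach can be sketched roughly as below. This is only an illustration, not how Hydrus does it: `fetch_with_429_wait`, the `fetch` callable (assumed to return a `(status, body)` pair) and the `TooManyRequests` exception are hypothetical names, and the 30-second default is taken from the 15-30s range reported above.

```python
import time


class TooManyRequests(Exception):
    """Raised when every attempt was answered with HTTP 429."""


def fetch_with_429_wait(fetch, url, wait_seconds=30.0, max_attempts=5, sleep=time.sleep):
    """Call fetch(url) -> (status, body); pause and retry whenever status is 429.

    `fetch` is a hypothetical callable supplied by the caller, so the retry
    logic stays independent of any particular HTTP library.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = fetch(url)
        if status != 429:
            return status, body
        # Only sleep if another attempt will follow.
        if attempt < max_attempts:
            sleep(wait_seconds)
    raise TooManyRequests(f"{url}: still 429 after {max_attempts} attempts")
```

Giving up after a fixed number of attempts means a long lockout surfaces as one clear error instead of an endless retry loop.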

HASJ commented 8 years ago

I'd suggest just adding a 30s wait to everything Sankaku. There have been situations where I'm downloading from Sankaku, receive a 429, wait 30s, get an OK, receive another 429, and then every image request after that returns 429 for at least one hour. A 30-second wait after every image will increase the time to download, but it's much better than hour-long pauses.

CuddleBear92 commented 8 years ago

You don't really need that long a wait per image; 10 sec should be more than enough, I would think. It's all about the number of connections you make within a certain time. It doesn't matter how many you make as long as they're paced out right over time.
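The idea of capping connections per time window, rather than pausing a fixed time after every image, could look something like this sliding-window limiter. It is a sketch under assumptions: the class name `SlidingWindowLimiter` and its parameters are made up for illustration, and the limits Sankaku actually enforces are not known from this thread.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds span."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # timestamps of recent requests, oldest first

    def wait_for_slot(self):
        """Block until another request is allowed, then record it."""
        now = self._clock()
        # Forget requests that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_requests:
            # Sleep until the oldest recorded request leaves the window.
            self._sleep(self.window_seconds - (now - self._stamps[0]))
            now = self._clock()
            self._stamps.popleft()
        self._stamps.append(now)
```

Calling `wait_for_slot()` before each request lets short bursts through at full speed and only pauses once the window is full, e.g. `SlidingWindowLimiter(6, 60.0)` averages out to one request per 10 seconds.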

I do agree that it isn't the best site at all, but I still use it, as I tend to find more content there a lot of the time than on Gelbooru and Danbooru.

What I really miss from this program, though, is being able to download from more than one site within one tab, and therefore to download everything for the same tag from several sites at once (like how Danbooru Downloader does it).

For example, searching for one artist/creator tag across 3 or more sites at the same time. Nothing really stops us from doing that, as they're different servers, and hashing would take care of a lot of the dupes as well, I guess (though a dupe manager is needed). I can make a separate issue about this if it sounds good :D
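To illustrate how hashing would catch dupes when the same tag is pulled from several sites, here is a minimal sketch using SHA-256 from Python's standard library. The function name `dedupe_by_sha256` and the `(source_url, file_bytes)` input shape are assumptions made up for the example.

```python
import hashlib


def dedupe_by_sha256(downloads):
    """Keep the first copy of each byte-identical file.

    downloads: iterable of (source_url, file_bytes) pairs (hypothetical shape).
    Returns a list of (source_url, hex_digest) for the files that were kept.
    """
    seen = set()
    unique = []
    for source_url, data in downloads:
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # exact duplicate of a file already kept
        seen.add(digest)
        unique.append((source_url, digest))
    return unique
```

An exact hash only matches byte-identical files, so a resized or re-encoded copy from another site would still slip through, which is why a separate dupe manager would still be needed.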

CuddleBear92 commented 8 years ago

One thing I also find REALLY annoying with Sankaku and Hydrus: if you get detected for leeching while getting links/looking for files, it gives you an error on the tag being searched. That stops the tag search and removes it from the job queue, then moves down to the next one in the queue and does the same again... meaning it wipes the tags in your job queue completely whenever Hydrus is having issues with the site.

I feel this site does have a lot more content than Gelbooru and Danbooru for many of the tags I like.

hydrusnetwork commented 8 years ago

In a future networking engine iteration, I plan to add comprehensive per-domain bandwidth rules, things like 'don't get more than 20MB per hour from this site' and 'put 25s delay between every query on this other site' to satisfy these specific problems.
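As a rough picture of what per-domain rules like these might look like, here is a small sketch. It is not the planned Hydrus engine: `DomainRule`, `DomainThrottle` and their fields are hypothetical names, and the rolling one-hour window is just one way to read "per hour".

```python
import time
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class DomainRule:
    min_delay_seconds: float = 0.0            # gap required between queries
    max_bytes_per_hour: float = float("inf")  # rolling hourly data cap


@dataclass
class DomainState:
    last_request: float = float("-inf")
    transfers: list = field(default_factory=list)  # (timestamp, num_bytes)


class DomainThrottle:
    def __init__(self, rules, clock=time.monotonic):
        self.rules = rules      # maps hostname -> DomainRule
        self._clock = clock
        self._state = {}

    def seconds_until_allowed(self, url):
        """Return how long to wait before the next query to this URL's domain."""
        domain = urlparse(url).hostname
        rule = self.rules.get(domain)
        if rule is None:
            return 0.0          # no rule for this domain, no restriction
        state = self._state.setdefault(domain, DomainState())
        now = self._clock()
        wait = max(0.0, state.last_request + rule.min_delay_seconds - now)
        # Only transfers from the last hour count against the cap.
        state.transfers = [(t, n) for t, n in state.transfers if now - t < 3600]
        used = sum(n for _, n in state.transfers)
        if state.transfers and used >= rule.max_bytes_per_hour:
            # Over the cap: wait at least until the oldest transfer ages out.
            wait = max(wait, state.transfers[0][0] + 3600 - now)
        return wait

    def record(self, url, num_bytes):
        """Note that a query to this URL's domain just transferred num_bytes."""
        state = self._state.setdefault(urlparse(url).hostname, DomainState())
        now = self._clock()
        state.last_request = now
        state.transfers.append((now, num_bytes))
```

With rules such as `{"one-site.invalid": DomainRule(max_bytes_per_hour=20_000_000), "other-site.invalid": DomainRule(min_delay_seconds=25)}`, a downloader would sleep for `seconds_until_allowed(url)` before each query and call `record(url, size)` afterwards.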

For now, please go file->options->downloading and set the 'polite wait' to 40 seconds or so. It will make the client downloaders pause for that period after every gallery or page query. It isn't foolproof, and it is program-wide, but I hope it can tide you over until I can roll out a better system for this.

CuddleBear92 commented 8 years ago

Thanks for the reply. I hope the per-domain setting will accept a time input as well, so users can try and test out what works best for them. Or there should be some major testing of these timings to set a standard for each downloader.