DatDraggy opened this issue 6 years ago
You should use a user agent that is unlikely to get banned, such as a Firefox user agent; that should finally fix this recurring issue. I did this when I ran into the same problem on my project (https://github.com/Viljo-Kovanen/e621-apy/commit/3dc2ea185d4afd55f8b0f546ddff905d5df91fc1)
No. That is not allowed.
Not allowed by whom? e621?
Exactly. I'm currently trying to check the HTTP status code and headers before requesting the next URL. The status code is 429 when the client is being rate limited, and "Retry-After: X" is the corresponding header. With that I would be able to add a timeout that waits until the rate limit has passed. Ugly, but it would work for now.
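A minimal sketch of that check, kept free of any particular HTTP library: the 429 status and the Retry-After header are standard HTTP, but the function name and the one-second fallback are my own illustration, not anything from the script:

```python
def seconds_to_wait(status_code, headers, default_wait=1.0):
    """How long to sleep before the next request.

    On 429 (Too Many Requests), honor the server's Retry-After
    header (given in seconds); otherwise add no extra wait.
    """
    if status_code != 429:
        return 0.0
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return float(retry_after)
        except ValueError:
            pass  # Retry-After can also be an HTTP date; fall back
    return default_wait
```

With `requests`, for example, you would call `time.sleep(seconds_to_wait(resp.status_code, resp.headers))` before retrying.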
Oh my bad, I never noticed this requirement.
It's fine. Now you know ^^
So, my idea would be this: https://github.com/wwyaiykycnf/e621dl/pull/45/commits/22538640386263508379d2df71352b38fdd252e8
Any idea on how to do that? (Also, for some reason the tabbing messed something up there.)
EDIT: Well, that wasn't hard. Now the question is: does it work? https://github.com/wwyaiykycnf/e621dl/pull/45/commits/1b6e98a3d2757fa2fe856ffd98522fff34f408fd
Do you know what Retry-After value e621 has set? The standard value for it is an hour, which is a really long time to just let the program sleep.
If not sleep, then we would have to redo the script so that it makes far fewer requests, or add a general sleep. We could also do: one request, download the pics from that request, then the next request. I don't know what the limit is; I'm going to ask if I can get some info on that.
Edit: apparently 30 requests per minute are allowed (https://e621.net/forum/show/79955#p80057), so we could restrict it to 0.5 requests per second.
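30 requests per minute means spacing requests at least two seconds apart. A minimal throttle under that assumption could look like this; the class name is made up, and the injectable clock/sleep functions are only there so the logic can be tested without real waiting:

```python
import time


class IntervalThrottle:
    """Space calls at least `min_interval` seconds apart."""

    def __init__(self, requests_per_minute=30,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / requests_per_minute  # 2.0 s for 30/min
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until enough time has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

Calling `throttle.wait()` right before each API request would then keep the script under the stated limit.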
Good. They are saying:
That post is from four years ago. Things have changed quite a bit since then! Our current API documentation is here:
https://e621.net/wiki/show/e621:api
We’re updating it with a section specifically regarding rate limiting as I type this.
My own script does what you described, one request then download the results before requesting again, and it seems to be working well.
Let's see what gets added to the docs.
Okay, they say a maximum of 1 request per second. Then we should maybe do: request, download, next request, adding a one-second delay whenever the download was faster than one second.
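That order of operations (request, download, then pad the cycle out to a full second if the download finished early) could be sketched like this. `fetch_page` and `download_posts` are placeholders for whatever the script already does, not real functions from e621dl:

```python
import time


def run(pages, fetch_page, download_posts,
        min_cycle=1.0, clock=time.monotonic, sleep=time.sleep):
    """Process pages one at a time, keeping each request/download
    cycle at least `min_cycle` seconds long (the 1 req/s limit)."""
    for page in pages:
        start = clock()
        posts = fetch_page(page)   # the one API request per cycle
        download_posts(posts)      # download while the clock runs
        elapsed = clock() - start
        if elapsed < min_cycle:
            sleep(min_cycle - elapsed)  # pad only if the cycle was fast
```

If the download alone already took longer than a second, no extra sleep happens, so slow downloads cost nothing.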
It's apparently not a 429 code. Hm
The most polite way to do this would probably be:
    import time

    start = time.time()
    # Do the request here
    time.sleep(1 + (time.time() - start))
If the server is already being overwhelmed, the program will wait the minimum one second, plus the delay it took for the response to come.
This would work as a hotfix. For further optimization I would try to download the image after the request, before the sleep. Can you add something to this pull request, or is that not possible? (So that you get the nice contributor badge as well.) I haven't done much collaboration on GitHub yet.
I just remembered that this version of the script multithreads the downloads. That is probably a big reason why it keeps getting banned. If all the threads make a request at the same time it will be well over the limit of e621. My version is a single thread, which is likely why I haven't seen any ban issues.
I have no idea how collaboration on GitHub works, all of my projects are solo.
That shouldn't affect the rate limiting. I think the limit only applies to the actual API requests that fetch the posts, not to the downloads.
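Even if the downloads stay multithreaded, the API requests could all go through one lock-protected limiter so that the threads combined never exceed one request per second. A sketch under that assumption (the class name is invented; image downloads would simply bypass it):

```python
import threading
import time


class SharedApiLimiter:
    """One limiter shared by every thread: API calls go through
    acquire(), image downloads do not."""

    def __init__(self, min_interval=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next call may fire

    def acquire(self):
        """Block until this thread is allowed to make an API request."""
        with self._lock:
            now = self._clock()
            wait = self._next_allowed - now
            if wait > 0:
                self._sleep(wait)
                now = self._clock()
            self._next_allowed = now + self.min_interval
```

Because the sleep happens while the lock is held, a second thread arriving early simply queues up behind the first instead of racing past the limit.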
Would you agree with the comments in e621dl in the newest commit?
What has been done:

- New config line: "username". It will be used in the support and downloader modules to fix the user-agent ban, which usually resulted in a JSON error because no posts could be fetched.

This can be merged now. It could be improved, but it will definitely fix the problem.
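For reference, a descriptive user agent built from that config value might look like the sketch below. The exact format e621 expects should be checked against their API docs; the function name and the string layout here are assumptions for illustration:

```python
def build_user_agent(project, version, username):
    """Identify both the script and its operator, as scraper-friendly
    APIs generally ask. The exact format is an assumption; check
    e621's API documentation for the required wording."""
    return f"{project}/{version} (by {username})"
```

The resulting string would then be sent as the `User-Agent` header on every API request.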
It's been a very long time since I've done Python, so please check my edits for issues. I tried the downloader myself with the edits and everything worked fine, but who knows, maybe I made a mistake.