wwyaiykycnf / e621dl

The automated e621.net downloader

Fixed agent ban, added username to config, timeout against rate limiting #45

Open DatDraggy opened 6 years ago

DatDraggy commented 6 years ago

New config line: `username`. It will be added to both the support code and the downloader to fix the user agent ban, which usually resulted in a JSON error because no posts could be fetched.

It's been a very long time since I've done Python, so please check my edits for issues. I tried the downloader myself with the edits and everything worked fine, but who knows, maybe I made a mistake.

aleksbrgt commented 6 years ago

You should use a User-Agent that is unlikely to get banned, like a Firefox user agent; that should finally fix this recurring issue. I did this when I had the same problem on my project (https://github.com/Viljo-Kovanen/e621-apy/commit/3dc2ea185d4afd55f8b0f546ddff905d5df91fc1).

DatDraggy commented 6 years ago

No. That is not allowed.

aleksbrgt commented 6 years ago

Not allowed by whom? e621?

DatDraggy commented 6 years ago

Exactly. I'm currently trying to check the HTTP status code and headers before requesting the next URL. The status code is 429 when the client is being rate limited, and the server sends a `Retry-After: X` header. With that I could add a timeout that waits until the rate limit has passed. Ugly, but it would work for now.
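A minimal sketch of that idea, assuming the `requests` library. The user agent string and the `fetch_with_retry` name are placeholders for illustration, not the project's actual code:

```python
import time

import requests

def fetch_with_retry(url, params=None, max_attempts=3):
    # Placeholder user agent; e621 asks for a descriptive one naming a user.
    headers = {"User-Agent": "e621dl (example_user)"}
    for _ in range(max_attempts):
        response = requests.get(url, params=params, headers=headers)
        if response.status_code == 429:
            # Rate limited: honor the Retry-After header before retrying.
            wait = int(response.headers.get("Retry-After", "1"))
            time.sleep(wait)
            continue
        response.raise_for_status()
        return response.json()
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```

The loop retries only on 429; any other HTTP error is raised immediately.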

aleksbrgt commented 6 years ago

Oh my bad, I never noticed this requirement.

DatDraggy commented 6 years ago

It's fine. Now you know ^^

So, my idea would be this: https://github.com/wwyaiykycnf/e621dl/pull/45/commits/22538640386263508379d2df71352b38fdd252e8

Any idea on how to do that? (Also, for some reason the tabbing messed something up there)

EDIT: Well, that wasn't hard. Now the question is: does it work? https://github.com/wwyaiykycnf/e621dl/pull/45/commits/1b6e98a3d2757fa2fe856ffd98522fff34f408fd

Wulfre commented 6 years ago

Do you know what Retry-After value e621 has set? The standard value is an hour, which is a really long time to just let the program sleep.

DatDraggy commented 6 years ago

If not sleep, then we have to redo the script so that it makes far fewer requests, or add a general delay. We could also do: one request, download the pics from that request, then the next request. I don't know how long the Retry-After is; I'm going to ask if I can get some info on that.

Edit: 30 requests per minute are apparently allowed (https://e621.net/forum/show/79955#p80057), so we could restrict it to 0.5 requests per second.

DatDraggy commented 6 years ago

Good. They are saying:

> That post is from four years ago. Things have changed quite a bit since then! Our current API documentation is here:
>
> https://e621.net/wiki/show/e621:api
>
> We're updating it with a section specifically regarding rate limiting as I type this.

Wulfre commented 6 years ago

My own script does what you described: one request, then it downloads the results before requesting again. It seems to be working well.

Let's see what gets added to the docs.

DatDraggy commented 6 years ago

Okay, they say a maximum of one request per second. Then we should maybe do the request, download, and then delay the next request by however much of that second is left, if the download finished in under one second.

Apparently it's not a 429 code, though. Hm.
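The request/download/delay loop described above could look like this sketch, where `fetch_page` and `download_posts` are hypothetical stand-ins for the actual API call and downloader:

```python
import time

def crawl(pages, fetch_page, download_posts, min_interval=1.0):
    """Fetch and download each page, pacing requests to one per min_interval."""
    for page in pages:
        start = time.time()
        download_posts(fetch_page(page))
        elapsed = time.time() - start
        # Only sleep for the remainder of the interval, if any is left.
        if elapsed < min_interval:
            time.sleep(min_interval - elapsed)
```

If a download already took longer than the interval, the loop moves straight on to the next request.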

Wulfre commented 6 years ago

The most polite way to do this would probably be:

```python
start = time.time()
# Do the request here
time.sleep(1 + (time.time() - start))
```

If the server is already being overwhelmed, the program will wait the minimum one second, plus the delay it took for the response to come.

DatDraggy commented 6 years ago

This would work as a hotfix. For further optimization, I would try to download the image after the request but before the sleep. Can you add something to this pull request, or is that not possible? (So that you get the nice contributor badge as well.) I haven't done much collaboration on GitHub yet.

Wulfre commented 6 years ago

I just remembered that this version of the script multithreads the downloads. That is probably a big reason why it keeps getting banned: if all the threads make a request at the same time, it will be well over e621's limit. My version is single-threaded, which is likely why I haven't seen any ban issues.

I have no idea how collaboration on GitHub works, all of my projects are solo.
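If the multithreaded design is kept, one option (a sketch under that assumption, not the project's actual code) is to serialize the API requests through a shared rate limiter that all worker threads acquire before each request:

```python
import threading
import time

class RateLimiter:
    """Allow at most one acquire() per `interval` seconds across all threads."""

    def __init__(self, interval=1.0):
        self.interval = interval
        self.lock = threading.Lock()
        self.next_allowed = 0.0

    def acquire(self):
        # Holding the lock while sleeping serializes the waiting threads,
        # so each one gets its own slot one interval apart.
        with self.lock:
            now = time.monotonic()
            wait = self.next_allowed - now
            if wait > 0:
                time.sleep(wait)
            self.next_allowed = max(now, self.next_allowed) + self.interval

# Every worker thread would call limiter.acquire() before each API request.
limiter = RateLimiter(interval=1.0)
```

Downloads themselves could stay parallel; only the API calls would be throttled.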

DatDraggy commented 6 years ago

That shouldn't affect the rate limiting. I think it only covers the actual API requests that fetch the posts, not the downloads.

DatDraggy commented 6 years ago

Would you agree with the comments in e621dl in the newest commit?

DatDraggy commented 6 years ago

What has been done:

It can be merged now. It could be improved, but it will definitely fix the problem.