macearl / Wallhaven-Downloader

A simple download Script for Wallhaven.cc

idea: rate limit settings #35

Open paradonym opened 3 years ago

paradonym commented 3 years ago

I downloaded about 20,000 wallpapers last night over roughly 7-10 hours. When I started the same script the next evening, I ran into rate limiting.

Is there a way to cap the API calls at exactly 45 per minute?

My settings look like this (QUERY is just an ID providing 96k wallpapers):

WPNUMBER=96000
STARTPAGE=1407
ORDER=desc
QUERY="id:***"
PARALLEL=1
THUMBS=64

This runs for about one or two pages until every line is a rate limiting message. But it wasn't that way yesterday. Does this have something to do with the API key? Do I have to renew it?

My goal is to download every single wallpaper of that one tag...

macearl commented 3 years ago

The limit isn't new, but it seems Cloudflare does not always enforce it. In my tests, recently requested images did not seem to count towards the limit; only newly requested images did.

With the latest update, the script should retry all rate-limited requests. However, the current "implementation", if you even want to call it that, is about as basic as it gets.

Ideally, the script would include additional logic to check return codes and handle them correctly, and/or keep track of requests already made so that it never runs into the rate limit at all.

As I haven't used the script myself (besides testing) since before the API existed, and it does work for the occasional wallpaper download, I probably won't expand it in the near future.

If you want to expand the script, I would be happy to merge pull requests.
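
The "keep track of already made requests" idea can be sketched as a sliding-window limiter. This is only an illustration in Python, not code from the script: the class name and its parameters are hypothetical, and the default of 45 calls per 60 seconds is taken from the figure mentioned earlier in this thread, not from a confirmed server setting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls calls within any window of period seconds."""

    def __init__(self, max_calls=45, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock    # injectable so the limiter can be tested
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have already left the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Window is full: sleep until the oldest call expires.
            self.sleep(self.period - (now - self.calls[0]))
```

Calling `limiter.wait()` before every request would keep the client under the assumed limit without ever needing to see a 429 response.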

paradonym commented 3 years ago

So logs like this don't mean that the script skips entire pages? It seems like Cloudflare blocks for a longer period, so I have to shuffle some VPN endpoints too...

Checking dependencies...OK
Download Page 1423
        - done!
Download Wallpapers from Page 1423
        Wallpaper https://w.wallhaven.cc/full/45/wallhaven-**snip**.jpg already downloaded!
        Wallpaper https://w.wallhaven.cc/full/01/wallhaven-**snip**.jpg already downloaded!
        Wallpaper https://w.wallhaven.cc/full/42/wallhaven-**snip**.jpg already downloaded!
        Wallpaper https://w.wallhaven.cc/full/ne/wallhaven-**snip**.jpg already downloaded!
        Wallpaper https://w.wallhaven.cc/full/0j/wallhaven-**snip**.jpg already downloaded!
        Wallpaper https://w.wallhaven.cc/full/4y/wallhaven-**snip**.jpg already downloaded!
        Wallpaper https://w.wallhaven.cc/full/43/wallhaven-**snip**.jpg already downloaded!
        Wallpaper https://w.wallhaven.cc/full/nk/wallhaven-**snip**.jpg already downloaded!
        Wallpaper https://w.wallhaven.cc/full/4y/wallhaven-**snip**.jpg already downloaded!
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
        - done!
Download Page 1424
        - done!
Download Wallpapers from Page 1424
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
         -Rate Limiting detected, sleeping for 30 seconds
        - done!
Download Page 1425
        - done!
Download Wallpapers from Page 1425
         -Rate Limiting detected, sleeping for 30 seconds

I now have to find the page the script last stopped on; that's why it needs to scrape a few hundred pages to get to the right point.
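
One way to avoid searching for the last page would be to record each finished page in a small state file and resume from there. The following is a rough Python sketch of that idea, not something the script is known to do; the function names and the state file are hypothetical.

```python
import os
from pathlib import Path


def load_start_page(state_file, default=1):
    """Return the page to resume from, or default if no valid checkpoint exists."""
    try:
        # The file stores the last finished page, so resume one page later.
        return int(Path(state_file).read_text().strip()) + 1
    except (FileNotFoundError, ValueError):
        return default


def save_finished_page(state_file, page):
    """Record that page is finished, replacing the state file atomically."""
    path = Path(state_file)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(f"{page}\n")
    # os.replace swaps the file in one step, so a crash never leaves it half-written.
    os.replace(tmp, path)
```

A download loop would call `save_finished_page` after each page completes and use `load_start_page` in place of a hand-picked start page on the next run.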

Oh, it seems it is actually downloading stuff in between the rate limiting messages...

macearl commented 3 years ago

Yeah, currently if the status code is HTTP 429 (Too Many Requests), the script prints the rate limit message, sleeps for 30 seconds, and then tries again. If the second try is also rate limited, it waits and tries a third time, and so on. So if Cloudflare blocks you for longer or even indefinitely, the script can end up in an endless loop.
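
A bounded variant of this retry loop could look roughly like the Python sketch below. It is an illustration only, not the script's actual code: `fetch` is a hypothetical callable returning `(status_code, body)`, the 30-second starting delay mirrors the log message above, and the doubling, the delay cap, and the retry limit are all assumptions.

```python
import time


class RateLimitedError(Exception):
    """Raised when the server keeps answering HTTP 429."""


def fetch_with_retries(fetch, url, max_retries=5, base_delay=30.0,
                       max_delay=600.0, sleep=time.sleep):
    """Call fetch(url) until it stops returning 429 or the retries run out."""
    delay = base_delay
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status != 429:
            return status, body
        if attempt == max_retries:
            break  # no retries left, give up instead of looping forever
        sleep(delay)
        # Back off: wait twice as long next time, up to max_delay.
        delay = min(delay * 2, max_delay)
    raise RateLimitedError(
        f"still rate limited after {max_retries} retries: {url}")
```

Raising after a fixed number of attempts lets the caller stop, or log the page and move on, when a block lasts longer than the retries can cover.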