AgostinoSturaro closed this issue 1 year ago.
If this has actually been implemented (I didn't even notice until you pointed it out), I think it works by returning the "overloaded" status message and delaying the affected captures so that, without authentication, about 4 to 5 captures are processed each minute, or roughly 270 captures per hour. You can still send more than 4 requests per minute without authentication; the server just delays them automatically. There is also still a limit on concurrent captures that can kick in.
I think it would make the most sense to take this into account by properly handling the edge case where you have to wait more than 10 minutes for the capture to start, i.e. #19.
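A minimal sketch, in Python, of what handling #19 could look like: keep polling past the 10-minute mark instead of giving up. `wait_for_capture_start`, `is_started`, and the 30-minute and 15-second defaults are hypothetical placeholders, not part of any existing tool or of the SPN2 API; `is_started` stands in for however the client actually checks the capture status.

```python
import time
from typing import Callable


def wait_for_capture_start(
    is_started: Callable[[], bool],
    max_wait_s: float = 30 * 60,  # hypothetical: allow well over 10 minutes
    poll_interval_s: float = 15.0,  # hypothetical polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_started() until it returns True or max_wait_s elapses.

    Returns True if the capture started, False if we gave up waiting.
    `is_started` is a placeholder for the client's own status check.
    """
    deadline = clock() + max_wait_s
    while True:
        if is_started():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(poll_interval_s, remaining))
```

Injecting `sleep` and `clock` is just a convenience so the loop can be exercised without actually waiting half an hour.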
The "overloaded" message is a bit different. If I recall correctly, the user-limit message says something about "you" having reached a limit. Then there's yet another message, about the archive having received too many requests for a specific website, such as github.com.
So there are three different things:
For general overloading, there's an API endpoint, mentioned in the changelog:
2022-04-05 New API endpoint http://web.archive.org/save/status/system to notify applications if SPN is overloaded.
For the specific-website limit, see page 9 of the API doc (I don't know the error code):
Artificial delays for multiple concurrent captures on the same host. When we run more than 20 concurrent captures on the same host, we introduce an artificial delay on subsequent captures to avoid overloading the target and blocking SPN2. The delay algorithm is: When concurrent_capture_number > 20 for the same host, delay concurrent_capture_number/5 sec. For example: if concurrent_capture_number = 50, delay a new capture by 50/5 = 10 sec.
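The quoted delay rule, written out as a small Python function. The name `same_host_delay_seconds` is made up for illustration, and the doc doesn't say whether SPN2 rounds the result, so this sketch returns the raw quotient.

```python
def same_host_delay_seconds(concurrent_capture_number: int) -> float:
    """Artificial delay for a new capture, per the rule quoted from the SPN2 API doc.

    The delay applies only when more than 20 captures on the same host are
    running concurrently; it is concurrent_capture_number / 5 seconds.
    """
    if concurrent_capture_number > 20:
        return concurrent_capture_number / 5
    return 0.0


# The doc's own example: 50 concurrent captures -> 50 / 5 = 10 seconds.
print(same_host_delay_seconds(50))  # 10.0
```

Note that right at the threshold the delay jumps from 0 to about 4.2 seconds (21 / 5), so it isn't gradual.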
For the capture limit, the count is measured over the last 60 seconds, as stated on page 11 of the API doc:
By “concurrent captures”, we mean captures performed in the last 60 sec.
Quote from the Save Page Now changelog:
2023-01-22 The user rate limit mechanism changed from counting concurrent captures to limiting captures per minute. Anonymous users can do 4 captures per minute and authenticated users can do 12 captures per minute.
It's "captures per minute", measured over the last 60 seconds (see page 11 of the API doc).
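One way a client could stay under the per-minute limit is a sliding-window limiter over the last 60 seconds. This is a sketch under assumptions: the class and method names are hypothetical, the limits (4 anonymous, 12 authenticated) come from the changelog quote, and whether the server counts a capture that is exactly 60 seconds old as inside or outside the window is a guess.

```python
import time
from collections import deque
from typing import Callable, Deque

ANONYMOUS_LIMIT = 4       # captures per minute, from the 2023-01-22 changelog
AUTHENTICATED_LIMIT = 12  # captures per minute, from the 2023-01-22 changelog
WINDOW_S = 60.0           # "captures performed in the last 60 sec"


class CaptureRateLimiter:
    """Hypothetical client-side limiter: at most `limit` captures per window."""

    def __init__(self, limit: int, window_s: float = WINDOW_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._sent: Deque[float] = deque()  # timestamps of recent captures

    def _prune(self, now: float) -> None:
        # Drop timestamps that have left the window (assumed boundary: >=).
        while self._sent and now - self._sent[0] >= self.window_s:
            self._sent.popleft()

    def wait_time(self) -> float:
        """Seconds to wait before another capture stays within the limit."""
        now = self.clock()
        self._prune(now)
        if len(self._sent) < self.limit:
            return 0.0
        # The oldest capture in the window frees a slot once it ages out.
        return self._sent[0] + self.window_s - now

    def record(self) -> None:
        """Record that a capture request was just sent."""
        self._sent.append(self.clock())

    def acquire(self, sleep: Callable[[float], None] = time.sleep) -> None:
        """Block until a slot is free, then record the capture."""
        while (delay := self.wait_time()) > 0:
            sleep(delay)
        self.record()
```

For example, calling `CaptureRateLimiter(ANONYMOUS_LIMIT).acquire()` before each capture request would block until a slot frees up, so a fifth capture sent right after four others waits about 60 seconds instead of being rejected or delayed by the server.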
However, it is now much easier to hit this limit. Can you take this into account? Thank you.