benoit74 opened 11 months ago
FYI, I finally have a repro of #387, but this is way better handled as stated in this issue: the website returns a `Retry-After` header with a decent value of 60 seconds, which progressively decreases (59 secs, 57 secs, ...) as the crawler does not respect this parameter.

I'm working on a PR, so you could assign me this issue.
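For context, per RFC 9110 the `Retry-After` header can carry either a delay in seconds or an HTTP date. A minimal TypeScript sketch of turning it into a wait duration (`parseRetryAfterMs` is a hypothetical helper, not part of the crawler):

```typescript
// Hypothetical helper: parse a Retry-After header value into milliseconds to wait.
// Per RFC 9110 the value is either delay-seconds or an HTTP-date.
function parseRetryAfterMs(headerValue: string | null): number | null {
  if (!headerValue) {
    return null;
  }
  // Case 1: plain number of seconds, e.g. "60"
  const seconds = Number(headerValue);
  if (!Number.isNaN(seconds)) {
    return Math.max(0, seconds * 1000);
  }
  // Case 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const dateMs = Date.parse(headerValue);
  if (!Number.isNaN(dateMs)) {
    return Math.max(0, dateMs - Date.now());
  }
  return null;
}
```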
The crawler should behave more appropriately when it encounters `HTTP 429 - Too Many Requests` errors. Below is an example log where the website asked the scraper to slow down but the crawler continued at the same pace.
Sample website where this happens after some time (after roughly 1 hour): https://radiopaedia.org
Logs capture
The crawler could be enhanced by (see the sketch after this list):

- detecting `HTTP 429` errors and, in such a situation, waiting some time (configurable) before continuing
- not counting `HTTP 429` errors as page failures (the page is available, the website just asked us to slow down)
- honoring `Retry-After` response headers, which indicate how long the user agent should wait; it could be great to use them
- detecting `HTTP 429` errors and finishing the crawl early if too many of them have been returned in a row (configurable), so as not to continue overwhelming the website
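A rough sketch of how these four points could fit together, assuming Node 18+'s global `fetch` and reusing the `parseRetryAfterMs` helper sketched earlier; `fetchWithBackoff`, `default429WaitMs`, and `maxConsecutive429s` are illustrative names, not the crawler's actual API:

```typescript
interface Crawl429Options {
  default429WaitMs: number;   // configurable wait when no Retry-After header is given
  maxConsecutive429s: number; // configurable early-exit threshold
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Fetch a page, backing off on HTTP 429 instead of counting it as a page failure.
// Returns null if the crawl should be aborted early.
async function fetchWithBackoff(
  url: string,
  opts: Crawl429Options,
): Promise<Response | null> {
  let consecutive429s = 0;
  for (;;) {
    const response = await fetch(url);
    if (response.status !== 429) {
      return response; // success or a genuine error, handled elsewhere
    }
    consecutive429s += 1;
    if (consecutive429s >= opts.maxConsecutive429s) {
      // Too many 429s in a row: stop early rather than keep overwhelming the site.
      return null;
    }
    // Honor Retry-After when present, otherwise fall back to the configured default.
    const waitMs =
      parseRetryAfterMs(response.headers.get('Retry-After')) ??
      opts.default429WaitMs;
    await sleep(waitMs);
  }
}
```

Returning `null` on repeated 429s lets the caller distinguish "abort the crawl early" from "this page failed", which matches the point above about not counting 429s as page failures.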