Closed by kelson42 2 years ago
Similar scenario for Wikihow PT, just dies at the very end https://farm.openzim.org/pipeline/e52110095e40a4ec5c222326/debug
Nothing like “just dying at the very end”. What made you think that?
We're hitting wikihow's traffic protection mechanism: when getting too many requests, the server stops accepting connections so we're getting timeouts.
The code that ran PT (I changed it between ES and PT) already has a large number of retries (10) with a long, increasing sleep time in between (30s * attempt number). This means that for this error to happen, the scraper must have kept retrying for more than 5 minutes.
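For reference, here is a minimal sketch of the retry policy described above. It is not the scraper's actual code: `fetch_with_retries`, `total_backoff`, and their parameters are hypothetical names, and it assumes the sleep happens only between attempts. Under that assumption the cumulative sleep before giving up is 30s * (1 + 2 + ... + 9) = 1350s, about 22 minutes, which fits the "more than 5 minutes" estimate.

```python
import time


def fetch_with_retries(fetch, max_attempts=10, base_delay=30.0, sleep=time.sleep):
    """Call fetch() until it succeeds, sleeping base_delay * attempt between tries.

    `fetch` is any zero-argument callable (hypothetical stand-in for the
    scraper's HTTP request). Timeouts and refused connections are OSError
    subclasses in the standard library, so that is what is caught here.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except OSError:
            if attempt == max_attempts:
                raise  # out of retries: surface the last error
            sleep(base_delay * attempt)  # 30s, 60s, 90s, ...


def total_backoff(max_attempts=10, base_delay=30.0):
    """Total seconds slept if every attempt fails (no sleep after the last one)."""
    return sum(base_delay * attempt for attempt in range(1, max_attempts))
```

Passing `sleep` as a parameter is only a convenience for trying the logic without actually waiting.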
As we don't know how the server is configured, it's hard to guess what we should be doing exactly.
I'd advise we un-request all wikihows, switch one to a 2s interval, and see what happens. What do you think?
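To illustrate what a 2s interval between requests means in practice, here is a rough sketch of a fixed-interval throttle. The `Throttle` class is a hypothetical helper, not something taken from the scraper, and how the scraper actually applies its delay setting is not shown in this thread.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between successive calls to wait()."""

    def __init__(self, interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)  # pause only for the time still owed
                now = self._clock()
        self._last = now
```

Calling `wait()` before each request keeps requests at least `interval` seconds apart, without adding extra delay when a request already took longer than that.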
@rgaudin Agree
WikiHow ES passes with 2/2 delays
https://farm.openzim.org/pipeline/2e20106943b8f39a67434326