Open anjackson opened 1 year ago
Yeah, some unbounded CDX calls. We can add a hard upper limit, e.g.
export UKWA_INDEX="${CDX_SERVER}?url={url}&closest={closest}&sort=closest&filter=!statuscode:429&filter=!mimetype:warc/revisit&limit=100000"
...which works okay. But it'd be good if it was a bit cleverer.
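A minimal sketch of making that cap tunable instead of hard-coded. `CDX_LIMIT` is a hypothetical variable introduced here (not something pywb or the current deployment defines), and the `CDX_SERVER` default is only a placeholder so the snippet runs on its own:

```shell
# Placeholder endpoint for illustration only; the real deployment sets CDX_SERVER itself.
CDX_SERVER="${CDX_SERVER:-http://localhost:8080/example}"

# Hypothetical knob: cap on CDX rows per lookup, defaulting to the value tried above.
CDX_LIMIT="${CDX_LIMIT:-100000}"

# Same index URL template as before, with the limit taken from CDX_LIMIT.
export UKWA_INDEX="${CDX_SERVER}?url={url}&closest={closest}&sort=closest&filter=!statuscode:429&filter=!mimetype:warc/revisit&limit=${CDX_LIMIT}"
```

That would at least let BETA and production run with different caps while the behaviour is being tested.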
Weird, that had some odd consequences. It seems pages like this:
Was redirected to the first (2013) version?! So more testing on BETA needed!
I think this is a case where we could use PyWB/Webrecorder's advice to work out what to do.
Some URLs, e.g.
Hangs and then fails, like this:
Trace leads to...
https://github.com/webrecorder/pywb/blob/83b2113be2c2574ec120ba292006d706e3cc3d53/pywb/apps/rewriterapp.py#L739
...which indicates it's the CDX call/lookup
There are a LOT of instances of URLs like that. Perhaps we need to add a limit?
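To get a feel for how many URLs would actually be affected by a limit, something like the following could help. It is only a sketch: `hits_cap` is a hypothetical helper, and it assumes the CDX server returns plain-text output with one capture per line.

```shell
# Hypothetical helper: reads CDX lines on stdin and succeeds (exit 0)
# if the number of lines reaches the given cap, i.e. the result was
# probably truncated by the limit.
hits_cap() {
  limit="$1"
  # tr strips the padding some wc implementations add around the count
  rows=$(wc -l | tr -d ' ')
  [ "$rows" -ge "$limit" ]
}

# Example usage (needs a reachable CDX server, so left commented out):
# curl -s -G --data-urlencode "url=$URL" --data-urlencode "limit=$LIMIT" "$CDX_SERVER" \
#   | hits_cap "$LIMIT" && echo "$URL reaches the cap of $LIMIT"
```

Running that over a sample of the problem URLs would show whether a given cap is hit rarely or all the time.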