Closed alexwlchan closed 3 years ago
Here's a Python script that finds all the IIIF manifests in a works snapshot:
#!/usr/bin/env python
import gzip, json, httpx

def locations():
    for line in gzip.open("works.json.gz"):
        work = json.loads(line)
        for it in work["items"]:
            yield from it["locations"]

for loc in locations():
    if loc["type"] == "PhysicalLocation":
        continue
    assert loc["type"] == "DigitalLocation"
    if loc["locationType"]["id"] == "iiif-image":
        continue
    assert loc["locationType"]["id"] == "iiif-presentation", loc
    print(loc["url"])
And once you have that list in a text file, you can check it with the following shell script:
cat iiif_manifest_location_urls.txt \
| xargs -I "{}" -P 10 curl -s -o /dev/null -w "{} %{http_code}\n" "{}" > iiif_checked.txt
I'm running this and will post results when it's done.
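Once iiif_checked.txt exists, a quick tally by status code can help spot the broken manifests. This is only a minimal sketch: it assumes every line has the "<url> <status>" shape written by the curl -w format string above, and the function name `summarise` is made up for illustration.

```python
from collections import Counter


def summarise(lines):
    """Tally results by HTTP status code.

    Assumes each non-empty line looks like "<url> <status>", i.e. the
    shape produced by the curl -w format string in the shell script.
    Returns (counts per status, sorted list of (status, url) for non-200s).
    """
    counts = Counter()
    failures = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last space, so the status is always the final field.
        url, status = line.rsplit(" ", 1)
        counts[status] += 1
        if status != "200":
            failures.append((status, url))
    return counts, sorted(failures)
```

To use it, pass an open file handle, e.g. `summarise(open("iiif_checked.txt"))`, and print the failures to get a list of URLs grouped by status code.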
Here's a spreadsheet with all the affected IIIF manifests, grouped by error message:
FYI Everything beyond line 34 is an error thrown by enforcing a business rule: Closed, the work should not be served (this could be a 404 instead). The "Sequence contains more than one matching element" errors are where the METS XML should contain one and only one element but has more than one (we throw an exception rather than just using the first). And the first 6 are self-explanatory: the item can't be found in storage.
FYI Everything beyond line 34 is an error thrown by enforcing a business rule:
Yeah, I think DLCS is doing the correct thing here – the issue is somewhere in the underlying metadata or the catalogue data, because we're presenting these manifests as items that should be visible on /works. And they're not, so either we should change the data to make them visible, or stop telling people this is possible.
I'd like to have some better real-time metrics on this. We report when IIIF images are down. I'll have a look at monitoring other services in real time.
I'll fix the items that do not have a license, and I'll look at any other outstanding bnumbers in lines 1-34 that are not closed but are still giving a 500 error.
A member of staff spotted a broken item on /works yesterday; this was caused by a IIIF manifest that was returning a 500 error in DLCS. Slack discussion here: https://wellcome.slack.com/archives/C8X9YKM5X/p1610365766200100
We should check all the IIIF manifests we're serving in the catalogue API, and look for anything that's similarly broken.
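One way to run such a check from a single Python script is sketched below, using only the standard library. This is a rough sketch under stated assumptions, not the approach actually used in this thread: `get_status` and `check_all` are hypothetical names, and unlike plain curl, `urllib` follows redirects, so a redirecting manifest reports the status of its final destination.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.error
import urllib.request


def get_status(url, timeout=30):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; keep the code.
        return err.code
    except OSError:
        # Covers URLError, DNS failures, refused connections and timeouts.
        return None


def check_all(urls, fetch=get_status, workers=10):
    """Check every URL using a thread pool; returns (url, status) pairs in input order."""
    urls = list(urls)  # allow generators, since we iterate twice
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(zip(urls, pool.map(fetch, urls)))
```

The `fetch` parameter is there so the checker can be swapped out, for example for a stub in tests or for a different HTTP client.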