glenrobson closed this issue 3 months ago
This is caused by the same issue we were discussing on Slack with Sara: our code expects type `image` to be just a single image, and anything with multiple images uses `texts` and the `<identifier>_images.zip` file construction.
It should be an easy enough fix: when processing an `image` type record, we can look at how many image files with `source: original` there are in the metadata.
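As a rough illustration of that check, here is a minimal sketch, not the project's actual code. The helper names, the extension list, and the sample entries are assumptions made for illustration; only the `name` and `source` keys are taken from the metadata excerpt quoted in this thread.

```python
# Hypothetical helpers: count the original image files listed in an item's metadata.
# The extension list is an assumption, not something the iiif.archive.org code defines.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".tif", ".tiff", ".gif", ".jp2")


def original_images(files):
    """Return metadata entries that are original uploads and look like images."""
    return [
        entry
        for entry in files
        if entry.get("source") == "original"
        and entry.get("name", "").lower().endswith(IMAGE_EXTENSIONS)
    ]


def is_multi_image(files):
    """True when an item holds more than one original image file."""
    return len(original_images(files)) > 1


# Illustrative sample only: the file names appear in this thread, but the
# "source" value of the zip entry is assumed.
files = [
    {"name": "ASB-Consorzio.pdf", "source": "original"},
    {"name": "Auronzo-ComuneCortina.jpeg", "source": "original"},
    {"name": "Cadore-Becher_1998.jpg", "source": "original"},
    {"name": "ASB-Consorzio_jp2.zip", "source": "derivative"},
]

names = [entry["name"] for entry in original_images(files)]
```

Filtering on the file extension rather than taking the first original entry also skips the PDFs in this item.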
Although, looking at the item in question, I think the mediatype might be wrong: in addition to the static images in the top-level directory, there are also zip files with JP2s that look as if they have been processed by the BookReader code. Because the mediatype is not `texts`, you can't access them in the IA interface (and we would never expose them via IIIF).
The images for these zipped items are available via Cantaloupe: https://iiif.archive.org/image/iiif/3/st-anthony-relics-01%2fASB-Consorzio_jp2.zip%2fASB-Consorzio_jp2%2FASB-Consorzio_0000.jp2/full/max/0/default.jpg
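For reference, a URL like the one above can be assembled by percent-encoding the whole path to the file inside the item, including the slashes, as a single IIIF identifier. This is a small sketch under that assumption; `cantaloupe_url` and `IMAGE_API_BASE` are hypothetical names, not part of the service's code.

```python
from urllib.parse import quote

# Base taken from the URL quoted in this thread.
IMAGE_API_BASE = "https://iiif.archive.org/image/iiif/3"


def cantaloupe_url(identifier, *path_parts, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an Image API URL for a file (possibly inside a zip) in an item."""
    # safe="" makes quote() encode "/" as %2F so the path stays one identifier.
    image_id = quote("/".join((identifier,) + path_parts), safe="")
    return f"{IMAGE_API_BASE}/{image_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


url = cantaloupe_url(
    "st-anthony-relics-01",
    "ASB-Consorzio_jp2.zip",
    "ASB-Consorzio_jp2",
    "ASB-Consorzio_0000.jp2",
)
```

`quote` emits `%2F` in upper case, whereas the URL above mixes `%2f` and `%2F`; percent-encoded hex digits are case-insensitive, so both spellings name the same resource.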
The V2 manifest issue may be related. The code downloads the item in order to open it with PIL to get dimension data etc., but it is quite naive and ends up downloading the first `original` file from the item, which here is one of the PDFs, so PIL errors out:
[2023-12-14 11:01:28,724] ERROR in app: Exception on /iiif/2/st-anthony-relics-01/manifest.json [GET]
Traceback (most recent call last):
  File "/home/mike/projects/iiif.archive.org/venv/lib/python3.10/site-packages/flask/app.py", line 2528, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/mike/projects/iiif.archive.org/venv/lib/python3.10/site-packages/flask/app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/mike/projects/iiif.archive.org/venv/lib/python3.10/site-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/home/mike/projects/iiif.archive.org/venv/lib/python3.10/site-packages/flask/app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/mike/projects/iiif.archive.org/venv/lib/python3.10/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/home/mike/projects/iiif.archive.org/iiify/app.py", line 208, in manifest2
    return ldjsonify(create_manifest(identifier, domain=domain, page=page))
  File "/home/mike/projects/iiif.archive.org/iiify/resolver.py", line 208, in create_manifest
    info = web.info(domain, path)
  File "/home/mike/projects/iiif.archive.org/venv/lib/python3.10/site-packages/iiif2/web.py", line 32, in info
    w, h = Image.open(path).size
  File "/home/mike/projects/iiif.archive.org/venv/lib/python3.10/site-packages/PIL/Image.py", line 3283, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file 'media/st-anthony-relics-01'
This is the file that it downloads:
(venv) mike@revelator:~/projects/iiif.archive.org$ ls -lta media/st-anthony-relics-01
-rw-rw-r-- 1 mike mike 8653052 Dec 14 11:00 media/st-anthony-relics-01
(venv) mike@revelator:~/projects/iiif.archive.org$ file media/st-anthony-relics-01
media/st-anthony-relics-01: PDF document, version 1.3
And the matching type and bytesize from the metadata:
{
  "created": 1702519976,
  "d1": "ia601202.us.archive.org",
  "d2": "ia801202.us.archive.org",
  "dir": "/22/items/st-anthony-relics-01",
  "files": [
    {
      "name": "ASB-Consorzio.pdf",
      "source": "original",
      "mtime": "1702313965",
      "size": "8653052",
      "md5": "4f9f26a566c797410ebd05b58596e8de",
      "crc32": "ad52676c",
      "sha1": "6e482b42e9d670088fa4816f235345f0b195c6e8",
      "format": "Image Container PDF",
      "viruscheck": "1702314437"
    },
    <snip>
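The `file` output above identifies the download from its leading bytes. The same kind of check could be done in Python before handing a file to PIL. The sketch below is only an illustration: `sniff_format` and `sniff_file` are hypothetical helpers, and the signature table covers just a few formats relevant to this item.

```python
# Hypothetical guard: identify a downloaded file by its magic bytes.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "jp2"),  # JPEG 2000 signature box
]


def sniff_format(header: bytes) -> str:
    """Return a short format name for the given leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path) -> str:
    """Read only the first few bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))
```

Under these assumptions, calling `sniff_file("media/st-anthony-relics-01")` on the file shown above would be expected to report `pdf`, so the caller could skip it or return a clearer error instead of letting PIL raise.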
So I would say that we should deal with multiple images in an `image` mediatype object (regardless of whether this object should actually be `texts`), but the V2 issue is a wontfix.
Duplicate of #52
The images for these zipped items are available via Cantaloupe: https://iiif.archive.org/image/iiif/3/st-anthony-relics-01%2fASB-Consorzio_jp2.zip%2fASB-Consorzio_jp2%2FASB-Consorzio_0000.jp2/full/max/0/default.jpg
Strangely, these are different images from the ones that are shown on the Internet Archive page: https://archive.org/details/st-anthony-relics-01/
The fix for multiple files seems to have worked for this one, but Cantaloupe doesn't like some of the image files:
Failed to get https://iiif.archive.org/image/iiif/3/st-anthony-relics-01%2fStAnthony-Relics_01.jpeg
Failed to get https://iiif.archive.org/image/iiif/3/st-anthony-relics-01%2fStAnthony-Relics_02.jpeg
Failed to get https://iiif.archive.org/image/iiif/3/st-anthony-relics-01%2fStAnthony-Relics_03.jpeg
Each of these returns "Unsupported source format". The following two work OK:
https://iiif.archive.org/image/iiif/3/st-anthony-relics-01%2FAuronzo-ComuneCortina.jpeg
https://iiif.archive.org/image/iiif/3/st-anthony-relics-01%2FCadore-Becher_1998.jpg
Re-target as a Cantaloupe issue.
This is a very complicated example of:
https://github.com/internetarchive/iiif/issues/12
It contains a number of PDF documents and jpg images. Currently only the PDFs are shown in the Internet Archive viewer.
This came up in the IIIF training:
https://archive.org/details/st-anthony-relics-01/
It contains 5 images, but the v3 manifest contains only 1 image:
https://iiif.archive.org/iiif/3/st-anthony-relics-01/manifest.json
and the v2 manifest doesn't work:
https://iiif.archive.org/iiif/2/st-anthony-relics-01/manifest.json