Open christinklez opened 4 months ago
@christinklez You can investigate these by downloading the mapped metadata, searching for the item, and checking the is_shown_by. For example, for the first error, I can see in the log that the mapped metadata file is 154/vernacular_metadata_2024-05-03T23:59:02/mapped_metadata_2024-05-04T00:01:07/data/80.jsonl.
I download this file from S3 and search for oai:library.ucla.edu:ark:/21198/zz002hpngp. The is_shown_by for this record is https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz00280jpg . When I go to this URL in a browser, I get an info.json page. This is not actually an image, which is what is causing the content harvester to throw an error.
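The lookup described above can be sketched in Python. This is only an illustration, not Rikolti's own tooling: it assumes each line of the mapped metadata page is one JSON object, that the record identifier is stored under a `calisphere-id` key (the name that appears in the harvester log), and that `is_shown_by` is a plain string. The `find_record` helper and the inline sample page are hypothetical.

```python
import json
from typing import Iterable, Optional


def find_record(lines: Iterable[str], record_id: str,
                id_key: str = "calisphere-id") -> Optional[dict]:
    """Return the first JSON record whose id_key equals record_id, else None.

    Assumes one JSON object per line (JSON Lines); blank lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get(id_key) == record_id:
            return record
    return None


# Hypothetical stand-in for a downloaded page such as data/80.jsonl.
sample_page = [
    json.dumps({
        "calisphere-id": "oai:library.ucla.edu:ark:/21198/zz00280jpg",
        "is_shown_by": "https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz00280jpg",
    }),
    json.dumps({
        "calisphere-id": "oai:library.ucla.edu:ark:/21198/zz00288pmz",
        "is_shown_by": "https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz00288pmz/full/!200,200/0/default.jpg",
    }),
]

record = find_record(sample_page, "oai:library.ucla.edu:ark:/21198/zz00280jpg")
if record is not None:
    print(record["is_shown_by"])
```

With a real page, the same function could be fed an open file handle, for example `find_record(open("80.jsonl"), record_id)`, since iterating a text file yields one line at a time.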
I'm not sure if this is a provider issue or a rikolti mapper error? I'm guessing it's a provider issue since we were able to get thumbnails for most of the items in this collection.
For collection 154, most of the objects have is_shown_by URLs with this kind of format:
https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz00288pmz/full/!200,200/0/default.jpg
The 2 objects that are failing have these is_shown_by URLs:
oai:library.ucla.edu:ark:/21198/zz00280jpg - https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz00280jpg
oai:library.ucla.edu:ark:/21198/zz00288png - https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz00288png
If I add /full/!200,200/0/default.jpg to the end, then I get a thumbnail image:
oai:library.ucla.edu:ark:/21198/zz00280jpg - https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz00280jpg/full/!200,200/0/default.jpg
oai:library.ucla.edu:ark:/21198/zz00288png - https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz00288png/full/!200,200/0/default.jpg
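The manual fix of appending the tail could be expressed as a small helper. This is a sketch of the workaround only, not something the oai.samvera mapper is known to do; the `with_thumbnail_tail` function name is hypothetical, and the tail string is the one quoted in this thread.

```python
# Tail observed on the working is_shown_by URLs in collection 154.
THUMBNAIL_TAIL = "/full/!200,200/0/default.jpg"


def with_thumbnail_tail(url: str) -> str:
    """Append the thumbnail tail unless the URL already ends with it."""
    if url.endswith(THUMBNAIL_TAIL):
        return url
    # Drop any trailing slash so the result has no doubled "/".
    return url.rstrip("/") + THUMBNAIL_TAIL


bare = "https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz00280jpg"
print(with_thumbnail_tail(bare))
```

The helper is idempotent, so running it over URLs that already carry the tail leaves them unchanged.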
Look at the vernacular metadata to review the isShownBy thumbnail URLs.
Looked at the OAI for each of these records and confirmed that they do not have the full/!200,200/0/default.jpg at the tail of the IIIF URLs. We can plan to send a heads up to UCLA about these records.
In the meantime, we can consider ETL'ing these collections instead.
Thanks to #899, we're able to get these collections through to -stage! But these records (as expected) have broken thumbnails.
(map index 4, 5, 14, 35)
https://calisphere-stage.cdlib.org/item/ark:/21198/zz002dcpng
https://calisphere-stage.cdlib.org/item/ark:/21198/zz0002pngj/
https://calisphere-stage.cdlib.org/item/ark:/21198/zz002cpng1/
(map index 16, 37)
https://calisphere-stage.cdlib.org/item/ark:/21198/zz00280jpg/
https://calisphere-stage.cdlib.org/item/ark:/21198/zz00288png/
(map index 5) https://calisphere-stage.cdlib.org/item/ark:/21198/zz002hpngp/
(map index 23) https://calisphere-stage.cdlib.org/item/ark:/21198/zz0025pngj/
(map index 29) https://calisphere-stage.cdlib.org/item/ark:/21198/zz0002fpng/
Mapper: oai.samvera
Problem: The OAI records for certain items do not include the full/!200,200/0/default.jpg tail in the URL construction. Adding it manually points to a working image file.
Observation: This seems to occur for select records whose ARKs contain the text characters png or jpg.
Proposed next step: We need to ask UCLA to fix these URLs.
Note: This is not a harvest-stopping error, and these collections are currently on -stage! The records in question have a grey tile for a thumbnail.
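The observation about ARKs can be checked against the identifiers listed in this issue. The snippet below is a rough sketch: the `ark_contains_image_text` helper is hypothetical, and the check only tests for the substrings png and jpg, without claiming anything about why UCLA's OAI feed produces the shorter URLs.

```python
def ark_contains_image_text(ark: str) -> bool:
    """True if the ARK contains the text 'png' or 'jpg' (case-insensitive)."""
    lowered = ark.lower()
    return "png" in lowered or "jpg" in lowered


# ARKs from this issue whose is_shown_by URL lacks the thumbnail tail.
missing_tail = [
    "ark:/21198/zz002dcpng",
    "ark:/21198/zz0002pngj",
    "ark:/21198/zz002ctjpg",
    "ark:/21198/zz002cpng1",
    "ark:/21198/zz00280jpg",
    "ark:/21198/zz00288png",
    "ark:/21198/zz002hpngp",
    "ark:/21198/zz0025pngj",
    "ark:/21198/zz0002fpng",
]

# ARKs from this issue whose is_shown_by URL already has the tail.
has_tail = [
    "ark:/21198/zz00288pmz",
    "ark:/21198/zz0002nzwn",  # fails with a 401 instead, a separate problem
]

for ark in missing_tail + has_tail:
    print(ark, ark_contains_image_text(ark))
```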
To Do:
Registry ID: 153 - errors in 4 mapped index
oai:library.ucla.edu:ark:/21198/zz002dcpng
Exception: PIL.UnidentifiedImageError for calisphere-id oai:library.ucla.edu:ark:/21198/zz002dcpng: cannot identify image file '/tmp/ark%3A%2F21198%2Fzz002dcpng'
https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz002dcpng
oai:library.ucla.edu:ark:/21198/zz0002pngj
Exception: PIL.UnidentifiedImageError for calisphere-id oai:library.ucla.edu:ark:/21198/zz0002pngj: cannot identify image file '/tmp/ark%3A%2F21198%2Fzz0002pngj'
https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz0002pngj
oai:library.ucla.edu:ark:/21198/zz0002nzwn
Error downloading https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz0002nzwn/full/!200,200/0/default.jpg: 401 Client Error: Unauthorized for url: https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz0002nzwn/full/!200,200/0/default.jpg
ERROR: no thumbnail found for ['image']record oai:library.ucla.edu:ark:/21198/zz0002nzwn in page 153/vernacular_metadata_2024-05-06T22:47:43/mapped_metadata_2024-05-06T22:52:54/data/98.jsonl
https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz0002nzwn/full/!200,200/0/default.jpg
oai:library.ucla.edu:ark:/21198/zz002ctjpg
Exception: PIL.UnidentifiedImageError for calisphere-id oai:library.ucla.edu:ark:/21198/zz002ctjpg: cannot identify image file '/tmp/ark%3A%2F21198%2Fzz002ctjpg'
https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz002ctjpg
oai:library.ucla.edu:ark:/21198/zz002cpng1
Exception: PIL.UnidentifiedImageError for calisphere-id oai:library.ucla.edu:ark:/21198/zz002cpng1: cannot identify image file '/tmp/ark%3A%2F21198%2Fzz002cpng1'
https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz002cpng1
Registry ID: 154 - errors in 2 mapped index
oai:library.ucla.edu:ark:/21198/zz00280jpg
PIL.UnidentifiedImageError for calisphere-id oai:library.ucla.edu:ark:/21198/zz00280jpg: cannot identify image file '/tmp/ark%3A%2F21198%2Fzz00280jpg'
https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz00280jpg
oai:library.ucla.edu:ark:/21198/zz00288png
PIL.UnidentifiedImageError for calisphere-id oai:library.ucla.edu:ark:/21198/zz00288png: cannot identify image file '/tmp/ark%3A%2F21198%2Fzz00288png'
https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz00288png
Registry ID: 28108 - errors in 1 mapped index
oai:library.ucla.edu:ark:/21198/zz002hpngp
PIL.UnidentifiedImageError for calisphere-id oai:library.ucla.edu:ark:/21198/zz002hpngp: cannot identify image file '/tmp/ark%3A%2F21198%2Fzz002hpngp'
https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz002hpngp
Registry ID: 28111 - errors in 1 mapped index
Run ID: manual__2024-05-04T00:00:02+00:00
oai:library.ucla.edu:ark:/21198/zz0025pngj
PIL.UnidentifiedImageError for calisphere-id oai:library.ucla.edu:ark:/21198/zz0025pngj: cannot identify image file '/tmp/ark%3A%2F21198%2Fzz0025pngj'
https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz0025pngj
Registry ID: 28230 - error in 1 mapped index
oai:library.ucla.edu:ark:/21198/zz0002fpng
Exception: PIL.UnidentifiedImageError for calisphere-id oai:library.ucla.edu:ark:/21198/zz0002fpng: cannot identify image file '/tmp/ark%3A%2F21198%2Fzz0002fpng'
https://iiif.library.ucla.edu/iiif/2/ark%3A%2F21198%2Fzz0002fpng