ucldc / rikolti

calisphere harvester 2.0

Investigate issue with UCR Library encountering EZID errors #1158

Open aturner opened 4 days ago

aturner commented 4 days ago

FD: https://help.oac.cdlib.org/a/tickets/152504

==

Hello, I'm investigating the results of our latest link checker report from EZID, which contained 225 objects that should have been available in Calisphere. I've attached a spreadsheet of the report.

Most (possibly all) of the items are from the Sherman Indian Museum collection (https://calisphere.org/collections/27124/), and the report includes three different types of errors. I checked the three objects with the "TimeoutError: The read operation timed out" error and spot-checked a handful of those with the "500 Internal Server Error" error. In each case, the object displays in the collection's Calisphere search results when I search by its ARK ID, but clicking on the object leads to a Calisphere page stating "An unexpected error has occurred. Error 500: Internal Server Error." Please see the attached screenshots [1] [2], where I searched for ark:/86086/n2028t2h and then clicked on the object; https://calisphere.org/item/ark:/86086/n2028t2h/ returns an error page.
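For anyone wanting to repeat this spot check across the whole report, here is a minimal Python sketch. It assumes item pages follow the `https://calisphere.org/item/<ark>/` pattern seen in the URL above; the function names (`item_url`, `check_item`, `summarize`) are hypothetical and not part of rikolti or EZID, and the list of ARKs would have to be pulled from the attached spreadsheet by hand.

```python
from collections import Counter
from urllib.error import HTTPError
from urllib.request import urlopen

# Assumed URL pattern, taken from the item link quoted in the ticket.
ITEM_URL = "https://calisphere.org/item/{ark}/"


def item_url(ark):
    """Build the Calisphere item page URL for an ARK such as 'ark:/86086/n2028t2h'."""
    return ITEM_URL.format(ark=ark.strip().strip("/"))


def check_item(ark, timeout=30.0, opener=urlopen):
    """Fetch the item page and return the HTTP status code, or the error class name, as a string."""
    try:
        with opener(item_url(ark), timeout=timeout) as response:
            return str(response.status)
    except HTTPError as err:
        # 4xx/5xx responses (for example 404 or 500) arrive here.
        return str(err.code)
    except OSError as err:
        # Timeouts and connection failures are OSError subclasses.
        return type(err).__name__


def summarize(arks, check=check_item):
    """Count how many ARKs fall into each outcome."""
    return Counter(check(ark) for ark in arks)
```

Calling `summarize(["ark:/86086/n2028t2h"])` would issue one request and return a count per outcome, which makes it easier to see whether the 500, 404, and timeout groups in the report still reproduce.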

Is this something that CDL can fix, or do we need to address the issue on our end (in Nuxeo or elsewhere)?

Additionally, 7 objects have the "404 Not Found" error. They are all part of Box 9 in the collection, stored in this Nuxeo subfolder: /asset-library/UCR/SpecialProjects/SIM/box_009

Did someone request that these objects be removed from Calisphere? I don't see anything in Nuxeo that suggests why they may not have been (re)harvested and published.

Thank you very much for your help!

aturner commented 3 days ago

Moving this card to "review in progress" for the fix to the 500 errors on some items (https://calisphere.org/item/ark:/86086/n2n017p3/ has thumbnail.path=None, which had downstream effects on related objects in the collection).
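As an illustration only, the kind of guard that avoids this class of failure might look like the sketch below. The record shape (a dict with a nested `thumbnail` entry carrying a `path`) and the function names are assumptions made for the example; they are not taken from the rikolti or Calisphere code, and the actual fix under review may differ.

```python
from collections.abc import Mapping


def thumbnail_path(record):
    """Return the record's thumbnail path, or None if it is missing or empty.

    Assumes a hypothetical record shape: {"id": ..., "thumbnail": {"path": ...}}.
    """
    thumbnail = record.get("thumbnail")
    if not isinstance(thumbnail, Mapping):
        return None
    path = thumbnail.get("path")
    # Treat None, empty strings, and non-string values as "no thumbnail".
    return path if isinstance(path, str) and path else None


def records_missing_thumbnail(records):
    """List the ids of records that have no usable thumbnail path."""
    return [r.get("id", "<unknown>") for r in records if thumbnail_path(r) is None]
```

Running `records_missing_thumbnail` over a collection's records before publishing would surface items like the one above, so a single missing path can be handled on its own instead of breaking the pages of related objects.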

The objects in /asset-library/UCR/SpecialProjects/SIM/box_009 were created in 2019 and, for reasons still unknown, were never successfully harvested. I reinstated a card for re-harvesting the collection, pending the approach of querying and retrieving records directly at the Nuxeo DB level (https://github.com/ucldc/archives_ops/issues/72).