pulibrary / figgy

Valkyrie-based digital repository backend.

Migrating Source Metadata Identifiers from Finding Aid IDs to MMS IDs doesn't update the ARK #6262

Closed: tpendragon closed this issue 6 months ago

tpendragon commented 8 months ago

Message from Hilary Murusmith:

I belatedly realized this week that we have an issue with the ARK links for the Derrida digital content whose Source Metadata IDs have been updated to MMS IDs, and I wanted to see if we could either add it to the existing GitHub issue (https://github.com/pulibrary/figgy/issues/6232) or create a new ticket for it.

For the Derrida digital content whose Source Metadata IDs have been updated, the ARK links unfortunately still seem to redirect to the component IDs in the finding aid rather than to the MMS IDs in the catalog. When I enter http://arks.princeton.edu/[ARK] into the address bar, it redirects me to the finding aid, https://findingaids.princeton.edu/catalog/[cid]. And when I use the “View digital content” link in Blacklight (which comes from the 856 field in the MARC record, where we have a subfield containing http://arks.princeton.edu/[ARK] for the digital content), it redirects me to https://catalog.princeton.edu/catalog/[cid]#view, which results in a 404 error. Here are two examples:

Une migration, suivi de Le partenaire / Roger Laporte ; frontispice de Zao Wou-ki ; lettre-préf. de René Char.
Blacklight link: https://catalog.princeton.edu/catalog/99125488773406421
Figgy link: https://figgy.princeton.edu/catalog/03563918-6356-4218-ace3-d7df919c5cf5
ARK: ark:/88435/3j333601q
http://arks.princeton.edu/ark:/88435/3j333601q redirects to: https://findingaids.princeton.edu/catalog/RBD1-1_c4768 (where the digital content is no longer visible because the Source Metadata ID has been updated to the MMS ID)
“View digital content” link in Blacklight redirects to: https://catalog.princeton.edu/catalog/RBD1.1_c4768#view (which results in a 404 error)

Systématique ouverte / Kostas Axelos.
Blacklight link: https://catalog.princeton.edu/catalog/992438213506421
Figgy link: https://figgy.princeton.edu/catalog/591759e7-37b6-4a36-a760-5127f0199293
ARK: ark:/88435/sq87bz37v
http://arks.princeton.edu/ark:/88435/sq87bz37v redirects to: https://findingaids.princeton.edu/catalog/RBD1-1_c292
“View digital content” link in Blacklight redirects to: https://catalog.princeton.edu/catalog/RBD1.1_c292#view

Would there be a way to batch update the URL redirects for the ARKs? I am attaching my latest CSV file again, which includes all the ARKs to date that would need to be updated. Let me know if you need any additional details for this process.

My apologies that I did not catch this issue with the ARKs sooner, having been focused on the display of the digital content viewer in Blacklight and so pleased that we could update this routinely with the progression of the cataloging.

Thank you again for how responsive you have been to the Derrida cataloging project!

I did some early searching. The cause is that we explicitly don't update an ARK if it currently points at a finding aid:

https://github.com/pulibrary/figgy/blob/b64b039b6690d5505a4c1b37417871456b5a528b/app/services/identifier_service.rb#L41

That seemed odd, but we added it in an old version of Figgy to make sure we didn't accidentally get rid of a findingaids pointer:

https://github.com/pulibrary/figgy/issues/1727

I'm not sure if this is still a problem. If it isn't, then we should just get rid of that restriction. If it is, then we should add a way to bypass that requirement.
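To make the guard and the proposed bypass concrete, here is a small Ruby sketch. It is not Figgy's actual IdentifierService code: the method names points_at_finding_aids? and update_ark_target?, the force: keyword, and the assumption that the guard compares the current target's host against findingaids.princeton.edu are all hypothetical.

```ruby
require "uri"

# Assumption: host taken from the finding aids URLs quoted in this issue.
FINDING_AIDS_HOST = "findingaids.princeton.edu"

# True when the ARK's current target is on the finding aids site.
def points_at_finding_aids?(target_url)
  return false if target_url.nil? || target_url.empty?
  URI.parse(target_url).host == FINDING_AIDS_HOST
rescue URI::InvalidURIError
  false
end

# Hypothetical guard: keep finding aids targets unless the caller
# explicitly bypasses the restriction with force: true.
def update_ark_target?(current_target, force: false)
  return true if force
  !points_at_finding_aids?(current_target)
end
```

With the default force: false, an ARK pointing at https://findingaids.princeton.edu/catalog/RBD1-1_c4768 is left alone, which matches the behavior reported above; passing force: true is one way the bypass could work if the restriction is still needed.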

Steps

Success Criteria

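All of the resources in #6232 have an ARK that points to the catalog. For example, https://arks.princeton.edu/ark:/88435/3j333601q should resolve to the catalog record (https://catalog.princeton.edu/catalog/99125488773406421) rather than to the finding aid.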

Sudden Priority Justification

If the ARKs for these point to the wrong place, citations for this content will go to the wrong place. The sudden priority process is the way to get these kinds of issues looked at.

tpendragon commented 7 months ago

Need to chat with the finding aids POs & the Figgy PO about this one.

The question is: if an ARK is associated with a Figgy record, is it okay to always change its destination when the record is marked complete?

tpendragon commented 7 months ago

The POs have requested a report of all the resources that have MMS ID source metadata identifiers and an ARK that points at findingaids, so they can tell which ARKs would end up getting changed.

We can tell whether an ARK points to findingaids by resolving it. There's no need to follow the redirect: just check that the response is a redirect and look at its Location header.
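A minimal Ruby sketch of that check, plus the filter for the requested report, might look like this. It relies on the fact that Net::HTTP does not follow redirects by itself. The resolver URL comes from this issue; the MMS ID pattern (starts with 99, ends with 6421) is a heuristic inferred from the two MMS IDs quoted above, and all method names are hypothetical.

```ruby
require "net/http"
require "uri"

FINDING_AIDS_HOST = "findingaids.princeton.edu"

# Resolve an ARK without following the redirect. Net::HTTP never follows
# redirects on its own, so a 3xx response exposes the target in its
# Location header. Returns nil if the resolver does not redirect.
def ark_location(ark, resolver: "https://arks.princeton.edu")
  uri = URI.parse("#{resolver}/#{ark}")
  response = Net::HTTP.get_response(uri)
  response.is_a?(Net::HTTPRedirection) ? response["location"] : nil
end

# Heuristic (assumption): Princeton MMS IDs start with "99" and end with
# "6421"; finding aid component IDs look like "RBD1.1_c4768".
def mms_id?(source_metadata_identifier)
  source_metadata_identifier.to_s.match?(/\A99\d+6421\z/)
end

def finding_aids_url?(url)
  return false if url.nil?
  URI.parse(url).host == FINDING_AIDS_HOST
rescue URI::InvalidURIError
  false
end

# A resource belongs in the report when its source metadata ID is already
# an MMS ID but its ARK still redirects to the finding aids site.
def include_in_report?(source_metadata_identifier, location)
  mms_id?(source_metadata_identifier) && finding_aids_url?(location)
end
```

Usage would be along the lines of include_in_report?("99125488773406421", ark_location("ark:/88435/3j333601q")), run once per row of the spreadsheet. Keeping the network call separate from the pure checks makes the filter easy to test offline.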

tpendragon commented 6 months ago

The report is here: https://github.com/pulibrary/figgy/files/15180044/ark_mismatch_report.csv (thanks @hackartisan !)

Messaged @faithc and @ccleeton to ask whether it's fine for all of those ARKs to point at their catalog counterparts.

hackartisan commented 6 months ago

The reports appear to be cumulative, so I believe we only need to run the metadata refresh on the resources in the latest report. I thought all the resources were likely in the same collection, which would have meant we could just use the bulk update UI for the refresh, but a little analysis showed they are not all in the same collection.

hackartisan commented 6 months ago

I fetched all the resource IDs using the MMS IDs from the spreadsheet. Some of them weren't found in Figgy, leaving 451 resources. I tried updating one with change_set.validate(refresh_remote_metadata: "1") and saving it, but that didn't update the ARK. I tried the same one using the checkbox in the UI, and that did update the ARK. I fed them all into CatalogUpdateJob, but the ARKs didn't get updated. Finally, I pulled out just the ones with state == ['complete'] and ran those through IdentifierService.mint_or_update. I spot-checked a few, and this worked.

I think what tripped up the process is that the change set persister only updates identifiers when the state, title, or source metadata identifier has changed. I don't know why those values register as changed when you submit the form, but they definitely don't when you re-set them to their original values in the terminal.
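The behavior described above can be illustrated with a toy change set that remembers the values it was loaded with. This is a hypothetical sketch, not Figgy's ChangeSetPersister: the class name SketchChangeSet and the WATCHED_ATTRIBUTES list are assumptions based on the description in this comment. Re-assigning an attribute its current value leaves it unchanged, so no identifier update is triggered.

```ruby
# Hypothetical sketch, not Figgy's ChangeSetPersister: identifiers are
# refreshed only when a watched attribute's value actually differs from
# the value the resource was loaded with.
WATCHED_ATTRIBUTES = %i[state title source_metadata_identifier].freeze

class SketchChangeSet
  attr_reader :attributes

  def initialize(original)
    @original = original.dup.freeze
    @attributes = original.dup
  end

  # Assign new values, roughly like validating a change set with params.
  def assign(**new_values)
    @attributes.merge!(new_values)
    self
  end

  def changed?(key)
    @attributes[key] != @original[key]
  end

  # Mirrors the condition described above: state, title, or source
  # metadata identifier must have changed.
  def update_identifiers?
    WATCHED_ATTRIBUTES.any? { |key| changed?(key) }
  end
end

complete_resource = { state: ["complete"], title: ["Example title"],
                      source_metadata_identifier: ["992438213506421"] }
pending_resource = complete_resource.merge(state: ["pending"])

# Re-setting the same value in a console: nothing counts as changed.
unchanged = SketchChangeSet.new(complete_resource)
              .assign(source_metadata_identifier: ["992438213506421"])
puts unchanged.update_identifiers? # prints "false"

# Marking a pending resource complete does trigger the update.
completed = SketchChangeSet.new(pending_resource).assign(state: ["complete"])
puts completed.update_identifiers? # prints "true"
```

Under this model, calling IdentifierService.mint_or_update directly sidesteps the change check entirely, which is consistent with that being the step that finally updated the ARKs.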

21 resources had a state other than "complete", and as expected none of them had an ARK in their identifier field yet. Once they are marked complete, their ARKs should be updated.

Closing as fixed.