pulibrary / figgy

Valkyrie-based digital repository backend.

Generate a report of all resources that have an MMS-ID and an ARK that points to finding aids #6343

Closed tpendragon closed 5 months ago

tpendragon commented 5 months ago

Implementation Tips

You don't have to follow the redirect for the ARK; just check what the Location header says in the response you get when you request the ARK.
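A minimal sketch of that tip, using only Ruby's standard library. The helper names `redirect_target` and `ark_target` are hypothetical, and the `https://n2t.net/` resolver base is an assumption taken from the hostname in the error message later in this thread. `Net::HTTP.get_response` does not follow redirects on its own, so the Location header can be read straight off the first response.

```ruby
require "net/http"
require "uri"

# Returns the Location header when the response is a redirect (3xx),
# or nil for any other kind of response.
def redirect_target(response)
  return nil unless response.is_a?(Net::HTTPRedirection)

  response["location"]
end

# Hypothetical helper: requests the ARK from the resolver and reports
# where it points, without following the redirect.
# The resolver base URL is an assumption, not confirmed by the ticket.
def ark_target(ark, resolver: "https://n2t.net/")
  response = Net::HTTP.get_response(URI("#{resolver}#{ark}"))
  redirect_target(response)
end
```

In a spec, a hand-built `Net::HTTPFound` with a `location` header can stand in for the resolver's response, so `redirect_target` can be exercised without network access.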

There's a Report Generator that can create CSVs from a query; it might be useful here.

Part of #6262

Success Criteria

A CSV is attached to this ticket. Each row has the ID of the resource, the title, the MMS-ID, the ARK, and the finding aids URL the ARK points at.
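One possible shape for that CSV, sketched with Ruby's standard `csv` library rather than Figgy's own Report Generator. The header names and the `report_csv` helper are hypothetical; only the five columns themselves come from the criteria above.

```ruby
require "csv"

# Column names are an assumption; the ticket only lists what each row holds.
REPORT_HEADERS = %w[id title mms_id ark finding_aids_url].freeze

# Builds the report as a CSV string from an array of hashes keyed by
# the symbolized header names. Raises KeyError if a row is missing a column.
def report_csv(rows)
  CSV.generate do |csv|
    csv << REPORT_HEADERS
    rows.each do |row|
      csv << REPORT_HEADERS.map { |header| row.fetch(header.to_sym) }
    end
  end
end
```

Using `fetch` instead of `[]` means a row with a missing column fails loudly instead of silently writing an empty cell.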

First Step

Write a query that gets all resources that have an MMS-ID.

Figure out how to stub a request with WebMock so it replicates a redirect.

hackartisan commented 5 months ago

Details of Alma IDs: https://pul-confluence.atlassian.net/wiki/spaces/ALMA/pages/1770185/Alma+System+Numbers

Also, here's a way to get the target via the ezid client:

> ezid = Ezid::Identifier.find("ark:/88435/f1881r915")
I, [2024-04-18T09:19:48.047067 #36039]  INFO -- : EZID GetIdentifierMetadata -- success: ark:/88435/f1881r915
=> #<Ezid::Identifier id=ark:/88435/f1881r915>
> ezid.target
=> "http://findingaids.princeton.edu/collections/MC019/c01058"

hackartisan commented 5 months ago

possible regexes to use for a query:

The risk of false positives / negatives either way is probably negligibly small, especially for this use case.

hackartisan commented 5 months ago

I ran the report on prod. It was pretty slow and eventually crashed with:

rake aborted! Faraday::ConnectionFailed: Failed to open TCP connection to n2t.net:443 (getaddrinfo: Temporary failure in name resolution)

but it did get 520 objects in the report (attached) before it crashed. @tpendragon do we need to be sure we got every object? If so I will add some more error handling and a progress bar.
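If the extra error handling were needed, it could look something like the sketch below: a generic retry wrapper with exponential backoff, so a transient DNS failure does not abort the whole run. The `with_retries` helper and its parameters are hypothetical. In the report task the error class to pass would presumably be `Faraday::ConnectionFailed`, as in the trace above; the sketch takes the error classes as an argument so it does not depend on Faraday.

```ruby
# Runs the block, retrying when one of the given error classes is raised.
# Waits base_delay, then 2 * base_delay, and so on between attempts, and
# re-raises the error once `attempts` tries have been used up.
# `sleeper` is injectable so specs can avoid real sleeping.
def with_retries(errors:, attempts: 3, base_delay: 1.0, sleeper: Kernel.method(:sleep))
  tries = 0
  begin
    tries += 1
    yield
  rescue *errors
    raise if tries >= attempts

    sleeper.call(base_delay * (2**(tries - 1)))
    retry
  end
end
```

Wrapping each per-resource lookup in `with_retries` instead of the whole loop keeps the rows already collected and only repeats the request that failed.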

ark_mismatch_report.csv

tpendragon commented 5 months ago

That seems like enough to get a read on the situation.