Closed by tpendragon 5 months ago
details of alma ids: https://pul-confluence.atlassian.net/wiki/spaces/ALMA/pages/1770185/Alma+System+Numbers
Also, here's a way to get the target via the ezid client:
> ezid = Ezid::Identifier.find("ark:/88435/f1881r915")
I, [2024-04-18T09:19:48.047067 #36039] INFO -- : EZID GetIdentifierMetadata -- success: ark:/88435/f1881r915
=> #<Ezid::Identifier id=ark:/88435/f1881r915>
> ezid.target
=> "http://findingaids.princeton.edu/collections/MC019/c01058"
possible regexes to use for a query:
LIKE '99%6421'
: this should work well if we are sure we only have bibliographic alma ids (I'm pretty sure that's the case)
SIMILAR TO '[0-9]+6421'
: this will ensure numeric-only values ending in 6421. This would allow alma ids that aren't bibliographic ids, but ensures we don't get component ids that happen to end in 6421, assuming we don't have all-numeric component ids.
The risk of false positives / negatives either way is probably negligibly small, especially for this use case.
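To see how the two patterns differ, here is a rough Ruby translation that can be run against sample values outside the database. The method names and the sample ids are made up for illustration; only the two patterns come from the comment above.

```ruby
# Rough Ruby equivalents of the two SQL patterns (hypothetical helper names).
# LIKE '99%6421'          -> starts with 99, ends with 6421, anything in between
# SIMILAR TO '[0-9]+6421' -> one or more digits followed by 6421, whole string
def like_match?(value)
  /\A99.*6421\z/m.match?(value)
end

def similar_match?(value)
  /\A[0-9]+6421\z/.match?(value)
end

# Sample values are invented, not real ids.
samples = ["9912345673506421", "9912-abc-6421", "2212345673506421", "MC019_c6421"]

samples.each do |value|
  puts format("%-18s LIKE=%-5s SIMILAR=%s", value, like_match?(value), similar_match?(value))
end
```

The second and third samples are the cases where the two patterns disagree: `LIKE` accepts non-numeric characters in the middle, while `SIMILAR TO` accepts numeric ids that don't start with 99.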
Ran the report on prod, it was pretty slow and eventually crashed with
rake aborted! Faraday::ConnectionFailed: Failed to open TCP connection to n2t.net:443 (getaddrinfo: Temporary failure in name resolution)
but it did get 520 objects in the report (attached) before it crashed. @tpendragon do we need to be sure we got every object? If so I will add some more error handling and a progress bar.
That seems like enough to get a read on the situation.
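If a complete report were needed later, the extra error handling mentioned above could be a small retry wrapper around each lookup. This is only a sketch: `with_retries` is a hypothetical helper, not something in the codebase, and the attempt count and backoff are arbitrary.

```ruby
# Retry a block a few times before giving up, so a single transient
# failure (like the DNS error above) does not abort the whole run.
# `with_retries` is a hypothetical helper name.
def with_retries(attempts: 3, wait: 1, on: [StandardError])
  tries = 0
  begin
    tries += 1
    yield
  rescue *on => e
    raise if tries >= attempts # out of attempts: re-raise the last error
    warn "attempt #{tries} failed (#{e.class}: #{e.message}), retrying"
    sleep(wait * tries) # simple linear backoff
    retry
  end
end

# In the report task this might wrap the ARK lookup, for example:
#   with_retries(on: [Faraday::ConnectionFailed]) { Ezid::Identifier.find(ark).target }
```

Rescuing per object and recording the failure in the row would be another option if skipping a few objects is acceptable.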
Implementation Tips
You don't have to follow the redirect for the ARK, just see what the location header says when you get the ARK.
There's a Report Generator that can create CSVs from a query, might be useful here.
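For the first tip, a minimal sketch of reading the Location header with Ruby's standard library, which does not follow redirects on its own. The resolver base URL is an assumption (n2t.net is the host in the error message above), and `ark_redirect_target` and its `fetch` parameter are hypothetical names; `fetch` is injectable so the lookup can be replaced in tests.

```ruby
require "net/http"
require "uri"

# Returns the Location header for an ARK without following the redirect,
# or nil if the resolver does not answer with a redirect.
# Assumption: ARKs resolve under https://n2t.net/.
def ark_redirect_target(ark, resolver: "https://n2t.net/", fetch: Net::HTTP.method(:get_response))
  # Build the URL by concatenation: URI.join would treat "ark:" as a scheme.
  response = fetch.call(URI("#{resolver}#{ark}"))
  response.is_a?(Net::HTTPRedirection) ? response["location"] : nil
end

# Example call (makes a real HTTP request):
#   ark_redirect_target("ark:/88435/f1881r915")
```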
Part of #6262
Success Criteria
A CSV is attached to this ticket, each row has the ID of the resource, the title, the MMS-ID, the ARK, and the finding aids URL the ARK points at.
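As a rough picture of the output, the five fields could be written with Ruby's `csv` library as below. The column names and the `report_csv` helper are illustrative only; the ticket names the fields but not the headers, and the existing Report Generator may already cover this.

```ruby
require "csv"

# Build the report as a CSV string from an array of hashes.
# Header names are assumptions; the ticket only lists the five fields.
def report_csv(rows)
  headers = %w[id title mms_id ark finding_aids_url]
  CSV.generate do |csv|
    csv << headers
    rows.each { |row| csv << headers.map { |h| row[h.to_sym] } }
  end
end

# Example row; the id, title, and MMS-ID are invented for illustration.
rows = [
  {
    id: "abc123",
    title: "Box 1, Folder 2",
    mms_id: "9912345673506421",
    ark: "ark:/88435/f1881r915",
    finding_aids_url: "http://findingaids.princeton.edu/collections/MC019/c01058"
  }
]
puts report_csv(rows)
```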
First Step
Write a query that gets all resources that have an MMS-ID.
Figure out how to stub webmock so it replicates a redirect.