hackartisan closed 5 months ago
I just tried this query directly on prod; it returned in about a minute. There were 92,755 resources that had an mmsid.
On staging it seems to check resources at about 2 per minute, which would take about 32 days for the 92,755 resources on prod. Not hitting errors anymore, though. Maybe it'll run faster on prod? Worth a shot.
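For reference, the 32-day figure can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: it assumes the staging rate of 2 resources per minute would carry over unchanged to prod, and the variable names are just for illustration.

```shell
resources=92755   # resources with an mmsid on prod (from the query above)
per_minute=2      # check rate observed on staging (assumed to hold on prod)

minutes=$(( resources / per_minute ))   # integer division
days=$(( minutes / 60 / 24 ))

echo "$minutes minutes, about $days days"
# prints: 46377 minutes, about 32 days
```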
@escowles It seems to take a really long time to check these ARK resolutions. Trey said we can maybe get a dump of EZIDs, and that you might know how to get that?
Yes, I've gotten the most recent report of ARKs from EZID: pul-ezid-2024-04-29.csv.gz
Though the report doesn't include the destination URLs — in the past I've written a shell script to go through and resolve each one in order to get a useful report with the destinations included.
Okay thanks @escowles! I'll just have to look into why resolving them is so slow via the ezid client, and/or do it some other way.
FWIW, the script I used the last time I did this just did a HEAD request to resolve them:
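#!/bin/sh
# Look up the redirect URL for every ARK in the file "arks.txt" (one ARK per line).
# Each ARK is resolved with a HEAD request to n2t.net; blank lines are skipped.
while IFS= read -r ARK; do
    [ -n "$ARK" ] || continue
    # Match the header case-insensitively (HTTP/2 responses use lowercase names)
    # and strip the trailing carriage return from the header line.
    LOC=$(curl -s -I "https://n2t.net/$ARK" | grep -i '^location:' | tr -d '\r')
    echo "$ARK $LOC"
done < arks.txt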
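With roughly 92,755 ARKs to check, the same HEAD-request lookup could probably be run concurrently rather than one at a time. The following is a minimal sketch, not something taken from this thread: `resolve_arks` is a hypothetical helper name, the default of 8 parallel jobs is an arbitrary guess at what the resolver will tolerate, and it assumes an `xargs` that supports `-P` (GNU and BSD do; POSIX does not require it). It uses curl's `--write-out` variables `%{url_effective}` and `%{redirect_url}` instead of grepping the headers.

```shell
#!/bin/sh
# Sketch only: resolve ARKs concurrently instead of one at a time.
# Assumptions (not from the thread): the input file has one ARK per line,
# xargs supports -P, and 8 parallel requests is acceptable to the resolver.

# resolve_arks FILE [JOBS]  (hypothetical helper name)
resolve_arks() {
    # Drop blank lines, then run up to JOBS curl processes at once.
    # -I sends a HEAD request; -o /dev/null discards the response headers;
    # -w prints the requested URL and the redirect target, tab-separated.
    grep -v '^$' "$1" |
        xargs -P "${2:-8}" -I '{}' \
            curl -s -I -o /dev/null --max-time 30 \
                -w '%{url_effective}\t%{redirect_url}\n' \
                'https://n2t.net/{}'
}
```

Calling `resolve_arks arks.txt 8 > resolved.tsv` would write one line per ARK. Because the requests run in parallel, the output order will not match the input order, so sort the result if order matters.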
Nice! Ha, my first plan was to do HEAD requests, before I got distracted by the Ezid client gem.
This is great, good work!
advances #6343