pulibrary / figgy

Valkyrie-based digital repository backend.

Add report for resources with an alma source metadata id but a finding aids ark target #6356

Closed hackartisan closed 5 months ago

hackartisan commented 5 months ago

advances #6343

hackartisan commented 5 months ago

I just tried this query directly on prod; it returned in about a minute. There were 92,755 resources that had an mmsid.

hackartisan commented 5 months ago

On staging it seems to check the resources at about 2 per minute, which at that rate would take about 32 days for the 92,755 resources on prod. Not hitting errors anymore, though. Maybe it'll run faster on prod? Worth a shot.
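
For reference, the 32-day figure can be reproduced with a quick calculation. This is only a sketch: it assumes the count of 92,755 resources from the earlier prod query and that the staging rate of roughly 2 checks per minute would carry over to prod unchanged. The helper name `estimate_days` is made up for illustration.

```shell
# Hypothetical helper: days needed to check COUNT resources at RATE checks per minute.
estimate_days() {
  awk -v count="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", count / rate / 60 / 24 }'
}

# 92,755 resources at 2 per minute
estimate_days 92755 2   # prints 32.2
```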

hackartisan commented 5 months ago

@escowles It seems to take a really long time to check these ark resolutions. Trey said we can maybe get a dump of ezids, and that you maybe know how to get that?

escowles commented 5 months ago

Yes, I've gotten the most recent report of ARKs from EZID: pul-ezid-2024-04-29.csv.gz

Though the report doesn't include the destination URLs — in the past I've written a shell script to go through and resolve each one in order to get a useful report with the destinations included.

hackartisan commented 5 months ago

Okay thanks @escowles! I'll just have to look into why resolving them is so slow via the ezid client, and/or do it some other way.

escowles commented 5 months ago

FWIW, the script I used the last time I did this just did a HEAD request to resolve them:

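#!/bin/sh

# Look up the redirect URL for every ARK listed in the file "arks.txt".

f() {
  # Read one ARK per line; skip blank lines instead of stopping at them.
  while read -r ARK; do
    [ -n "$ARK" ] || continue
    # HEAD request; header names are case-insensitive, so match "location" loosely
    # and strip the trailing carriage return from the header line.
    LOC=$(curl -s -I "https://n2t.net/$ARK" | grep -i '^location:' | tr -d '\r')
    echo "$ARK $LOC"
  done
}

f < arks.txt

hackartisan commented 5 months ago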

Nice! Ha, my first plan was to do HEAD requests, before I got distracted by the Ezid client gem.
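
A minimal sketch of the HEAD-request approach, split so the header parsing can be tried without touching the network. The function names `extract_location` and `resolve_ark` are hypothetical, and the canned response below is an invented example rather than real n2t.net output; the only thing taken from the thread is the `curl -s -I https://n2t.net/$ARK` call.

```shell
# Hypothetical helper: read raw HTTP response headers on stdin and print
# the value of the first Location header (matched case-insensitively).
extract_location() {
  tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Hypothetical helper: resolve one ARK with a HEAD request, as in the script above.
resolve_ark() {
  curl -s -I "https://n2t.net/$1" | extract_location
}

# Offline check with a canned (made-up) response:
printf 'HTTP/1.1 302 Found\r\nLocation: https://example.org/catalog/123\r\n\r\n' | extract_location
```

Printing only the URL, rather than the whole header line, may make the output easier to join back onto the EZID report by ARK.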

tpendragon commented 5 months ago

This is great, good work!