Closed: escowles closed this issue 2 years ago.
There are 22,208 ARKs pointing to http://pudl.princeton.edu/*. Each of those web pages must be scraped for metadata that can be used to determine which Figgy object it corresponds to.
@cwulfman Scraping the PUDL pages seems like the most exhaustive way to tackle this, but wouldn't looking in the Figgy database or the PUDL METS files be easier (and probably faster)?
I don't think either Figgy or the PUDL METS files have the necessary metadata, but I may be wrong.
The object URIs look like http://pudl.princeton.edu/objects/00000017b. The final segment (00000017b) is sometimes, but not always, a METS OBJID, so that's probably the right place to start; you're right. Triage is the name of this game.
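A minimal sketch of that triage step, assuming the set of known METS OBJIDs has already been collected from the PUDL METS files (the `known_objids` set and both helper names are hypothetical). URIs whose final segment matches an OBJID are resolved directly; the rest fall through to another strategy, such as scraping.

```python
from urllib.parse import urlparse


def final_segment(uri: str) -> str:
    """Return the last non-empty path segment of a PUDL object URI."""
    path = urlparse(uri).path
    # Drop any trailing slash so ".../00000017b/" still yields "00000017b".
    return path.rstrip("/").rsplit("/", 1)[-1]


def triage(uris, known_objids):
    """Split URIs into those whose final segment is a known METS OBJID
    and those that need another strategy (e.g. scraping the page)."""
    matched, unmatched = {}, []
    for uri in uris:
        seg = final_segment(uri)
        if seg in known_objids:
            matched[uri] = seg
        else:
            unmatched.append(uri)
    return matched, unmatched


if __name__ == "__main__":
    # Hypothetical OBJID set; in practice it would come from the PUDL METS files.
    objids = {"00000017b"}
    uris = [
        "http://pudl.princeton.edu/objects/00000017b",
        "http://pudl.princeton.edu/objects/ng451j89v",
    ]
    matched, unmatched = triage(uris, objids)
    print(matched)
    print(unmatched)
```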
Based on a meeting between @escowles and @cwulfman, we will pursue an agile strategy: in the first pass, simply redirect every item in a PUDL collection to its corresponding DPUL collection or finding aid (item -> collection); a second pass can redirect item -> item. This lets us sunset the PUDL platform more quickly, which is the top priority.
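The two-pass plan amounts to a lookup with a fallback. Below is a sketch under the assumption that three tables exist: item -> PUDL collection, PUDL collection -> DPUL collection or finding aid, and (once the second pass is done) item -> item. All names and the example destination are hypothetical placeholders.

```python
from typing import Dict, Optional


def resolve(item_id: str,
            item_to_collection: Dict[str, str],
            collection_targets: Dict[str, str],
            item_targets: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the redirect target for a PUDL item, or None if unmapped.

    Second-pass item -> item mappings win when present; otherwise fall back
    to the first-pass item -> collection target.
    """
    if item_targets and item_id in item_targets:
        return item_targets[item_id]
    collection = item_to_collection.get(item_id)
    if collection is None:
        return None  # one of the (hopefully few) items with no mapping
    return collection_targets.get(collection)


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    items = {"ng451j89v": "pudl0100"}
    collections = {"pudl0100": "https://dpul.princeton.edu/example-collection"}
    print(resolve("ng451j89v", items, collections))  # first pass: collection
```

Because the item-level table is optional, the first pass can ship with it empty, and the second pass only has to add entries without changing the lookup.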
Begin with GNIB and Latin American Posters, which have the largest number of item-level arks in PUDL.
There are probably some PUDL collections and/or items that don't map directly to DPUL collections or finding aids, though likely only a few.
Heya @escowles and @cwulfman, what is the FQDN to redirect for these patterns? It is not clear to me.
/collections/pudl0125 (about 100 items with custom destinations)
/objects/ng451j89v -> arks.princeton.edu/ark:/88435/[ARK id]
/sheetreader.php?obj=6h440w05w -> arks.princeton.edu/ark:/88435/[ARK id]
@kayiwa Both pudl.princeton.edu and pudltest.princeton.edu, so:
pudl.princeton.edu/objects/[ARK id] -> arks.princeton.edu/ark:/88435/[ARK id]
pudl.princeton.edu/sheetreader.php?obj=[ARK id] -> arks.princeton.edu/ark:/88435/[ARK id]
pudltest.princeton.edu/objects/[ARK id] -> arks.princeton.edu/ark:/88435/[ARK id]
pudltest.princeton.edu/sheetreader.php?obj=[ARK id] -> arks.princeton.edu/ark:/88435/[ARK id]
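The four mappings reduce to two path patterns on two hosts. Here is a sketch of that rewrite, assuming ARK ids are lowercase alphanumeric and that the resolver is reached over https (the thread gives no scheme); `ark_redirect` is a hypothetical helper, not the deployed configuration.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

PUDL_HOSTS = {"pudl.princeton.edu", "pudltest.princeton.edu"}
# Scheme is an assumption; the thread only gives the host and path.
ARK_BASE = "https://arks.princeton.edu/ark:/88435/"
OBJECTS_PATH = re.compile(r"^/objects/([0-9a-z]+)/?$")
ARK_ID = re.compile(r"[0-9a-z]+")


def ark_redirect(url: str) -> Optional[str]:
    """Map a PUDL object or sheetreader URL to its ARK, else return None."""
    parsed = urlparse(url)
    if parsed.hostname not in PUDL_HOSTS:
        return None
    # /objects/[ARK id]
    m = OBJECTS_PATH.match(parsed.path)
    if m:
        return ARK_BASE + m.group(1)
    # /sheetreader.php?obj=[ARK id]
    if parsed.path == "/sheetreader.php":
        obj = parse_qs(parsed.query).get("obj", [""])[0]
        if ARK_ID.fullmatch(obj):
            return ARK_BASE + obj
    return None


if __name__ == "__main__":
    print(ark_redirect("http://pudl.princeton.edu/objects/ng451j89v"))
    print(ark_redirect("http://pudltest.princeton.edu/sheetreader.php?obj=6h440w05w"))
```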
@kayiwa We have updated the spreadsheet with a few patterns and a list of about 60 literal URLs to be redirected: https://docs.google.com/spreadsheets/d/124DRio1JZU45wec3dLDt4z1a1rB-gYcPiXAMOP3IVFA/edit#gid=0
This was done; closing this issue.
Patterns to redirect:
/collections/pudl0125 (about 100 items with custom destinations)
/objects/ng451j89v -> arks.princeton.edu/ark:/88435/[ARK id]
/sheetreader.php?obj=6h440w05w -> arks.princeton.edu/ark:/88435/[ARK id]
Hosts to redirect:
pudl.princeton.edu -> dpul.princeton.edu
pudltest.princeton.edu -> dpul.princeton.edu
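Putting the pieces together, one plausible precedence is: literal URLs from the spreadsheet (including the pudl0125 custom destinations) first, then the ARK patterns, then a host-level fallback to DPUL. This is a sketch, not the deployed setup: the https scheme, sending unmatched paths to the DPUL home page rather than preserving the path, and the example literal destination are all assumptions.

```python
import re
from typing import Dict
from urllib.parse import parse_qs, urlparse

PUDL_HOSTS = {"pudl.princeton.edu", "pudltest.princeton.edu"}
ARK_BASE = "https://arks.princeton.edu/ark:/88435/"  # scheme assumed
DPUL_HOME = "https://dpul.princeton.edu/"  # assumed host-level fallback target
ARK_ID = re.compile(r"[0-9a-z]+")


def redirect_target(url: str, literal: Dict[str, str]) -> str:
    """Resolve a PUDL URL: literal table, then ARK patterns, then DPUL."""
    parsed = urlparse(url)
    if parsed.hostname not in PUDL_HOSTS:
        raise ValueError(f"not a PUDL URL: {url}")
    # 1. Literal URLs win, so custom destinations override the patterns.
    key = parsed.path + ("?" + parsed.query if parsed.query else "")
    if key in literal:
        return literal[key]
    # 2. /objects/[ARK id] and /sheetreader.php?obj=[ARK id] -> ARK resolver.
    if parsed.path.startswith("/objects/"):
        ark_id = parsed.path[len("/objects/"):].rstrip("/")
        if ARK_ID.fullmatch(ark_id):
            return ARK_BASE + ark_id
    if parsed.path == "/sheetreader.php":
        obj = parse_qs(parsed.query).get("obj", [""])[0]
        if ARK_ID.fullmatch(obj):
            return ARK_BASE + obj
    # 3. Everything else on either host goes to DPUL.
    return DPUL_HOME


if __name__ == "__main__":
    # Hypothetical literal entry standing in for the spreadsheet rows.
    table = {"/collections/pudl0125": "https://dpul.princeton.edu/example-collection"}
    for u in ("http://pudl.princeton.edu/collections/pudl0125",
              "http://pudltest.princeton.edu/objects/ng451j89v",
              "http://pudl.princeton.edu/about.php"):
        print(u, "->", redirect_target(u, table))
```

Checking the literal table first matters because some of the roughly 100 pudl0125 items may have object URLs that would otherwise be caught by the generic ARK pattern.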